[Intel-xe] [PATCH 2/2] drm/xe: Allocate regset space for Wa_1607983814 unconditionally

Lucas De Marchi lucas.demarchi at intel.com
Wed Mar 8 17:02:50 UTC 2023


On Wed, Mar 08, 2023 at 08:15:19AM -0800, Matt Roper wrote:
>On Tue, Mar 07, 2023 at 05:19:09PM -0800, Lucas De Marchi wrote:
>> On Tue, Mar 07, 2023 at 04:55:09PM -0800, Matt Roper wrote:
>> > As part of commit 6f2c0e92587b ("drm/xe/mocs: LNCF MOCS settings only
>> > need to be restored on pre-Xe_HP"), the regset allocation was slimmed
>> > down to not unnecessarily include extra LNCFMOCS space in the regset
>> > area for the platforms and engines that don't need to save/restore those
>> > registers.  Unfortunately this change is causing a driver load
>> > regression on Xe_LP platforms:
>>
>> s/Xe_LP/some Xe_LP/. I have been loading xe on a TGL without such an issue.
>> The report was on DG1.
>
>It happens on TGL as well, although far less often.  Loading/unloading
>in a loop will eventually trigger it for me on TGL, whereas it
>reproduced nearly every time on DG1.  I haven't actually tried on RKL or
>ADL yet, but given that both DG1 and TGL show the problem, I assume they
>probably will as well.

ok, let's keep the wording then. My r-b below still applies.

thanks
Lucas De Marchi

>
>
>Matt
>
>>
>>
>> >
>> >  ------------[ cut here ]------------
>> >  WARNING: CPU: 4 PID: 1677 at drivers/gpu/drm/xe/xe_hw_fence.c:91 xe_hw_fence_irq_finish+0x35/0x120 [xe]
>> >  Modules linked in: xe(+) drm_ttm_helper drm_suballoc_helper gpu_sched
>> >  drm_buddy video drm_display_helper drm_kms_helper syscopyarea
>> >  sysfillrect sysimgblt ttm fuse x86_pkg_temp_thermal coretemp
>> >  kvm_intel mei_pxp mei_hdcp kvm irqbypass wmi_bmof mei_me mei
>> >  crct10dif_pclmul crc32_pclmul e1000e ghash_clmulni_intel ptp
>> >  i2c_i801 i2c_smbus pps_core intel_lpss_pci wmi [last unloaded: ttm]
>> >  CPU: 4 PID: 1677 Comm: modprobe Not tainted 6.1.0-CI_DRM_12746-g6ce36b596fa7+ #474
>> >  Hardware name: Intel Corporation CoffeeLake Client Platform/CoffeeLake S UDIMM RVP, BIOS CNLSFWR1.D00.X212.B00.190824175
>> >
>> >  RIP: 0010:xe_hw_fence_irq_finish+0x35/0x120 [xe]
>> >  Code: 54 4c 8d 67 60 55 53 48 83 ec 10 48 8b 47 60 49 39 c4 75 13 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc
>> > f9 37 e1 48 89 ef 88 44 24 07 e8 78 ea 77 e1 48 8b 55
>> >  RSP: 0018:ffffc90001967bc8 EFLAGS: 00010206
>> >  RAX: ffff88811058f698 RBX: 00000000ffffffc2 RCX: ffff88810c8289a8
>> >  RDX: 0000000000000000 RSI: ffffffff822868c2 RDI: ffff888115524a98
>> >  RBP: ffff888115524a98 R08: 000000000000011f R09: 00000000ffffffff
>> >  R10: 0000000000000000 R11: 000000006243100e R12: ffff888115524af8
>> >  R13: ffff888115524b10 R14: ffff888115523c38 R15: ffffffffa04ecfbe
>> >  FS:  00007f6ca9483740(0000) GS:ffff88844dc00000(0000) knlGS:0000000000000000
>> >  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> >  CR2: 0000557fefd93020 CR3: 00000001157ea004 CR4: 00000000003706e0
>> >  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> >  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> >  Call Trace:
>> >   <TASK>
>> >   xe_gt_init+0x343/0x390 [xe]
>> >   xe_device_probe+0x233/0x2a0 [xe]
>> >   xe_pci_probe+0x351/0x480 [xe]
>> >
>> > The reason for this failure hasn't been identified yet (the smaller
>> > regset allocation is still more than large enough to hold all of the
>> > registers; it isn't an overflow problem) so let's revert the problematic
>> > optimization to restore proper behavior while we investigate further.
>> >
>> > Cc: Lucas De Marchi <lucas.demarchi at intel.com>
>> > Reported-by: Thomas Hellström <thomas.hellstrom at linux.intel.com>
>> > Signed-off-by: Matt Roper <matthew.d.roper at intel.com>
>>
>>
>> Reviewed-by: Lucas De Marchi <lucas.demarchi at intel.com>
>>
>> Lucas De Marchi
>>
>> > ---
>> > drivers/gpu/drm/xe/xe_guc_ads.c | 5 +----
>> > 1 file changed, 1 insertion(+), 4 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/xe/xe_guc_ads.c b/drivers/gpu/drm/xe/xe_guc_ads.c
>> > index fd9911ffeae4..304a9501b447 100644
>> > --- a/drivers/gpu/drm/xe/xe_guc_ads.c
>> > +++ b/drivers/gpu/drm/xe/xe_guc_ads.c
>> > @@ -224,10 +224,7 @@ static size_t calculate_regset_size(struct xe_gt *gt)
>> > 		xa_for_each(&hwe->reg_sr.xa, sr_idx, sr_entry)
>> > 			count++;
>> >
>> > -	count += ADS_REGSET_EXTRA_MAX * XE_NUM_HW_ENGINES;
>> > -
>> > -	if (needs_wa_1607983814(gt_to_xe(gt)))
>> > -		count += LNCFCMOCS_REG_COUNT;
>> > +	count += (ADS_REGSET_EXTRA_MAX + LNCFCMOCS_REG_COUNT) * XE_NUM_HW_ENGINES;
>> >
>> > 	return count * sizeof(struct guc_mmio_reg);
>> > }
>> > --
>> > 2.39.2
>> >
>
>-- 
>Matt Roper
>Graphics Software Engineer
>Linux GPU Platform Enablement
>Intel Corporation
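
For readers following the diff, the difference between the two sizing
formulas can be sketched in standalone C. This is only an illustration:
the constant values, the four-field layout of struct guc_mmio_reg, and
the helper names regset_size_conditional(), regset_size_unconditional()
and regset_extra_bytes() are assumptions made for the example, not taken
from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder values for illustration only; NOT the driver's real values. */
#define ADS_REGSET_EXTRA_MAX	8
#define LNCFCMOCS_REG_COUNT	16
#define XE_NUM_HW_ENGINES	4

/* Stand-in for the driver's struct; the real layout may differ. */
struct guc_mmio_reg {
	uint32_t offset;
	uint32_t value;
	uint32_t flags;
	uint32_t mask;
};

/*
 * Sizing as in the lines the patch removes: LNCFCMOCS space is added
 * once, and only when the workaround applies.
 */
static size_t regset_size_conditional(size_t sr_count, int needs_wa)
{
	size_t count = sr_count;

	count += ADS_REGSET_EXTRA_MAX * XE_NUM_HW_ENGINES;
	if (needs_wa)
		count += LNCFCMOCS_REG_COUNT;

	return count * sizeof(struct guc_mmio_reg);
}

/*
 * Sizing as in the line the patch adds: LNCFCMOCS space is reserved
 * per engine, regardless of platform.
 */
static size_t regset_size_unconditional(size_t sr_count)
{
	size_t count = sr_count;

	count += (ADS_REGSET_EXTRA_MAX + LNCFCMOCS_REG_COUNT) * XE_NUM_HW_ENGINES;

	return count * sizeof(struct guc_mmio_reg);
}

/* Extra bytes the unconditional formula reserves over the conditional one. */
static size_t regset_extra_bytes(size_t sr_count, int needs_wa)
{
	size_t before = regset_size_conditional(sr_count, needs_wa);
	size_t after = regset_size_unconditional(sr_count);

	assert(after >= before);
	return after - before;
}
```

Under these placeholder numbers the unconditional formula always reserves
at least as much space as the conditional one, which matches the commit
message's point that the revert trades a little memory for the previous
behavior while the root cause is still unknown.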

