[Intel-gfx] [CI 11/15] drm/i915/huc: track delayed HuC load with a fence

Ceraolo Spurio, Daniele daniele.ceraolospurio at intel.com
Sat Nov 5 00:49:54 UTC 2022



On 11/4/2022 5:38 PM, Ceraolo Spurio, Daniele wrote:
>
>
> On 11/4/2022 4:26 PM, Brian Norris wrote:
>> Hi,
>>
>> On Wed, Oct 19, 2022 at 10:54:34AM +0100, Tvrtko Ursulin wrote:
>>> Don't know if this is real or not yet, hit it while running 
>>> selftests a bit. Something to keep an eye on.
>>>
>>> [ 2928.370577] ODEBUG: init destroyed (active state 0) object type: 
>>> i915_sw_fence hint: sw_fence_dummy_notify+0x0/0x10 [i915]
>>> [ 2928.370903] WARNING: CPU: 2 PID: 1113 at lib/debugobjects.c:502 
>>> debug_print_object+0x6b/0x90
>>> [ 2928.370984] Modules linked in: i915(+) drm_display_helper 
>>> drm_kms_helper netconsole cmac algif_hash algif_skcipher af_alg bnep 
>>> nls_iso8859_1 snd_hda_codec_hdmi snd_hda_codec_realtek 
>>> snd_hda_codec_generic ledtrig_audio snd_intel_dspcfg snd_hda_codec 
>>> snd_hwdep snd_hda_core snd_pcm intel_tcc_cooling 
>>> x86_pkg_temp_thermal intel_powerclamp snd_seq_midi 
>>> snd_seq_midi_event coretemp snd_rawmidi btusb btrtl btbcm kvm_intel 
>>> btmtk btintel ath10k_pci snd_seq kvm ath10k_core bluetooth snd_timer 
>>> rapl intel_cstate snd_seq_device input_leds mac80211 ecdh_generic 
>>> libarc4 ath snd ecc serio_raw intel_wmi_thunderbolt at24 soundcore 
>>> cfg80211 mei_me intel_xhci_usb_role_switch mei ideapad_laptop 
>>> intel_pch_thermal platform_profile sparse_keymap acpi_pad 
>>> sch_fq_codel msr efi_pstore ip_tables x_tables autofs4 
>>> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sha512_ssse3 
>>> aesni_intel prime_numbers crypto_simd atkbd drm_buddy cryptd 
>>> vivaldi_fmap r8169 ttm i2c_i801 i2c_smbus cec realtek xhci_pci 
>>> syscopyarea ahci
>>> [ 2928.371145]  xhci_pci_renesas sysfillrect sysimgblt libahci 
>>> fb_sys_fops video wmi [last unloaded: drm_kms_helper]
>>> [ 2928.371489] CPU: 2 PID: 1113 Comm: modprobe Tainted: G U  
>>> W          6.1.0-rc1 #196
>>> [ 2928.371550] Hardware name: LENOVO 80MX/Lenovo E31-80, BIOS 
>>> DCCN34WW(V2.03) 12/01/2015
>>> [ 2928.371615] RIP: 0010:debug_print_object+0x6b/0x90
>>> [ 2928.371664] Code: 49 89 c1 8b 43 10 83 c2 01 48 c7 c7 e8 be d6 bb 
>>> 8b 4b 14 89 15 ca be b4 02 4c 8b 45 00 48 8b 14 c5 40 56 a8 bb e8 ec 
>>> 5b 60 00 <0f> 0b 83 05 28 5a 3e 01 01 48 83 c4 08 5b 5d c3 83 05 1a 
>>> 5a 3e 01
>>> [ 2928.371782] RSP: 0018:ffff9ed841607a18 EFLAGS: 00010286
>>> [ 2928.371841] RAX: 0000000000000000 RBX: ffff9208116a1d48 RCX: 
>>> 0000000000000000
>>> [ 2928.371909] RDX: 0000000000000001 RSI: ffffffffbbd277d2 RDI: 
>>> 00000000ffffffff
>>> [ 2928.372024] RBP: ffffffffc176a540 R08: 0000000000000000 R09: 
>>> ffffffffbc07a1e0
>>> [ 2928.372128] R10: 0000000000000001 R11: 0000000000000001 R12: 
>>> ffff9208122da830
>>> [ 2928.372192] R13: ffff92080089b000 R14: ffff9208122da770 R15: 
>>> 0000000000000000
>>> [ 2928.372259] FS:  00007f53e7617c40(0000) GS:ffff92086e500000(0000) 
>>> knlGS:0000000000000000
>>> [ 2928.372365] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 2928.372425] CR2: 000055cd28b33070 CR3: 0000000110dbd006 CR4: 
>>> 00000000003706e0
>>> [ 2928.372526] Call Trace:
>>> [ 2928.372568]  <TASK>
>>> [ 2928.372614]  ? intel_guc_hang_check+0xb0/0xb0 [i915]
>>> [ 2928.373001]  __i915_sw_fence_init+0x2b/0x50 [i915]
>>> [ 2928.373374]  intel_huc_init_early+0x75/0xb0 [i915]
>>> [ 2928.373868]  intel_uc_init_early+0x4e/0x210 [i915]
>>> [ 2928.374241]  intel_gt_common_init_early+0x16f/0x180 [i915]
>>> [ 2928.374718]  intel_root_gt_init_early+0x49/0x60 [i915]
>>> [ 2928.375074]  i915_driver_probe+0x917/0xed0 [i915]
>> ...
>>
>> Did you track this down? Or consider reverting? This is tripping me up
>
> No. I didn't manage to repro locally after Tvrtko reported it (I ran
> the full selftest suite twice on both ADL-S and DG2 with the debug
> config enabled), so I was keeping an eye out, as suggested, to see if
> it popped up again. If you can repro this consistently, can you share
> your setup info: which platform you're running on, whether you're
> using the latest drm-tip, any non-default params, etc.? Dmesg would
> also be useful to see if there are other errors before this one.
>

Just to further clarify, this issue is also not showing up in our CI
runs (which have both of the DEBUG_OBJECTS kconfigs you pointed out
enabled), so I suspect it only happens on specific setups, potentially
because a different kconfig or modparam is involved.
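
As a reading aid for the "ODEBUG: init destroyed" line in the splat
quoted above: the message names the operation that was attempted
("init") followed by the state the object was tracked in ("destroyed").
Below is a minimal userspace sketch of that kind of check. It is not
the kernel code: every model_* name is hypothetical, the state machine
is heavily simplified, and it makes no claim about what actually
triggers the warning in i915.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified model of per-object lifetime tracking. */
enum model_state {
	MODEL_NONE,      /* not tracked (never seen, or tracking entry freed) */
	MODEL_INIT,
	MODEL_ACTIVE,
	MODEL_DESTROYED, /* destroyed, but the tracking entry still exists */
	MODEL_NR_STATES
};

static const char *const model_state_name[MODEL_NR_STATES] = {
	"none", "init", "active", "destroyed"
};

struct model_obj {
	enum model_state state;
};

/* Format a report in the style "ODEBUG: <operation> <tracked state>". */
static int model_report(const struct model_obj *obj, const char *op,
			char *buf, size_t len)
{
	assert(obj->state < MODEL_NR_STATES);
	snprintf(buf, len, "ODEBUG: %s %s", op, model_state_name[obj->state]);
	return 1;
}

/* Returns 0 if the init is accepted, 1 if it would be reported. */
static int model_init(struct model_obj *obj, char *buf, size_t len)
{
	switch (obj->state) {
	case MODEL_NONE:
	case MODEL_INIT:
		obj->state = MODEL_INIT;
		return 0;
	default:
		/* Re-initialising an active or destroyed object is flagged. */
		return model_report(obj, "init", buf, len);
	}
}

/* Mark the object destroyed while keeping its tracking entry. */
static void model_destroy(struct model_obj *obj)
{
	obj->state = MODEL_DESTROYED;
}

/* Drop the tracking entry, so a later init starts from scratch. */
static void model_free(struct model_obj *obj)
{
	obj->state = MODEL_NONE;
}
```

In this model, calling model_init(), model_destroy() and then
model_init() again on the same object makes the second init return 1
with the text "ODEBUG: init destroyed"; calling model_free() between
the destroy and the second init lets it pass.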

Daniele

> Thanks,
> Daniele
>
>> on drm-tip now when running selftests with CONFIG_DEBUG_OBJECTS=y /
>> CONFIG_DRM_I915_SW_FENCE_DEBUG_OBJECTS=y. It means I can't actually run
>> any subsequent tests, because of the kernel taint.
>>
>> Brian
>


