[Intel-gfx] [CI 11/15] drm/i915/huc: track delayed HuC load with a fence
Ceraolo Spurio, Daniele
daniele.ceraolospurio at intel.com
Sat Nov 5 00:49:54 UTC 2022
On 11/4/2022 5:38 PM, Ceraolo Spurio, Daniele wrote:
>
>
> On 11/4/2022 4:26 PM, Brian Norris wrote:
>> Hi,
>>
>> On Wed, Oct 19, 2022 at 10:54:34AM +0100, Tvrtko Ursulin wrote:
>>> Don't know if this is real or not yet, hit it while running
>>> selftests a bit. Something to keep an eye on.
>>>
>>> [ 2928.370577] ODEBUG: init destroyed (active state 0) object type:
>>> i915_sw_fence hint: sw_fence_dummy_notify+0x0/0x10 [i915]
>>> [ 2928.370903] WARNING: CPU: 2 PID: 1113 at lib/debugobjects.c:502
>>> debug_print_object+0x6b/0x90
>>> [ 2928.370984] Modules linked in: i915(+) drm_display_helper
>>> drm_kms_helper netconsole cmac algif_hash algif_skcipher af_alg bnep
>>> nls_iso8859_1 snd_hda_codec_hdmi snd_hda_codec_realtek
>>> snd_hda_codec_generic ledtrig_audio snd_intel_dspcfg snd_hda_codec
>>> snd_hwdep snd_hda_core snd_pcm intel_tcc_cooling
>>> x86_pkg_temp_thermal intel_powerclamp snd_seq_midi
>>> snd_seq_midi_event coretemp snd_rawmidi btusb btrtl btbcm kvm_intel
>>> btmtk btintel ath10k_pci snd_seq kvm ath10k_core bluetooth snd_timer
>>> rapl intel_cstate snd_seq_device input_leds mac80211 ecdh_generic
>>> libarc4 ath snd ecc serio_raw intel_wmi_thunderbolt at24 soundcore
>>> cfg80211 mei_me intel_xhci_usb_role_switch mei ideapad_laptop
>>> intel_pch_thermal platform_profile sparse_keymap acpi_pad
>>> sch_fq_codel msr efi_pstore ip_tables x_tables autofs4
>>> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sha512_ssse3
>>> aesni_intel prime_numbers crypto_simd atkbd drm_buddy cryptd
>>> vivaldi_fmap r8169 ttm i2c_i801 i2c_smbus cec realtek xhci_pci
>>> syscopyarea ahci
>>> [ 2928.371145] xhci_pci_renesas sysfillrect sysimgblt libahci
>>> fb_sys_fops video wmi [last unloaded: drm_kms_helper]
>>> [ 2928.371489] CPU: 2 PID: 1113 Comm: modprobe Tainted: G U
>>> W 6.1.0-rc1 #196
>>> [ 2928.371550] Hardware name: LENOVO 80MX/Lenovo E31-80, BIOS
>>> DCCN34WW(V2.03) 12/01/2015
>>> [ 2928.371615] RIP: 0010:debug_print_object+0x6b/0x90
>>> [ 2928.371664] Code: 49 89 c1 8b 43 10 83 c2 01 48 c7 c7 e8 be d6 bb
>>> 8b 4b 14 89 15 ca be b4 02 4c 8b 45 00 48 8b 14 c5 40 56 a8 bb e8 ec
>>> 5b 60 00 <0f> 0b 83 05 28 5a 3e 01 01 48 83 c4 08 5b 5d c3 83 05 1a
>>> 5a 3e 01
>>> [ 2928.371782] RSP: 0018:ffff9ed841607a18 EFLAGS: 00010286
>>> [ 2928.371841] RAX: 0000000000000000 RBX: ffff9208116a1d48 RCX:
>>> 0000000000000000
>>> [ 2928.371909] RDX: 0000000000000001 RSI: ffffffffbbd277d2 RDI:
>>> 00000000ffffffff
>>> [ 2928.372024] RBP: ffffffffc176a540 R08: 0000000000000000 R09:
>>> ffffffffbc07a1e0
>>> [ 2928.372128] R10: 0000000000000001 R11: 0000000000000001 R12:
>>> ffff9208122da830
>>> [ 2928.372192] R13: ffff92080089b000 R14: ffff9208122da770 R15:
>>> 0000000000000000
>>> [ 2928.372259] FS: 00007f53e7617c40(0000) GS:ffff92086e500000(0000)
>>> knlGS:0000000000000000
>>> [ 2928.372365] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 2928.372425] CR2: 000055cd28b33070 CR3: 0000000110dbd006 CR4:
>>> 00000000003706e0
>>> [ 2928.372526] Call Trace:
>>> [ 2928.372568] <TASK>
>>> [ 2928.372614] ? intel_guc_hang_check+0xb0/0xb0 [i915]
>>> [ 2928.373001] __i915_sw_fence_init+0x2b/0x50 [i915]
>>> [ 2928.373374] intel_huc_init_early+0x75/0xb0 [i915]
>>> [ 2928.373868] intel_uc_init_early+0x4e/0x210 [i915]
>>> [ 2928.374241] intel_gt_common_init_early+0x16f/0x180 [i915]
>>> [ 2928.374718] intel_root_gt_init_early+0x49/0x60 [i915]
>>> [ 2928.375074] i915_driver_probe+0x917/0xed0 [i915]
>> ...
>>
>> Did you track this down? Or consider reverting? This is tripping me up
>
> No. I didn't manage to repro locally after Tvrtko reported it (I ran
> the full selftest suite twice on both ADL-S and DG2 with the debug
> config enabled), so I was keeping an eye out as suggested to see if it
> popped up again. If you can repro this consistently, can you share
> your setup info: which platform you're running on, whether you're using
> the latest drm-tip, any non-default params you're using, etc.? Dmesg
> would also be useful to see if there are other errors before this one.
>
Just to further clarify, this issue is also not showing up in our CI
runs (which do have both of the DEBUG_OBJECTS kconfigs you pointed out
enabled), which is why I suspect that this is only happening on
specific setups, potentially due to a different kconfig or modparam
being involved.
Daniele
> Thanks,
> Daniele
>
>> on drm-tip now when running selftests with CONFIG_DEBUG_OBJECTS=y /
>> CONFIG_DRM_I915_SW_FENCE_DEBUG_OBJECTS=y. It means I can't actually run
>> any subsequent tests, because of the kernel taint.
>>
>> Brian
>