Lockdep bug during hdcp_create_workqueue() (Was: BUG: key ffff8b521bda9148 has not been registered!)
David Ward
david.ward at gatech.edu
Tue May 4 11:58:24 UTC 2021
On 1/9/21 7:42 AM, Mikhail Gavrilov wrote:
> Hi folks!
> I started to see this message every boot after replacing Radeon VII with 6900XT.
>
> <...>
>
> [ 6.333672] [drm] REG_WAIT timeout 1us * 100000 tries -
> mpc2_assert_idle_mpcc line:480
> [ 6.335258] BUG: key ffff8b521bda9148 has not been registered!
> [ 6.335271] ------------[ cut here ]------------
> [ 6.335273] DEBUG_LOCKS_WARN_ON(1)
> [ 6.335279] WARNING: CPU: 18 PID: 525 at
> kernel/locking/lockdep.c:4618 lockdep_init_map_waits+0x18b/0x210
> [ 6.335284] Modules linked in: fjes(-) amdgpu(+) iommu_v2 gpu_sched
> ttm drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel cec drm
> ghash_clmulni_intel ccp igb nvme nvme_core dca i2c_algo_bit wmi
> pinctrl_amd fuse
> [ 6.335298] CPU: 18 PID: 525 Comm: systemd-udevd Not tainted
> 5.10.0-0.rc6.20201204git34816d20f173.92.fc34.x86_64 #1
> [ 6.335302] Hardware name: System manufacturer System Product
> Name/ROG STRIX X570-I GAMING, BIOS 2802 10/21/2020
> [ 6.335306] RIP: 0010:lockdep_init_map_waits+0x18b/0x210
> [ 6.335309] Code: 00 85 c0 0f 84 75 ff ff ff 8b 3d 18 c4 f1 01 85
> ff 0f 85 67 ff ff ff 48 c7 c6 68 43 60 97 48 c7 c7 1d 90 5a 97 e8 70
> 1f b6 00 <0f> 0b e9 4d ff ff ff e8 19 59 bc 00 85 c0 74 21 44 8b 1d e6
> c3 f1
> [ 6.335315] RSP: 0018:ffff9e5a013d3910 EFLAGS: 00010282
> [ 6.335317] RAX: 0000000000000016 RBX: ffffffff97247d80 RCX: ffff8b5908fdb238
> [ 6.335320] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff8b5908fdb230
> [ 6.335322] RBP: ffff8b520e2a7978 R08: 0000000000000000 R09: 0000000000000000
> [ 6.335325] R10: ffff9e5a013d3740 R11: ffff8b592e2fffe8 R12: ffff8b521bda9148
> [ 6.335327] R13: 0000000000000000 R14: ffff8b521bc30330 R15: ffff8b521bc30330
> [ 6.335330] FS: 00007fe019eb9140(0000) GS:ffff8b5908e00000(0000)
> knlGS:0000000000000000
> [ 6.335333] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 6.335336] CR2: 00007fe018f5e000 CR3: 00000001142ee000 CR4: 0000000000350ee0
> [ 6.335338] Call Trace:
> [ 6.335342] __kernfs_create_file+0x7b/0x100
> [ 6.335344] sysfs_add_file_mode_ns+0xa3/0x190
> [ 6.335347] sysfs_create_bin_file+0x50/0x70
> [ 6.335428] hdcp_create_workqueue+0x3bd/0x410 [amdgpu]
> [ 6.335499] amdgpu_dm_init.isra.0.cold+0x136/0x126d [amdgpu]
> [ 6.335570] ? psp_set_srm+0xb0/0xb0 [amdgpu]
> [ 6.335637] ? hdcp_update_display+0x1f0/0x1f0 [amdgpu]
> [ 6.335641] ? dev_printk_emit+0x3e/0x40
> [ 6.335709] dm_hw_init+0xe/0x20 [amdgpu]
> [ 6.335776] amdgpu_device_init.cold+0x18c3/0x1bbc [amdgpu]
> [ 6.335781] ? pci_bus_read_config_word+0x39/0x50
> [ 6.335831] amdgpu_driver_load_kms+0x2b/0x1f0 [amdgpu]
> [ 6.335879] amdgpu_pci_probe+0x129/0x1b0 [amdgpu]
> [ 6.335889] local_pci_probe+0x42/0x80
> [ 6.335891] pci_device_probe+0xd9/0x1a0
> [ 6.335896] really_probe+0x205/0x460
> [ 6.335898] driver_probe_device+0xe1/0x150
> [ 6.335901] device_driver_attach+0xa8/0xb0
> [ 6.335904] __driver_attach+0x8c/0x150
> [ 6.335907] ? device_driver_attach+0xb0/0xb0
> [ 6.335909] ? device_driver_attach+0xb0/0xb0
> [ 6.335911] bus_for_each_dev+0x67/0x90
> [ 6.335914] bus_add_driver+0x12e/0x1f0
> [ 6.335917] driver_register+0x8b/0xe0
> [ 6.335919] ? 0xffffffffc0e4c000
> [ 6.335922] do_one_initcall+0x67/0x320
> [ 6.335925] ? rcu_read_lock_sched_held+0x3f/0x80
> [ 6.335928] ? trace_kmalloc+0xb2/0xe0
> [ 6.335930] ? kmem_cache_alloc_trace+0x157/0x270
> [ 6.335934] do_init_module+0x5c/0x260
> [ 6.335936] __do_sys_init_module+0x13d/0x1a0
> [ 6.335940] do_syscall_64+0x33/0x40
> [ 6.335943] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 6.335945] RIP: 0033:0x7fe01aab2efe
> [ 6.335948] Code: 48 8b 0d 7d 1f 0c 00 f7 d8 64 89 01 48 83 c8 ff
> c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00
> 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4a 1f 0c 00 f7 d8 64 89
> 01 48
> [ 6.335953] RSP: 002b:00007ffdf4879928 EFLAGS: 00000246 ORIG_RAX:
> 00000000000000af
> [ 6.335957] RAX: ffffffffffffffda RBX: 00005636774ad820 RCX: 00007fe01aab2efe
> [ 6.335959] RDX: 00005636774856e0 RSI: 0000000000b4f95e RDI: 00007fe01840f010
> [ 6.335962] RBP: 00007fe01840f010 R08: 000056367748bd30 R09: 0000000000b4f970
> [ 6.335964] R10: 00005633142fc82b R11: 0000000000000246 R12: 00005636774856e0
> [ 6.335967] R13: 00005636774d22d0 R14: 0000000000000000 R15: 00005636774a1d80
> [ 6.335971] irq event stamp: 343839
> [ 6.335973] hardirqs last enabled at (343839):
> [<ffffffff96162861>] console_unlock+0x511/0x640
> [ 6.335977] hardirqs last disabled at (343838):
> [<ffffffff961627c8>] console_unlock+0x478/0x640
> [ 6.335981] softirqs last enabled at (343730):
> [<ffffffff96e01112>] asm_call_irq_on_stack+0x12/0x20
> [ 6.335984] softirqs last disabled at (343657):
> [<ffffffff96e01112>] asm_call_irq_on_stack+0x12/0x20
> [ 6.335987] ---[ end trace a4445e953bea9224 ]---
Another user I am helping is seeing this bug, with a very similar stack
trace, in v5.12 (vanilla build) on different hardware.
> $ /usr/src/kernels/`uname -r`/scripts/faddr2line
> /lib/debug/lib/modules/`uname -r`/vmlinux lockdep_init_map_waits+0x18b
> lockdep_init_map_waits+0x18b/0x210:
> lockdep_init_map_waits at kernel/locking/lockdep.c:4618 (discriminator 7)
>
> $ git blame -L 4613,4623 kernel/locking/lockdep.c
I assume the issue is not actually in the lockdep code itself, but more
likely in the amdgpu / amd display code that ultimately calls it.
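To make that reasoning concrete: in the original trace, the first reliable frame tagged [amdgpu] is hdcp_create_workqueue, so that is the natural place to start looking. Below is a minimal Python sketch of that triage step, assuming raw (undecoded) dmesg frame lines of the form "symbol+0xoff/0xlen [module]". The helper name first_module_frame and the regular expression are illustrative assumptions, not part of any kernel tooling.

```python
import re

# Matches raw kernel stack frames such as
#   "[ 6.335428] hdcp_create_workqueue+0x3bd/0x410 [amdgpu]"
# A leading "?" marks a frame the unwinder considers unreliable.
FRAME_RE = re.compile(
    r"\]\s+(?P<unreliable>\?\s+)?(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def first_module_frame(trace_lines, module):
    """Return the symbol of the first reliable frame in `module`, or None."""
    for line in trace_lines:
        m = FRAME_RE.search(line)
        if m and not m.group("unreliable") and m.group("module") == module:
            return m.group("symbol")
    return None


# Frames copied from the top of the quoted v5.10-rc6 trace.
trace = [
    "[ 6.335342] __kernfs_create_file+0x7b/0x100",
    "[ 6.335344] sysfs_add_file_mode_ns+0xa3/0x190",
    "[ 6.335347] sysfs_create_bin_file+0x50/0x70",
    "[ 6.335428] hdcp_create_workqueue+0x3bd/0x410 [amdgpu]",
    "[ 6.335499] amdgpu_dm_init.isra.0.cold+0x136/0x126d [amdgpu]",
    "[ 6.335570] ? psp_set_srm+0xb0/0xb0 [amdgpu]",
]

print(first_module_frame(trace, "amdgpu"))
```

Frames prefixed with "?" are skipped because they are stack leftovers rather than confirmed callers. The sketch only narrows down where to look; it does not say what is wrong in that function.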
Using scripts/decode_stacktrace.sh, the stack trace for v5.12 reads like
this:
[ 12.817369] Call Trace:
[ 12.819991] __kernfs_create_file (fs/kernfs/file.c:998)
[ 12.824581] sysfs_add_file_mode_ns (fs/sysfs/file.c:324)
[ 12.829334] ? init_timer_key (kernel/time/timer.c:816)
[ 12.833527] sysfs_create_bin_file (fs/sysfs/file.c:558)
[ 12.838115] hdcp_create_workqueue (drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_hdcp.c:648) amdgpu
[ 12.843964] amdgpu_dm_init.isra.0.cold (drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:1481) amdgpu
[ 12.850303] ? lock_acquire (kernel/locking/lockdep.c:437 kernel/locking/lockdep.c:5513 kernel/locking/lockdep.c:5476)
[ 12.854303] ? lock_is_held_type (kernel/locking/lockdep.c:5254 kernel/locking/lockdep.c:5550)
[ 12.858779] ? smum_send_msg_to_smc_with_parameter (drivers/gpu/drm/amd/amdgpu/../pm/powerplay/smumgr/smumgr.c:169) amdgpu
[ 12.865954] ? find_held_lock (kernel/locking/lockdep.c:5004)
[ 12.870065] ? smum_send_msg_to_smc_with_parameter (drivers/gpu/drm/amd/amdgpu/../pm/powerplay/smumgr/smumgr.c:169) amdgpu
[ 12.877258] ? psp_set_srm (drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_hdcp.c:396) amdgpu
[ 12.882124] ? hdcp_update_display (drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_hdcp.c:431) amdgpu
[ 12.887914] ? arch_jump_label_transform (arch/x86/kernel/jump_label.c:99)
[ 12.892945] ? sched_clock_cpu (kernel/sched/clock.c:371)
[ 12.897051] dm_hw_init (drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:1712) amdgpu
[ 12.901526] amdgpu_device_init.cold (drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c:295) amdgpu
[ 12.907703] ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/paravirt.h:658 ./arch/x86/include/asm/irqflags.h:145 ./include/linux/spinlock_api_smp.h:160 kernel/locking/spinlock.c:191)
[ 12.913015] amdgpu_driver_load_kms (drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c:157 (discriminator 6)) amdgpu
[ 12.918773] amdgpu_pci_probe (drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:1221) amdgpu
Line numbers above correspond to the v5.12 tag in Linus's tree (this is
a vanilla kernel).
> Who can help fix this?
>
> Full kernel logs are here: https://pastebin.com/d2Nq01SX
I created an issue for this bug before I found this e-mail:
https://gitlab.freedesktop.org/drm/amd/-/issues/1586
The full kernel logs for v5.12 are posted there.
Thank you,
David