Possible circular locking dependency: (&dev->clientlist_mutex){+.+.}-{4:4} taken at drm_client_dev_suspend+0x44/0x140 while holding (console_lock){+.+.}-{0:0} from radeon_suspend_kms+0x3e8/0x490 [radeon] (Talos II, kernel v6.13.1)

Erhard Furtner erhard_f at mailbox.org
Sun Feb 2 15:10:48 UTC 2025


Greetings!

My Talos II (ppc64) boots fine on v6.13.1, but on every reboot a kernel built with CONFIG_SLUB_DEBUG_ON=y and CONFIG_PROVE_LOCKING=y prints this lockdep warning:

[...]
EXT4-fs (nvme0n1p2): unmounting filesystem 4913eef4-b406-4b09-ad17-549fbf0a775e.
systemd-shutdown[1]: Syncing filesystems and block devices.
systemd-shutdown[1]: Sending SIGTERM to remaining processes...
systemd-journald[931]: Received SIGTERM from PID 1 (systemd-shutdow).
systemd-shutdown[1]: Sending SIGKILL to remaining processes...
systemd-shutdown[1]: Unmounting file systems.
(sd-umount)[1711]: Unmounting '/run/credentials/systemd-vconsole-setup.service'.
(sd-umount)[1712]: Unmounting '/run/credentials/systemd-journald.service'.
(sd-remount)[1713]: Remounting '/' read-only with options 'compress=zstd:1,ssd,discard=async,space_cache=v2,subvolid=5,subvol=/'.
systemd-shutdown[1]: All filesystems unmounted.
systemd-shutdown[1]: Deactivating swaps.
systemd-shutdown[1]: All swaps deactivated.
systemd-shutdown[1]: Detaching loop devices.
systemd-shutdown[1]: All loop devices detached.
systemd-shutdown[1]: Stopping MD devices.
systemd-shutdown[1]: All MD devices stopped.
systemd-shutdown[1]: Detaching DM devices.
systemd-shutdown[1]: All DM devices detached.
systemd-shutdown[1]: All filesystems, swaps, loop devices, MD devices and DM devices detached.
systemd-shutdown[1]: Syncing filesystems and block devices.
systemd-shutdown[1]: Rebooting.
radeon 0033:01:00.0: Refused to change power state from D0 to D3hot

======================================================
WARNING: possible circular locking dependency detected
6.13.1-P9 #6 Tainted: G                T 
------------------------------------------------------
systemd-shutdow/1 is trying to acquire lock:
c000200015768300 (&dev->clientlist_mutex){+.+.}-{4:4}, at: drm_client_dev_suspend+0x44/0x140

but task is already holding lock:
c0000000023fb260 (console_lock){+.+.}-{0:0}, at: radeon_suspend_kms+0x3e8/0x490 [radeon]

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (console_lock){+.+.}-{0:0}:
       lock_acquire+0x128/0x3d0
       console_lock+0x74/0xe0
       fbcon_fb_registered+0x2d4/0x2f0
       do_register_framebuffer+0x1ac/0x2f0
       register_framebuffer+0x40/0x70
       __drm_fb_helper_initial_config_and_unlock+0x3c0/0x6e0
       drm_fbdev_client_hotplug+0xb8/0x140
       drm_client_register+0xa8/0x120
       drm_fbdev_client_setup+0x140/0x290
       drm_client_setup+0x20/0x70
       radeon_pci_probe+0x218/0x270 [radeon]
       local_pci_probe+0x60/0xf0
       work_for_cpu_fn+0x30/0x50
       process_one_work+0x29c/0x810
       worker_thread+0x1fc/0x410
       kthread+0x148/0x150
       start_kernel_thread+0x14/0x18

-> #1 (registration_lock){+.+.}-{4:4}:
       lock_acquire+0x128/0x3d0
       __mutex_lock+0xe0/0x1060
       register_framebuffer+0x34/0x70
       __drm_fb_helper_initial_config_and_unlock+0x3c0/0x6e0
       drm_fbdev_client_hotplug+0xb8/0x140
       drm_client_register+0xa8/0x120
       drm_fbdev_client_setup+0x140/0x290
       drm_client_setup+0x20/0x70
       radeon_pci_probe+0x218/0x270 [radeon]
       local_pci_probe+0x60/0xf0
       work_for_cpu_fn+0x30/0x50
       process_one_work+0x29c/0x810
       worker_thread+0x1fc/0x410
       kthread+0x148/0x150
       start_kernel_thread+0x14/0x18

-> #0 (&dev->clientlist_mutex){+.+.}-{4:4}:
       check_prev_add+0x174/0x1240
       __lock_acquire+0x17e0/0x2120
       lock_acquire+0x128/0x3d0
       __mutex_lock+0xe0/0x1060
       drm_client_dev_suspend+0x44/0x140
       radeon_suspend_kms+0x3f8/0x490 [radeon]
       radeon_pci_shutdown+0x40/0xa0 [radeon]
       pci_device_shutdown+0x5c/0xd0
       device_shutdown+0x1fc/0x300
       kernel_restart+0x5c/0xf0
       __do_sys_reboot+0x130/0x2e0
       system_call_exception+0x1b4/0x390
       system_call_vectored_common+0xf0/0x280

other info that might help us debug this:

Chain exists of:
  &dev->clientlist_mutex --> registration_lock --> console_lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(console_lock);
                               lock(registration_lock);
                               lock(console_lock);
  lock(&dev->clientlist_mutex);

 *** DEADLOCK ***

4 locks held by systemd-shutdow/1:
 #0: c0000000023f5d20 (system_transition_mutex){+.+.}-{4:4}, at: __do_sys_reboot+0xf8/0x2e0
 #1: c0000000119bc1b0 (&dev->mutex){....}-{4:4}, at: device_shutdown+0x150/0x300
 #2: c0000000119b91b0 (&dev->mutex){....}-{4:4}, at: device_shutdown+0x164/0x300
 #3: c0000000023fb260 (console_lock){+.+.}-{0:0}, at: radeon_suspend_kms+0x3e8/0x490 [radeon]

stack backtrace:
CPU: 13 UID: 0 PID: 1 Comm: systemd-shutdow Tainted: G                T  6.13.1-P9 #6
Tainted: [T]=RANDSTRUCT
Hardware name: T2P9D01 REV 1.01 POWER9 0x4e1202 opal:skiboot-bc106a0 PowerNV
Call Trace:
[c0000000084ff3e0] [c0000000010991a8] dump_stack_lvl+0xbc/0x110 (unreliable)
[c0000000084ff420] [c000000000202318] print_circular_bug+0x3c8/0x470
[c0000000084ff4d0] [c000000000202594] check_noncircular+0x1d4/0x1f0
[c0000000084ff590] [c000000000203c94] check_prev_add+0x174/0x1240
[c0000000084ff650] [c000000000208460] __lock_acquire+0x17e0/0x2120
[c0000000084ff790] [c000000000208ec8] lock_acquire+0x128/0x3d0
[c0000000084ff890] [c0000000010e12e0] __mutex_lock+0xe0/0x1060
[c0000000084ff9b0] [c000000000c1f4b4] drm_client_dev_suspend+0x44/0x140
[c0000000084ffa40] [c00800000de36610] radeon_suspend_kms+0x3f8/0x490 [radeon]
[c0000000084ffb00] [c00800000de33638] radeon_pci_shutdown+0x40/0xa0 [radeon]
[c0000000084ffb30] [c000000000b0952c] pci_device_shutdown+0x5c/0xd0
[c0000000084ffb70] [c000000000c5d99c] device_shutdown+0x1fc/0x300
[c0000000084ffc00] [c0000000001a2b5c] kernel_restart+0x5c/0xf0
[c0000000084ffc70] [c0000000001a2f70] __do_sys_reboot+0x130/0x2e0
[c0000000084ffdd0] [c00000000002ea64] system_call_exception+0x1b4/0x390
[c0000000084ffe50] [c00000000000c270] system_call_vectored_common+0xf0/0x280
--- interrupt: 3000 at 0x3fffbd6ec040
NIP:  00003fffbd6ec040 LR: 00003fffbd6ec040 CTR: 0000000000000000
REGS: c0000000084ffe80 TRAP: 3000   Tainted: G                T   (6.13.1-P9)
MSR:  900000000280f032 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI>  CR: 48002448  XER: 00000000
IRQMASK: 0 
GPR00: 0000000000000058 00003ffffbed38e0 00003fffbd7f7100 fffffffffee1dead 
GPR04: 0000000028121969 0000000001234567 672e000000000000 0000000000000020 
GPR08: 00003ffffbed2ed5 0000000000000000 0000000000000000 0000000000000000 
GPR12: 0000000000000000 00003fffbdce57e0 00000001341eadc7 00000001341eade0 
GPR16: 00000001341eae47 0000000000000000 0000000000000000 0000000000000001 
GPR20: 0000000000000000 00003ffffbed39e8 00003ffffbed3f48 0000000000000000 
GPR24: 0000000000000000 0000000000000000 0000000001234567 00000001341ebba8 
GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
NIP [00003fffbd6ec040] 0x3fffbd6ec040
LR [00003fffbd6ec040] 0x3fffbd6ec040
--- interrupt: 3000
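
To make the reported cycle easier to follow, here is a toy model (explicitly not lockdep's implementation). It records the three lock-order edges from the splat as a directed graph and checks for a cycle with a depth-first search. The enum labels and the helpers add_edge(), has_cycle() and model_report() are hypothetical, chosen only for illustration. An edge A -> B means "B was acquired while A was held".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three lock classes in the report.
 * Toy model only; lockdep itself works very differently internally. */
enum {
	CLIENTLIST_MUTEX,  /* &dev->clientlist_mutex */
	REGISTRATION_LOCK, /* registration_lock (fbmem) */
	CONSOLE_LOCK,      /* console_lock */
	NLOCKS
};

/* edge[a][b] == true means "b was acquired while a was held". */
static bool edge[NLOCKS][NLOCKS];

static void add_edge(int held, int acquired)
{
	assert(held >= 0 && held < NLOCKS);
	assert(acquired >= 0 && acquired < NLOCKS);
	edge[held][acquired] = true;
}

/* DFS colouring: 0 = unvisited, 1 = on current path, 2 = finished. */
static bool dfs(int node, int *state)
{
	state[node] = 1;
	for (int next = 0; next < NLOCKS; next++) {
		if (!edge[node][next])
			continue;
		if (state[next] == 1)
			return true; /* back edge: a cycle exists */
		if (state[next] == 0 && dfs(next, state))
			return true;
	}
	state[node] = 2;
	return false;
}

static bool has_cycle(void)
{
	int state[NLOCKS] = { 0 };

	for (int n = 0; n < NLOCKS; n++)
		if (state[n] == 0 && dfs(n, state))
			return true;
	return false;
}

/* Replays the dependencies from the report; returns true if the graph
 * is acyclic after the probe path but cyclic after the shutdown path. */
static bool model_report(void)
{
	/* Probe path (#1, #2): drm_client_register() holds clientlist_mutex,
	 * register_framebuffer() takes registration_lock, and
	 * fbcon_fb_registered() then takes console_lock. */
	add_edge(CLIENTLIST_MUTEX, REGISTRATION_LOCK);
	add_edge(REGISTRATION_LOCK, CONSOLE_LOCK);
	bool before = has_cycle(); /* still a simple chain */

	/* Shutdown path (#0): radeon_suspend_kms() holds console_lock while
	 * drm_client_dev_suspend() takes clientlist_mutex. */
	add_edge(CONSOLE_LOCK, CLIENTLIST_MUTEX);
	bool after = has_cycle(); /* this edge closes the loop */

	return !before && after;
}
```

In this model the cycle only appears once the shutdown-path edge is added, which is consistent with the warning firing at reboot rather than at boot: the probe-time edges alone form a chain, not a loop.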


 # lspci 
0000:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0001:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0002:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0003:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0003:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
0004:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0004:01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0004:01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0005:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0005:01:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 04)
0005:02:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)
0030:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0031:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0031:01:00.0 Non-Volatile memory controller: Phison Electronics Corporation E8 PCIe3 x2 NVMe Controller (rev 01)
0032:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0033:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0033:01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV516 [Radeon X1300/X1550 Series]
0033:01:00.1 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] RV516 [Radeon X1300/X1550 Series] (Secondary)

Kernel .config attached; the full dmesg can be provided if needed.

Regards,
Erhard
-------------- next part --------------
A non-text attachment was scrubbed...
Name: config_6131_p9
Type: application/octet-stream
Size: 130641 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20250202/31aacc09/attachment-0001.obj>

