rcu_sched detected expedited stalls in amdgpu after suspend

Mon Jun 27 19:22:24 UTC 2022

Hi,

Since around Linux 5.19, I consistently get errors like the following
when resuming from S3:

[15652.909157] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 11-... } 7 jiffies s: 9981 root: 0x800/.
[15652.909162] rcu: blocking rcu_node structures (internal RCU debug):
[15652.909163] Task dump for CPU 11:
[15652.909164] task:kworker/u24:65  state:R  running task     stack:    0 pid:210218 ppid:     2 flags:0x00004008
[15652.909167] Workqueue: events_unbound async_run_entry_fn
[15652.909172] Call Trace:
[15652.909173]  <TASK>
[15652.909174]  ? atom_get_src_int+0x38e/0x680
[15652.909179]  ? atom_op_test+0x67/0x190
[15652.909181]  ? amdgpu_atom_execute_table_locked+0x19a/0x300
[15652.909184]  ? atom_op_calltable+0xb1/0x110
[15652.909186]  ? amdgpu_atom_execute_table_locked+0x19a/0x300
[15652.909189]  ? atom_op_calltable+0xb1/0x110
[15652.909191]  ? amdgpu_atom_execute_table_locked+0x19a/0x300
[15652.909193]  ? __switch_to+0x137/0x440
[15652.909195]  ? amdgpu_atom_asic_init+0xe0/0x100
[15652.909198]  ? pci_bus_read_config_dword+0x36/0x50
[15652.909201]  ? amdgpu_device_resume+0x10b/0x3e0
[15652.909203]  ? amdgpu_pmops_resume+0x32/0x60
[15652.909204]  ? pci_pm_suspend+0x2b0/0x2b0
[15652.909206]  ? dpm_run_callback+0x35/0x1f0
[15652.909209]  ? device_resume+0x1ca/0x220
[15652.909211]  ? async_resume+0x19/0xe0
[15652.909213]  ? async_run_entry_fn+0x33/0x120
[15652.909215]  ? process_one_work+0x1d6/0x350
[15652.909218]  ? worker_thread+0x24d/0x480
[15652.909220]  ? kthread+0x137/0x150
[15652.909221]  ? worker_clr_flags+0x40/0x40
[15652.909224]  ? kthread_blkcg+0x30/0x30
[15652.909226]  ? ret_from_fork+0x22/0x30
[15652.909227]  </TASK>
[15653.015808] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 11-... } 7 jiffies s: 9985 root: 0x800/.
[15653.015812] rcu: blocking rcu_node structures (internal RCU debug):
[15653.015813] Task dump for CPU 11:
[15653.015813] task:kworker/u24:65  state:R  running task     stack:    0 pid:210218 ppid:     2 flags:0x00004008
[15653.015816] Workqueue: events_unbound async_run_entry_fn
[15653.015820] Call Trace:
[15653.015820]  <TASK>
[15653.015821]  ? amdgpu_cgs_read_register+0x10/0x10
[15653.015825]  ? smu7_copy_bytes_to_smc+0xd4/0x200
[15653.015828]  ? polaris10_program_memory_timing_parameters+0x195/0x1b0
[15653.015831]  ? sysvec_apic_timer_interrupt+0xa/0x80
[15653.015834]  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[15653.015836]  ? amdgpu_cgs_destroy_device+0x10/0x10
[15653.015839]  ? sysvec_apic_timer_interrupt+0xa/0x80
[15653.015841]  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[15653.015843]  ? amdgpu_cgs_destroy_device+0x10/0x10
[15653.015846]  ? amdgpu_device_rreg+0x8f/0xd0
[15653.015847]  ? phm_wait_for_register_unequal+0x99/0xd0
[15653.015850]  ? smu7_send_msg_to_smc+0x95/0x130
[15653.015853]  ? smum_send_msg_to_smc+0x5d/0xa0
[15653.015854]  ? amdgpu_cgs_read_ind_register+0xa0/0xa0
[15653.015857]  ? smu7_enable_dpm_tasks+0x241f/0x28c0
[15653.015859]  ? hwmgr_resume+0x31/0x70
[15653.015861]  ? amdgpu_device_resume+0x1fa/0x3e0
[15653.015863]  ? amdgpu_pmops_resume+0x32/0x60
[15653.015864]  ? pci_pm_suspend+0x2b0/0x2b0
[15653.015866]  ? dpm_run_callback+0x35/0x1f0
[15653.015868]  ? device_resume+0x1ca/0x220
[15653.015870]  ? async_resume+0x19/0xe0
[15653.015872]  ? async_run_entry_fn+0x33/0x120
[15653.015874]  ? process_one_work+0x1d6/0x350
[15653.015877]  ? worker_thread+0x24d/0x480
[15653.015878]  ? kthread+0x137/0x150
[15653.015880]  ? worker_clr_flags+0x40/0x40
[15653.015882]  ? kthread_blkcg+0x30/0x30
[15653.015884]  ? ret_from_fork+0x22/0x30
[15653.015886]  </TASK>

I have not noticed any resulting problems. I am reporting this in the
hope that the issue is easy to fix, so that these error messages do not
obscure some future problem.
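
In case it helps with triage, here is a minimal Python sketch that pulls
the stall reports out of a saved dmesg capture and lists the stalled CPUs,
the jiffies count and the sequence number. It is only an illustration and
not part of any kernel tooling: the function name parse_expedited_stalls
and the regular expression are my own assumptions, based solely on the
message format shown in the log above, and other kernel versions may
print the line differently.

```python
import re

# Assumed header format, taken from the log lines in this report only.
STALL_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+rcu: INFO: (?P<flavor>\w+) detected "
    r"expedited stalls on CPUs/tasks: \{(?P<who>[^}]*)\}\s+"
    r"(?P<jiffies>\d+) jiffies s: (?P<seq>\d+)"
)

# CPU entries look like "11-..." in the log above; anything else is skipped.
CPU_RE = re.compile(r"^(\d+)-")


def parse_expedited_stalls(log_text):
    """Return one dict per expedited-stall header line found in log_text."""
    stalls = []
    for line in log_text.splitlines():
        m = STALL_RE.search(line)
        if m is None:
            continue
        cpus = []
        for token in m.group("who").split():
            cpu = CPU_RE.match(token)
            if cpu:
                cpus.append(int(cpu.group(1)))
        stalls.append(
            {
                "timestamp": float(m.group("ts")),
                "flavor": m.group("flavor"),
                "cpus": cpus,
                "jiffies": int(m.group("jiffies")),
                "seq": int(m.group("seq")),
            }
        )
    return stalls


if __name__ == "__main__":
    import sys

    # Usage: dmesg > resume.log; python3 stalls.py resume.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for stall in parse_expedited_stalls(fh.read()):
            print(stall)
```

Run against the excerpt above, this should report two stalls, both on
CPU 11 and both in the same kworker doing the amdgpu resume.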

Thanks,
Alex.