kernel 5.15.x: AMD RX 6700 XT - Fails to resume after screen blank

Mark Boddington lkml at badpenguin.co.uk
Wed Nov 24 19:14:41 UTC 2021


Hi all,

TL;DR - git bisection points to 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v5.15.4&id=61d861cf478576d85d6032f864360a34b26084b1 
as the cause of the GPU failing to change power state after idling.

Since 5.15.0 I have had intermittent issues with my GPU failing to 
resume after entering power saving. I have errors like these:

Nov 18 09:52:19 katana kernel: [ 4921.669813] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:21 katana kernel: [ 4923.667803] snd_hda_intel 0000:0d:00.1: refused to change power state from D0 to D3hot
Nov 18 09:52:26 katana kernel: [ 4928.622234] amdgpu 0000:0d:00.0: amdgpu: Failed to export SMU metrics table!
Nov 18 09:52:31 katana kernel: [ 4933.371814] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:31 katana kernel: [ 4933.650854] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:32 katana kernel: [ 4933.921708] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:32 katana kernel: [ 4933.940249] amdgpu 0000:0d:00.0: amdgpu: SMU: I'm not done with your previous command!
Nov 18 09:52:32 katana kernel: [ 4933.940254] amdgpu 0000:0d:00.0: amdgpu: Failed to export SMU metrics table!
Nov 18 09:52:32 katana kernel: [ 4934.192236] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:32 katana kernel: [ 4934.463213] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:33 katana kernel: [ 4934.736895] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:33 katana kernel: [ 4935.007928] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:33 katana kernel: [ 4935.279063] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:33 katana kernel: [ 4935.550243] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:34 katana kernel: [ 4935.824034] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:34 katana kernel: [ 4936.095158] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:34 katana kernel: [ 4936.366210] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:34 katana kernel: [ 4936.629193] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:35 katana kernel: [ 4936.886333] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:35 katana kernel: [ 4937.140815] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:35 katana kernel: [ 4937.395341] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:35 katana kernel: [ 4937.649885] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:36 katana kernel: [ 4937.906944] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:36 katana kernel: [ 4938.162866] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3

This eventually leads to processes crashing and the system locking up 
during shutdown.
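
For anyone checking their own logs for the same pattern, the messages quoted above can be tallied with a short script. This is only a rough sketch: the labels and the `summarize` function are invented for illustration, and the patterns are simply copied from the log lines in this report, so other kernels or GPUs may word things differently.

```python
import re
from collections import Counter

# Message fragments copied from the log excerpt in this report.
# The dictionary keys are illustrative labels, not kernel terminology.
PATTERNS = {
    "dmub_idle_error": re.compile(r"Error waiting for DMUB idle: status=\d+"),
    "smu_metrics_failed": re.compile(r"Failed to export SMU metrics table"),
    "smu_busy": re.compile(r"SMU: I'm not done with your previous command"),
    "d3hot_refused": re.compile(r"refused to change power state from D0 to D3hot"),
}


def summarize(lines):
    """Count how many lines contain each of the known message fragments."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


# Small demonstration using three lines taken from the excerpt above.
sample = [
    "[ 4921.669813] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* "
    "Error waiting for DMUB idle: status=3",
    "[ 4923.667803] snd_hda_intel 0000:0d:00.1: "
    "refused to change power state from D0 to D3hot",
    "[ 4928.622234] amdgpu 0000:0d:00.0: amdgpu: "
    "Failed to export SMU metrics table!",
]

for label, count in summarize(sample).most_common():
    print(f"{count:4d}  {label}")
```

To run it against a real kernel log, pass any iterable of lines to `summarize`, for example an open file object for a saved copy of the `dmesg` output.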

A git bisection has isolated the following patch as the cause.

commit 8f0284f190e6a0aa09015090568c03f18288231a (refs/bisect/bad)
Merge: 5bea1c8ce673 61d861cf4785
Author: Dave Airlie <airlied at redhat.com>
Date:   Mon Aug 30 09:06:01 2021 +1000

     Merge tag 'amd-drm-next-5.15-2021-08-27' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

     amd-drm-next-5.15-2021-08-27:

     amdgpu:
     - PLL fix for SI
     - Misc code cleanups
     - RAS fixes
     - PSP cleanups
     - Polaris UVD/VCE suspend fixes
     - aldebaran fixes
     - DCN3.x mclk fixes

     amdkfd:
     - CWSR fixes for arcturus and aldebaran
     - SVM fixes

     Signed-off-by: Dave Airlie <airlied at redhat.com>
     From: Alex Deucher <alexander.deucher at amd.com>
     Link: https://patchwork.freedesktop.org/patch/msgid/20210827192336.4649-1-alexander.deucher@amd.com

commit 61d861cf478576d85d6032f864360a34b26084b1 (HEAD)
Author: Nicholas Kazlauskas <nicholas.kazlauskas at amd.com>
Date:   Wed May 13 11:58:50 2020 -0400

     drm/amd/display: Move AllowDRAMSelfRefreshOrDRAMClockChangeInVblank to bounding box

     [Why]
     This is a global parameter, not a per pipe parameter and it's useful
     for experimenting with the prefetch schedule to be adjustable from
     the SOC bb.

     [How]
     Add a parameter to the SOC bb, default is the existing policy for
     all DCN. Fill it in when filling SOC bb parameters.

     Revert the policy to use MinDCFClk at the same time since that's not
     going to give us P-State in most cases on the spreadsheet.

     Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1403
     Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas at amd.com>
     Signed-off-by: Aurabindo Pillai <aurabindo.pillai at amd.com>
     Tested-by: Daniel Wheeler <Daniel.Wheeler at amd.com>
     Acked-by: Alex Deucher <alexander.deucher at amd.com>
     Signed-off-by: Alex Deucher <alexander.deucher at amd.com>

I have been running 5.15.4 with 61d861cf478576d85d6032f864360a34b26084b1 
backed out for a few hours with multiple periods of power saving, and so 
far so good.

Cheers,

Mark



