[PATCH v2 0/3] Fix DCN 3.1.4 hangs on s2idle entry

Lazar, Lijo lijo.lazar at amd.com
Wed May 17 05:07:04 UTC 2023



On 5/17/2023 10:25 AM, Limonciello, Mario wrote:
> 
> On 5/16/2023 11:43 PM, Lazar, Lijo wrote:
>>
>>
>> On 5/17/2023 5:04 AM, Mario Limonciello wrote:
>>> DCN 3.1.4 will occasionally hang on s2idle entry, but only if running
>>> Wayland and only when using `systemctl suspend`, not
>>> `echo mem | tee /sys/power/state`.
>>>
>>> This happens because using `systemctl suspend` will cause the screen
>>> to lock right before writing mem into /sys/power/state.
>>>
>>
>> A couple of things on this since this mentions systemctl suspend -
>>
>> 1) If in s2idle, it's supposed to immediately signal and not schedule 
>> delayed work
>>
>> 3964b0c2e843334858da99db881859faa4df241d drm/amdgpu: complete gfxoff 
>> allow signal during suspend without delay
> 
> It looks like dead code to me now actually.
> 
> amdgpu_device_set_pg_state() skips GFX, so gfxoff control never gets 
> called as part of suspend path.
> 

Ok, that means the delayed work was scheduled sometime before. Can we
remove this code as well, since patch 1 adds a flush anyway? Also, is
there a need to force GFXOFF on S0ix suspend (is there any chance that
gfxoff has not been scheduled)?

>>
>> 2) RLC is never stopped on GFX 10 or greater.
>>
> System was hanging before this series.
> 
> Patch 3 "alone" matches this behavior as described above to skip RLC 
> suspend but two problems happen:
> 
> 1) GFXOFF workqueue doesn't get flushed and so driver's request for 
> GFXOFF can happen at wrong time.
> 
> 2) If suspend entry happens before GFXOFF is really asserted, there are
> lots of errors on resume, e.g.:
> 

Is patch 3 really required?  Does it make any difference?

Thanks,
Lijo

> [   63.095227] [drm] Fence fallback timer expired on ring sdma0
> [   63.098360] [drm] ring gfx_32772.1.1 was added
> [   63.099439] [drm] ring compute_32772.2.2 was added
> [   63.100460] [drm] ring sdma_32772.3.3 was added
> [   63.100504] [drm] ring gfx_32772.1.1 test pass
> [   63.607166] [drm] Fence fallback timer expired on ring gfx_32772.1.1
> [   63.607234] [drm] ring gfx_32772.1.1 ib test pass
> [   63.608964] [drm] ring compute_32772.2.2 test pass
> [   64.119173] [drm] Fence fallback timer expired on ring compute_32772.2.2
> [   64.119219] [drm] ring compute_32772.2.2 ib test pass
> [   64.121364] [drm] ring sdma_32772.3.3 test pass
> [   64.631422] [drm] Fence fallback timer expired on ring sdma_32772.3.3
> [   64.631465] [drm] ring sdma_32772.3.3 ib test pass
> [   65.143184] [drm] Fence fallback timer expired on ring sdma0
> 
>> Wondering if the code hides something else because of the timing.
>> Thanks,
>> Lijo
>>
>>> This causes a delayed GFXOFF entry to be scheduled right before s2idle
>>> entry.  If the workqueue doesn't get processed before the RLC is turned
>>> off the system is hung. Even if the workqueue *does* get processed,
>>> there is a race between the APU microcontrollers and driver for
>>> whether GFX is actually powered off when RLC is turned off.
>>>
>>> To avoid this issue, flush the workqueue on s2idle entry and ensure that
>>> GFX is really in GFXOFF before any sensitive register accesses occur.
>>>
>>> Mario Limonciello (3):
>>>    drm/amd: Flush any delayed gfxoff on suspend entry
>>>    drm/amd: Poll for GFX core to be off
>>>    drm/amd: Skip RLC suspend for s0ix on PSP 13.0.4 and 13.0.11
>>>
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 25 ++++++++++++++++++++++
>>>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c     | 18 ++++++++++++++++
>>>   drivers/gpu/drm/amd/include/amd_shared.h   |  1 +
>>>   drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c  |  4 ++--
>>>   4 files changed, 46 insertions(+), 2 deletions(-)
>>>
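
As a rough illustration of the ordering the cover letter describes (flush
the pending GFXOFF work first, then poll until GFX is actually off), here is
a small user-space C sketch. It is not the code from the patches and uses no
amdgpu API: `struct fake_gpu`, `flush_gfxoff_work()`, `gfx_is_off()` and
`suspend_entry()` are hypothetical names invented for this example, and the
"firmware" is modelled as a simple countdown.

```c
#include <stdbool.h>

/* Hypothetical stand-in for device state; not the amdgpu structures. */
struct fake_gpu {
    bool gfxoff_work_pending; /* a delayed "allow GFXOFF" request is queued */
    bool gfxoff_requested;    /* the request has actually been sent */
    int  polls_until_off;     /* polls the firmware needs before GFX is off */
};

/* Run the delayed work now if it is still queued (models flushing it). */
static void flush_gfxoff_work(struct fake_gpu *gpu)
{
    if (gpu->gfxoff_work_pending) {
        gpu->gfxoff_work_pending = false;
        gpu->gfxoff_requested = true;
    }
}

/* Models reading a status register: true once GFX is really powered off. */
static bool gfx_is_off(struct fake_gpu *gpu)
{
    if (!gpu->gfxoff_requested)
        return false;
    if (gpu->polls_until_off > 0) {
        gpu->polls_until_off--;
        return false;
    }
    return true;
}

/*
 * Suspend entry: flush first, then poll a bounded number of times.
 * Returns 0 once GFX reports off, -1 if it never does within max_polls.
 */
static int suspend_entry(struct fake_gpu *gpu, int max_polls)
{
    flush_gfxoff_work(gpu);
    for (int i = 0; i < max_polls; i++) {
        if (gfx_is_off(gpu))
            return 0;
        /* a real driver would sleep briefly between polls */
    }
    return -1;
}
```

The bounded loop is the point of the sketch: without the flush, the request
may never have been sent when polling starts, and without the poll, later
steps could run while GFX is still powered.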

