[PATCH v2 0/3] Fix DCN 3.1.4 hangs on s2idle entry

Limonciello, Mario mlimonci at amd.com
Wed May 17 04:55:26 UTC 2023


On 5/16/2023 11:43 PM, Lazar, Lijo wrote:
>
>
> On 5/17/2023 5:04 AM, Mario Limonciello wrote:
>> DCN 3.1.4 will occasionally hang on s2idle entry,
>> but only if running Wayland and only
>> when using `systemctl suspend`, not `echo mem | tee /sys/power/state`.
>>
>> This happens because using `systemctl suspend` will cause the screen
>> to lock right before writing mem into /sys/power/state.
>>
>
> A couple of things on this since this mentions systemctl suspend -
>
> 1) If in s2idle, it's supposed to immediately signal and not schedule 
> delayed work
>
> 3964b0c2e843334858da99db881859faa4df241d drm/amdgpu: complete gfxoff 
> allow signal during suspend without delay

It looks like dead code to me now actually.

amdgpu_device_set_pg_state() skips GFX, so gfxoff control never gets 
called as part of suspend path.

>
> 2) RLC is never stopped on GFX 10 or greater.
>
System was hanging before this series.

Patch 3 alone matches the behavior described above by skipping RLC 
suspend, but two problems remain:

1) The GFXOFF workqueue doesn't get flushed, so the driver's request for 
GFXOFF can happen at the wrong time.

2) If suspend entry happens before GFXOFF is really asserted, there are 
lots of errors on resume, e.g.:

[   63.095227] [drm] Fence fallback timer expired on ring sdma0
[   63.098360] [drm] ring gfx_32772.1.1 was added
[   63.099439] [drm] ring compute_32772.2.2 was added
[   63.100460] [drm] ring sdma_32772.3.3 was added
[   63.100504] [drm] ring gfx_32772.1.1 test pass
[   63.607166] [drm] Fence fallback timer expired on ring gfx_32772.1.1
[   63.607234] [drm] ring gfx_32772.1.1 ib test pass
[   63.608964] [drm] ring compute_32772.2.2 test pass
[   64.119173] [drm] Fence fallback timer expired on ring compute_32772.2.2
[   64.119219] [drm] ring compute_32772.2.2 ib test pass
[   64.121364] [drm] ring sdma_32772.3.3 test pass
[   64.631422] [drm] Fence fallback timer expired on ring sdma_32772.3.3
[   64.631465] [drm] ring sdma_32772.3.3 ib test pass
[   65.143184] [drm] Fence fallback timer expired on ring sdma0
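
To make problem 1 concrete, here is a minimal user-space model of the
ordering issue. It is only a sketch: every name in it (model_gpu,
model_delayed_work, model_suspend and so on) is hypothetical and none of
it is amdgpu code. In the kernel the real primitive for waiting on
pending delayed work is flush_delayed_work(); the model only imitates
the effect of calling something like it before suspend begins.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a kernel delayed work item. */
struct model_delayed_work {
	bool pending;             /* queued, timer not yet expired */
	void (*fn)(void *ctx);    /* work body */
	void *ctx;
};

/* Hypothetical device state, just enough to show the ordering. */
struct model_gpu {
	bool suspended;           /* suspend entry has started */
	bool gfxoff_requested;    /* driver asked for GFXOFF */
	bool late_request;        /* request landed after suspend started */
	struct model_delayed_work gfx_off_work;
};

/* Work body: request GFXOFF, and record if it ran too late. */
static void model_gfx_off_fn(void *ctx)
{
	struct model_gpu *gpu = ctx;

	if (gpu->suspended)
		gpu->late_request = true;
	gpu->gfxoff_requested = true;
}

static void model_gpu_init(struct model_gpu *gpu)
{
	gpu->suspended = false;
	gpu->gfxoff_requested = false;
	gpu->late_request = false;
	gpu->gfx_off_work.pending = false;
	gpu->gfx_off_work.fn = model_gfx_off_fn;
	gpu->gfx_off_work.ctx = gpu;
}

/* Models scheduling the delayed GFXOFF request (e.g. on screen lock). */
static void model_schedule(struct model_delayed_work *w)
{
	w->pending = true;
}

/* Runs the work now if it is pending; used for both flush and expiry. */
static void model_run_if_pending(struct model_delayed_work *w)
{
	assert(w->fn);
	if (w->pending) {
		w->pending = false;
		w->fn(w->ctx);
	}
}

/*
 * Models suspend entry. With flush_first the pending request is forced
 * to run before suspend starts; without it the timer expires afterwards.
 * Returns true if the GFXOFF request happened at a safe time.
 */
static bool model_suspend(struct model_gpu *gpu, bool flush_first)
{
	if (flush_first)
		model_run_if_pending(&gpu->gfx_off_work);
	gpu->suspended = true;
	model_run_if_pending(&gpu->gfx_off_work);   /* timer expiry */
	return !gpu->late_request;
}
```

Calling model_schedule() and then model_suspend(gpu, false) leaves
late_request set, which is the "wrong time" case above. With
flush_first the request has already run by the time suspend starts.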

> Wondering if the code hides something else because of the timing.
> Thanks,
> Lijo
>
>> This causes a delayed GFXOFF entry to be scheduled right before s2idle
>> entry.  If the workqueue doesn't get processed before the RLC is turned
>> off the system is hung. Even if the workqueue *does* get processed, there
>> is a race between the APU microcontrollers and driver for whether GFX
>> is actually powered off when RLC is turned off.
>>
>> To avoid this issue, flush the workqueue on s2idle entry and ensure that
>> GFX is really in GFXOFF before any sensitive register accesses occur.
>>
>> Mario Limonciello (3):
>>    drm/amd: Flush any delayed gfxoff on suspend entry
>>    drm/amd: Poll for GFX core to be off
>>    drm/amd: Skip RLC suspend for s0ix on PSP 13.0.4 and 13.0.11
>>
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 25 ++++++++++++++++++++++
>>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c     | 18 ++++++++++++++++
>>   drivers/gpu/drm/amd/include/amd_shared.h   |  1 +
>>   drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c  |  4 ++--
>>   4 files changed, 46 insertions(+), 2 deletions(-)
>>
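
For reference, the idea behind "Poll for GFX core to be off" in the
cover letter is a bounded poll: keep reading a status value until it
reports off, and give up after a limit. The sketch below is a
user-space illustration under assumptions, not the patch itself. The
status encoding (MODEL_GFX_OFF), the reader callback and the fake
hardware are all hypothetical, and a real driver would sleep or delay
between reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: 0 means the GFX core reports "off". */
#define MODEL_GFX_OFF 0u

/* Hypothetical callback that returns the current GFX power status. */
typedef uint32_t (*model_read_status_fn)(void *ctx);

/*
 * Poll up to max_polls times for the status to read MODEL_GFX_OFF.
 * Returns true once the core reports off, false if the limit is hit.
 */
static bool model_poll_for_gfx_off(model_read_status_fn read_status,
				   void *ctx, unsigned int max_polls)
{
	unsigned int i;

	assert(read_status);
	for (i = 0; i < max_polls; i++) {
		if (read_status(ctx) == MODEL_GFX_OFF)
			return true;
		/* A real driver would sleep or delay here between reads. */
	}
	return false;
}

/* Fake hardware: reports "on" for a fixed number of reads, then "off". */
struct model_fake_hw {
	unsigned int reads_until_off;
};

static uint32_t model_fake_read_status(void *ctx)
{
	struct model_fake_hw *hw = ctx;

	if (hw->reads_until_off == 0)
		return MODEL_GFX_OFF;
	hw->reads_until_off--;
	return 1u;   /* any nonzero value means "not off yet" in this model */
}
```

The caller only goes on to touch sensitive registers when the poll
returns true. A false return means GFX never reported off within the
limit and has to be treated as an error.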