[PATCH v2 0/3] Fix DCN 3.1.4 hangs on s2idle entry
Lazar, Lijo
lijo.lazar at amd.com
Wed May 17 05:07:04 UTC 2023
On 5/17/2023 10:25 AM, Limonciello, Mario wrote:
>
> On 5/16/2023 11:43 PM, Lazar, Lijo wrote:
>>
>>
>> On 5/17/2023 5:04 AM, Mario Limonciello wrote:
>>> DCN 3.1.4 will occasionally hang on s2idle entry, but only if
>>> running Wayland and only
>>> when using `systemctl suspend`, not `echo mem | tee /sys/power/state`.
>>>
>>> This happens because using `systemctl suspend` will cause the screen
>>> to lock right before writing mem into /sys/power/state.
>>>
>>
>> A couple of things on this since this mentions systemctl suspend -
>>
>> 1) If in s2idle, it's supposed to immediately signal and not schedule
>> delayed work
>>
>> 3964b0c2e843334858da99db881859faa4df241d drm/amdgpu: complete gfxoff
>> allow signal during suspend without delay
>
> It looks like dead code to me now actually.
>
> amdgpu_device_set_pg_state() skips GFX, so gfxoff control never gets
> called as part of suspend path.
>
OK, that means the delayed work was scheduled sometime earlier. Can we
remove this code as well, since patch 1 adds a flush anyway? Also, is
there a need to force GFXOFF on S0ix suspend (is there any chance that
gfxoff is not scheduled)?
>>
>> 2) RLC is never stopped on GFX 10 or greater.
>>
> System was hanging before this series.
>
> Patch 3 "alone" matches this behavior as described above to skip RLC
> suspend but two problems happen:
>
> 1) GFXOFF workqueue doesn't get flushed and so driver's request for
> GFXOFF can happen at wrong time.
>
> 2) If suspend entry happens before GFXOFF is really asserted, there are
> lots of errors on resume, e.g.:
>
Is patch 3 really required? Does it make any difference?
Thanks,
Lijo
> [ 63.095227] [drm] Fence fallback timer expired on ring sdma0
> [ 63.098360] [drm] ring gfx_32772.1.1 was added
> [ 63.099439] [drm] ring compute_32772.2.2 was added
> [ 63.100460] [drm] ring sdma_32772.3.3 was added
> [ 63.100504] [drm] ring gfx_32772.1.1 test pass
> [ 63.607166] [drm] Fence fallback timer expired on ring gfx_32772.1.1
> [ 63.607234] [drm] ring gfx_32772.1.1 ib test pass
> [ 63.608964] [drm] ring compute_32772.2.2 test pass
> [ 64.119173] [drm] Fence fallback timer expired on ring compute_32772.2.2
> [ 64.119219] [drm] ring compute_32772.2.2 ib test pass
> [ 64.121364] [drm] ring sdma_32772.3.3 test pass
> [ 64.631422] [drm] Fence fallback timer expired on ring sdma_32772.3.3
> [ 64.631465] [drm] ring sdma_32772.3.3 ib test pass
> [ 65.143184] [drm] Fence fallback timer expired on ring sdma0
>
>> Wondering if the code hides something else because of the timing.
>> Thanks,
>> Lijo
>>
>>> This causes a delayed GFXOFF entry to be scheduled right before s2idle
>>> entry. If the workqueue doesn't get processed before the RLC is turned
>>> off the system is hung. Even if the workqueue *does* get processed, there
>>> is a race between the APU microcontrollers and driver for whether GFX
>>> is actually powered off when RLC is turned off.
>>>
>>> To avoid this issue, flush the workqueue on s2idle entry and ensure that
>>> GFX is really in GFXOFF before any sensitive register accesses occur.
>>>
>>> Mario Limonciello (3):
>>>   drm/amd: Flush any delayed gfxoff on suspend entry
>>>   drm/amd: Poll for GFX core to be off
>>>   drm/amd: Skip RLC suspend for s0ix on PSP 13.0.4 and 13.0.11
>>>
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 25 ++++++++++++++++++++++
>>>  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c     | 18 ++++++++++++++++
>>>  drivers/gpu/drm/amd/include/amd_shared.h   |  1 +
>>>  drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c  |  4 ++--
>>>  4 files changed, 46 insertions(+), 2 deletions(-)
>>>
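
For illustration only, the approach described in the quoted cover letter
(flush the delayed GFXOFF work on suspend entry, then poll until GFX is
really off) can be modeled in plain user-space C roughly as follows. This
is a sketch under stated assumptions, not the amdgpu implementation: the
struct, the function names, and the "polls until off" counter are all
hypothetical and stand in for the real workqueue and status register.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these names are real amdgpu symbols. */
struct gfx_model {
    bool gfxoff_work_pending; /* delayed GFXOFF request queued, not yet run */
    bool gfxoff_requested;    /* driver has asked firmware for GFXOFF */
    int  polls_until_off;     /* status reads needed before GFX is really off */
};

/* Run any pending delayed work now instead of waiting for its timer. */
static void flush_gfxoff_work(struct gfx_model *g)
{
    if (g->gfxoff_work_pending) {
        g->gfxoff_work_pending = false;
        g->gfxoff_requested = true;
    }
}

/* One status read: off only after the request was made and firmware is done. */
static bool gfx_is_off(struct gfx_model *g)
{
    if (!g->gfxoff_requested)
        return false;
    if (g->polls_until_off > 0) {
        g->polls_until_off--;
        return false;
    }
    return true;
}

/* Flush first, then poll with a bound. Returns 0 if GFX is off, -1 on timeout. */
static int suspend_entry(struct gfx_model *g, int max_polls)
{
    assert(max_polls > 0);
    flush_gfxoff_work(g);
    for (int i = 0; i < max_polls; i++) {
        if (gfx_is_off(g))
            return 0;
    }
    return -1;
}
```

In this model, a pending request that the firmware needs three status
reads to complete succeeds with a poll budget of ten and times out with a
budget of two. If no GFXOFF work was ever scheduled, the poll also times
out.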