[PATCH] drm/amdgpu: always reset asic when going into suspend

Tue Oct 15 18:42:48 UTC 2019

On Tue, Oct 15, 2019 at 2:50 AM Daniel Drake <drake at endlessm.com> wrote:
>
> On Asus UX434DA (Ryzen7 3700U), upon resume from s2idle, the screen
> turns on again and shows the pre-suspend image, but the display remains
> frozen from that point onwards.
>
> The kernel logs show errors:
>
>  [drm] psp command failed and response status is (0x7)
>  [drm] Fence fallback timer expired on ring sdma0
>  [drm] Fence fallback timer expired on ring gfx
>  amdgpu 0000:03:00.0: [drm:amdgpu_ib_ring_tests] *ERROR* IB test failed on gfx (-22).
>  [drm:process_one_work] *ERROR* ib ring test failed (-22).
>
> This can also be reproduced with pm_test:
>  # echo devices > /sys/power/pm_test
>  # echo freeze > /sys/power/state
>
> The same reproducer causes the same problem on Asus X512DK (Ryzen5 3500U)
> even though that model is normally able to suspend and resume OK via S3.
>
> Experimenting, I observed that this error condition can be invoked on
> any amdgpu product by executing in succession:
>
>   amdgpu_device_suspend(drm_dev, true, true);
>   amdgpu_device_resume(drm_dev, true, true);
>
> i.e. it appears that the resume routine is unable to get the device out
> of suspended state, except for the S3 suspend case where it presumably has
> a bit of extra help from the firmware or hardware.
>
> However, I also observed that the runtime suspend/resume routines work
> OK when tested like this, which led me to the key difference in these
> two cases: the ASIC reset, which only happens in the runtime suspend path.
>
> Since it takes less than 1ms, we should do the ASIC reset in all
> suspend paths, fixing resume from s2idle on these products.
>

Is s2idle actually powering down the GPU?  Do you see a difference in
power usage?  I think you are just working around the fact that the
GPU never actually gets powered down.  Leaving the GPU in the reset
state probably uses more power than not suspending it in the first
place.

Alex

> Link: https://bugs.freedesktop.org/show_bug.cgi?id=111811
> Signed-off-by: Daniel Drake <drake at endlessm.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 5a1939dbd4e3..7f4870e974fb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3082,15 +3082,16 @@ int amdgpu_device_suspend(struct drm_device *dev, bool suspend, bool fbcon)
>          */
>         amdgpu_bo_evict_vram(adev);
>
> +       amdgpu_asic_reset(adev);
> +       r = amdgpu_asic_reset(adev);
> +       if (r)
> +               DRM_ERROR("amdgpu asic reset failed\n");
> +
>         pci_save_state(dev->pdev);
>         if (suspend) {
>                 /* Shut down the device */
>                 pci_disable_device(dev->pdev);
>                 pci_set_power_state(dev->pdev, PCI_D3hot);
> -       } else {
> -               r = amdgpu_asic_reset(adev);
> -               if (r)
> -                       DRM_ERROR("amdgpu asic reset failed\n");
>         }
>
>         return 0;
> --
> 2.20.1
>
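
For readers following the hunk above, the control-flow change can be modelled in a few lines of standalone C. This is only a sketch of the intent described in the commit message, not driver code: struct model_dev, model_asic_reset(), suspend_before() and suspend_after() are hypothetical names invented for illustration, and the real PCI and amdgpu calls are replaced by simple bookkeeping.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the device; it only records what was requested. */
struct model_dev {
    int resets;         /* number of ASIC resets issued */
    bool powered_down;  /* true once the D3hot transition was requested */
};

/* Hypothetical stand-in for amdgpu_asic_reset(); always succeeds here. */
int model_asic_reset(struct model_dev *d)
{
    d->resets++;
    return 0;
}

/* Pre-patch flow: the reset happens only when 'suspend' is false. */
int suspend_before(struct model_dev *d, bool suspend)
{
    if (suspend) {
        d->powered_down = true;   /* models pci_set_power_state(..., PCI_D3hot) */
    } else {
        if (model_asic_reset(d))
            fprintf(stderr, "asic reset failed\n");
    }
    return 0;
}

/* Post-patch intent: reset on every path, then optionally power down. */
int suspend_after(struct model_dev *d, bool suspend)
{
    if (model_asic_reset(d))
        fprintf(stderr, "asic reset failed\n");

    if (suspend)
        d->powered_down = true;   /* models pci_set_power_state(..., PCI_D3hot) */

    return 0;
}
```

Calling suspend_before() with suspend set to true leaves the reset counter at zero, while suspend_after() issues a reset for either value of the flag. The model says nothing about whether the hardware actually loses power afterwards, which is the question raised in the reply.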