[PATCH] drm/amdgpu: Ignore first eviction failure during suspend

Pan, Xinhui Xinhui.Pan at amd.com
Tue Sep 12 00:21:42 UTC 2023


[AMD Official Use Only - General]

Oh yes, a pinned BO is moved to a different LRU list, so the eviction fails for some other reason.
I will change the comments in the patch.
The problem is that eviction can fail for many reasons, for example when a BO is locked.
AFAIK, kfd stops the queues and flushes some evict/restore work in its suspend callback, so the first eviction, which runs before the kfd callback, is likely to fail.
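
To illustrate the ordering issue, here is a rough user-space sketch, not kernel code. Every name in it (toy_bo, toy_evict_resources, toy_kfd_suspend, toy_device_suspend) is hypothetical, and it assumes that a later eviction pass runs after the kfd callback and that its result is checked. Pinned BOs are modeled as simply not being on the list the eviction walks, so they never cause an error; locked BOs make the pass report -EBUSY.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical types, not the amdgpu/TTM structures. */
struct toy_bo {
	bool pinned;  /* kept off the evictable list, never considered */
	bool locked;  /* held elsewhere, e.g. by pending evict/restore work */
	bool evicted;
};

/* Evict what we can; return -EBUSY if any evictable BO was locked. */
static int toy_evict_resources(struct toy_bo *bos, size_t n)
{
	int ret = 0;

	assert(bos != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (bos[i].pinned || bos[i].evicted)
			continue;          /* skipped silently, no error */
		if (bos[i].locked) {
			ret = -EBUSY;      /* remember the failure, keep going */
			continue;
		}
		bos[i].evicted = true;
	}
	return ret;
}

/* Stand-in for the kfd suspend callback: pending work is flushed,
 * so the locks it held are dropped. */
static void toy_kfd_suspend(struct toy_bo *bos, size_t n)
{
	for (size_t i = 0; i < n; i++)
		bos[i].locked = false;
}

/* Suspend order assumed in this sketch. */
static int toy_device_suspend(struct toy_bo *bos, size_t n)
{
	/* Best-effort first pass: a failure here is expected, so ignore it. */
	(void)toy_evict_resources(bos, n);

	toy_kfd_suspend(bos, n);

	/* Later pass: a failure now is real and aborts the suspend. */
	return toy_evict_resources(bos, n);
}
```

In this model the first pass still evicts every BO that is neither pinned nor locked, and only the BOs that were locked at that point are left for the later pass.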

-----Original Message-----
From: Christian König <ckoenig.leichtzumerken at gmail.com>
Sent: Friday, September 8, 2023 2:49 PM
To: Pan, Xinhui <Xinhui.Pan at amd.com>; amd-gfx at lists.freedesktop.org
Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>; Fan, Shikang <Shikang.Fan at amd.com>
Subject: Re: [PATCH] drm/amdgpu: Ignore first eviction failure during suspend

Am 08.09.23 um 05:39 schrieb xinhui pan:
> Some BOs might be pinned. So the first eviction's failure will abort
> the suspend sequence. These pinned BOs will be unpinned afterwards
> during suspend.

That doesn't make much sense since pinned BOs don't cause eviction failure here.

What exactly is the error code you see?

Christian.

>
> Actually it has evicted most BOs, so that should still work fine in
> sriov full access mode.
>
> Fixes: 47ea20762bb7 ("drm/amdgpu: Add an extra evict_resource call
> during device_suspend.")
> Signed-off-by: xinhui pan <xinhui.pan at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +++++----
>   1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 5c0e2b766026..39af526cdbbe 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4148,10 +4148,11 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon)
>
>       adev->in_suspend = true;
>
> -     /* Evict the majority of BOs before grabbing the full access */
> -     r = amdgpu_device_evict_resources(adev);
> -     if (r)
> -             return r;
> +     /* Try to evict the majority of BOs before grabbing the full access
> +      * Ignore the ret val at first place as we will unpin some BOs if any
> +      * afterwards.
> +      */
> +     (void)amdgpu_device_evict_resources(adev);
>
>       if (amdgpu_sriov_vf(adev)) {
>               amdgpu_virt_fini_data_exchange(adev);


