[PATCH] drm/amdgpu: Ignore first evction failure during suspend
Pan, Xinhui
Xinhui.Pan at amd.com
Tue Sep 12 00:21:42 UTC 2023
[AMD Official Use Only - General]
Oh yes, pinned BOs are moved to another LRU list, so the eviction fails for some other reason.
I will change the comments in the patch.
The problem is that eviction can fail for many reasons, for example when a BO is locked.
AFAIK, kfd stops the queues and flushes some evict/restore work in its suspend callback, so the first eviction, which runs before the kfd callback, is likely to fail.
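To illustrate the ordering I mean, here is a minimal user-space sketch. It is not driver code: struct fake_dev and all fake_* helpers are hypothetical stand-ins, and it assumes a later eviction pass whose return value is still checked once the queues have been stopped.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for device state; not the real amdgpu structures. */
struct fake_dev {
	bool queues_running;	/* queues still active, so some BOs may be locked */
	int resident_bos;	/* BOs still resident in VRAM */
	int busy_bos;		/* BOs that cannot be evicted while queues run */
};

/* Evict everything possible; return -EBUSY if busy BOs were left behind. */
static int fake_evict_resources(struct fake_dev *dev)
{
	assert(dev != NULL);
	if (dev->queues_running && dev->busy_bos > 0) {
		dev->resident_bos = dev->busy_bos;
		return -EBUSY;
	}
	dev->resident_bos = 0;
	return 0;
}

/* Models the suspend callback that stops queues and releases the busy BOs. */
static void fake_stop_queues(struct fake_dev *dev)
{
	dev->queues_running = false;
	dev->busy_bos = 0;
}

static int fake_suspend(struct fake_dev *dev)
{
	/* First pass is best effort: it may fail while BOs are still locked. */
	(void)fake_evict_resources(dev);

	fake_stop_queues(dev);

	/* Assumed later pass: a failure here still aborts the suspend. */
	return fake_evict_resources(dev);
}
```

With two busy BOs, a direct call to fake_evict_resources() fails with -EBUSY, while fake_suspend() succeeds because only the second pass is checked.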
-----Original Message-----
From: Christian König <ckoenig.leichtzumerken at gmail.com>
Sent: Friday, September 8, 2023 2:49 PM
To: Pan, Xinhui <Xinhui.Pan at amd.com>; amd-gfx at lists.freedesktop.org
Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>; Fan, Shikang <Shikang.Fan at amd.com>
Subject: Re: [PATCH] drm/amdgpu: Ignore first evction failure during suspend
Am 08.09.23 um 05:39 schrieb xinhui pan:
> Some BOs might be pinned. So the first eviction's failure will abort
> the suspend sequence. These pinned BOs will be unpinned afterwards
> during suspend.
That doesn't make much sense since pinned BOs don't cause eviction failure here.
What exactly is the error code you see?
Christian.
>
> Actually it has evicted most BOs, so that should still work fine in
> sriov full access mode.
>
> Fixes: 47ea20762bb7 ("drm/amdgpu: Add an extra evict_resource call
> during device_suspend.")
> Signed-off-by: xinhui pan <xinhui.pan at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 5c0e2b766026..39af526cdbbe 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4148,10 +4148,11 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon)
>
>  	adev->in_suspend = true;
>
> -	/* Evict the majority of BOs before grabbing the full access */
> -	r = amdgpu_device_evict_resources(adev);
> -	if (r)
> -		return r;
> +	/* Try to evict the majority of BOs before grabbing the full access
> +	 * Ignore the ret val at first place as we will unpin some BOs if any
> +	 * afterwards.
> +	 */
> +	(void)amdgpu_device_evict_resources(adev);
>
>  	if (amdgpu_sriov_vf(adev)) {
>  		amdgpu_virt_fini_data_exchange(adev);