[PATCH 3/3] drm/amdgpu: validate the eviction fence attach/detach
Christian König
christian.koenig at amd.com
Mon Apr 28 17:51:33 UTC 2025
On 4/25/25 09:07, Prike Liang wrote:
> Before the user queue BOs resume workqueue is scheduled;
> there's no valid eviction fence to attach the gem obj.
> For this case, it doesn't need to attach/detach the eviction
> fence. Also, it needs to unlock the bo first before returning
> from the eviction fence attached error.
>
> Signed-off-by: Prike Liang <Prike.Liang at amd.com>
> ---
> .../gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c | 3 +++
> drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 16 ++++++++++------
> 2 files changed, 13 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> index d2271c10498d..375f15b6fd58 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> @@ -216,6 +216,9 @@ void amdgpu_eviction_fence_detach(struct amdgpu_eviction_fence_mgr *evf_mgr,
> {
> struct dma_fence *stub = dma_fence_get_stub();
>
> + if (dma_fence_is_signaled(&evf_mgr->ev_fence->base))
> + return;
> +
Clear NAK, that is racy. You can only access evf_mgr->ev_fence while holding the spinlock to make sure that it isn't replaced.
> dma_resv_replace_fences(bo->tbo.base.resv, evf_mgr->ev_fence_ctx,
> stub, DMA_RESV_USAGE_BOOKKEEP);
> dma_fence_put(stub);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index c1d8cee7894b..04256de4bee9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -292,11 +292,14 @@ static int amdgpu_gem_object_open(struct drm_gem_object *obj,
> else
> ++bo_va->ref_count;
>
> - /* attach gfx eviction fence */
> - r = amdgpu_eviction_fence_attach(&fpriv->evf_mgr, abo);
That here is buggy, fpriv->evf_mgr can only be accessed while holding the spinlock.
> - if (r) {
> - DRM_DEBUG_DRIVER("Failed to attach eviction fence to BO\n");
> - return r;
> + /* attach gfx the validated eviction fence */
> + if (!IS_ERR_OR_NULL(fpriv->evf_mgr.ev_fence)) {
Please don't use ERR_PTR functions on members.
> + r = amdgpu_eviction_fence_attach(&fpriv->evf_mgr, abo);
> + if (r) {
> + DRM_DEBUG_DRIVER("Failed to attach eviction fence to BO\n");
> + amdgpu_bo_unreserve(abo);
> + return r;
> + }
We should always have a stub fence in fpriv->evf_mgr.ev_fence, so those checks are unnecessary.
Regards,
Christian.
> }
>
> amdgpu_bo_unreserve(abo);
> @@ -362,7 +365,8 @@ static void amdgpu_gem_object_close(struct drm_gem_object *obj,
> goto out_unlock;
> }
>
> - if (!amdgpu_vm_is_bo_always_valid(vm, bo))
> + if (!amdgpu_vm_is_bo_always_valid(vm, bo) &&
> + !IS_ERR_OR_NULL(fpriv->evf_mgr.ev_fence))
> amdgpu_eviction_fence_detach(&fpriv->evf_mgr, bo);
>
> bo_va = amdgpu_vm_bo_find(vm, bo);
More information about the amd-gfx
mailing list