[PATCH 4/4] drm/amdgpu: free the evf when the attached bo release

Liang, Prike Prike.Liang at amd.com
Tue Apr 22 09:14:55 UTC 2025


[Public]

> -----Original Message-----
> From: Christian König <ckoenig.leichtzumerken at gmail.com>
> Sent: Thursday, April 17, 2025 3:40 PM
> To: Liang, Prike <Prike.Liang at amd.com>; Koenig, Christian
> <Christian.Koenig at amd.com>; amd-gfx at lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>
> Subject: Re: [PATCH 4/4] drm/amdgpu: free the evf when the attached bo release
>
> Am 16.04.25 um 16:47 schrieb Liang, Prike:
> > [Public]
> >
> >> From: Koenig, Christian <Christian.Koenig at amd.com>
> >> Sent: Wednesday, April 16, 2025 7:07 PM
> >> To: Liang, Prike <Prike.Liang at amd.com>; amd-gfx at lists.freedesktop.org
> >> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>
> >> Subject: Re: [PATCH 4/4] drm/amdgpu: free the evf when the attached
> >> bo release
> >>
> >> Am 16.04.25 um 10:50 schrieb Prike Liang:
> >>> Free the evf when the attached bo released. The evf still be
> >>> dependent on and referred to by the attached bo that is scheduled by
> >>> the kernel queue SDMA or gfx after the evf signalled.
> >>>
> >>> Signed-off-by: Prike Liang <Prike.Liang at amd.com>
> >>> ---
> >>>  .../drm/amd/amdgpu/amdgpu_eviction_fence.c    | 31 ++++++++++++++++--
> -
> >>>  .../drm/amd/amdgpu/amdgpu_eviction_fence.h    |  1 +
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c    |  1 +
> >>>  3 files changed, 28 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> >>> index b34225bbd85d..60be1ac5047d 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> >>> @@ -27,6 +27,7 @@
> >>>
> >>>  #define work_to_evf_mgr(w, name) container_of(w, struct
> >>> amdgpu_eviction_fence_mgr, name)  #define evf_mgr_to_fpriv(e)
> >>> container_of(e, struct amdgpu_fpriv, evf_mgr)
> >>> +#define fence_to_evf(f)  container_of(f, struct
> >>> +amdgpu_eviction_fence, base)
> >>>
> >>>  static const char *
> >>>  amdgpu_eviction_fence_get_driver_name(struct dma_fence *fence) @@
> >>> -47,7 +48,7 @@ int  amdgpu_eviction_fence_replace_fence(struct
> >>> amdgpu_eviction_fence_mgr *evf_mgr,
> >>>                                 struct drm_exec *exec)  {
> >>> -   struct amdgpu_eviction_fence *old_ef, *new_ef;
> >>> +   struct amdgpu_eviction_fence *new_ef;
> >>>     struct drm_gem_object *obj;
> >>>     unsigned long index;
> >>>     int ret;
> >>> @@ -72,7 +73,6 @@ amdgpu_eviction_fence_replace_fence(struct
> >>> amdgpu_eviction_fence_mgr *evf_mgr,
> >>>
> >>>     /* Update the eviction fence now */
> >>>     spin_lock(&evf_mgr->ev_fence_lock);
> >>> -   old_ef = evf_mgr->ev_fence;
> >>>     evf_mgr->ev_fence = new_ef;
> >>>     spin_unlock(&evf_mgr->ev_fence_lock);
> >>>
> >>> @@ -102,9 +102,6 @@ amdgpu_eviction_fence_replace_fence(struct
> >> amdgpu_eviction_fence_mgr *evf_mgr,
> >>>             }
> >>>     }
> >>>
> >>> -   /* Free old fence */
> >>> -   if (old_ef)
> >>> -           dma_fence_put(&old_ef->base);
> >> That change looks completely incorrect to me, you will now leak the old fence.
> > The eviction fence is attached and shared by all the restored validated VM BOs
> during UQ restore, and at this placement the eviction fence is only detached from
> one of the BOs. Using amdgpu_userq_remove_all_eviction_fences() will walk over
> the resv objects and detach the fence from the resv objs when freeing the BO.
>
> Yeah, but that doesn't justify this change here. See you're completely messing up
> the fence reference count with that.
>
> >
> > But there's a problem: even though dropping all the evf attached to VM BOs with
> this patch, the evf still referred to by the SDMA and GFX kernel queue jobs at the
> case when enabling the kq and uq at the same time. Thoughts?
>
> Mhm, the eviction fence is always added as bookmark isn't it? As long as the GFX
> and SDMA jobs are not for evicting something then they should only depend on
> fences with usage < bookmark.
>
> Can you dig up when they are added to the dependencies of the job?

When the eviction fence was added to the user queue VM BOs reservation and then updated the BO page table, which will add the eviction fence to the VM sync at amdgpu_sync_resv(), and then the eviction fence will be added as a dependent fence by propagating with amdgpu_sync_push_to_job(). With removing the eviction fence from the VM sync at amdgpu_sync_resv(), then the eviction fence can be released properly.

Thanks,
Prike
>
> Thanks,
> Christian.
>
> PS: Please stop calling the eviction fence evf.
>
> >
> >>>     return 0;
> >>>
> >>>  free_err:
> >>> @@ -237,6 +234,30 @@ void amdgpu_eviction_fence_detach(struct
> >> amdgpu_eviction_fence_mgr *evf_mgr,
> >>>     dma_fence_put(stub);
> >>>  }
> >>>
> >>> +void amdgpu_userq_remove_all_eviction_fences(struct amdgpu_bo *bo)
> >> Please name that amdgpu_eviction_fence_remove_all().
> > Noted.
> >
> >> Regards,
> >> Christian.
> >>
> >>> +{
> >>> +   struct dma_resv *resv = &bo->tbo.base._resv;
> >>> +   struct dma_fence *fence, *stub;
> >>> +   struct dma_resv_iter cursor;
> >>> +
> >>> +   dma_resv_assert_held(resv);
> >>> +
> >>> +   stub = dma_fence_get_stub();
> >>> +   dma_resv_for_each_fence(&cursor, resv,
> >> DMA_RESV_USAGE_BOOKKEEP, fence) {
> >>> +           struct amdgpu_eviction_fence *ev_fence;
> >>> +
> >>> +           ev_fence = fence_to_evf(fence);
> >>> +           if (!ev_fence || !dma_fence_is_signaled(&ev_fence->base))
> >>> +                   continue;
> >>> +
> >>> +           dma_resv_replace_fences(resv, fence->context, stub,
> >>> +                           DMA_RESV_USAGE_BOOKKEEP);
> >>> +
> >>> +   }
> >>> +
> >>> +   dma_fence_put(stub);
> >>> +}
> >>> +
> >>>  int amdgpu_eviction_fence_init(struct amdgpu_eviction_fence_mgr
> >>> *evf_mgr)  {
> >>>     /* This needs to be done one time per open */ diff --git
> >>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
> >>> index fcd867b7147d..da99ac322a2e 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
> >>> @@ -66,4 +66,5 @@ amdgpu_eviction_fence_signal(struct
> >>> amdgpu_eviction_fence_mgr *evf_mgr,  int
> >>> amdgpu_eviction_fence_replace_fence(struct amdgpu_eviction_fence_mgr
> >> *evf_mgr,
> >>>                                 struct drm_exec *exec);
> >>> +void amdgpu_userq_remove_all_eviction_fences(struct amdgpu_bo *bo);
> >>>  #endif
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> >>> index 1e73ce30d4d7..f001018a01eb 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> >>> @@ -1392,6 +1392,7 @@ void amdgpu_bo_release_notify(struct
> >> ttm_buffer_object *bo)
> >>>     amdgpu_vram_mgr_set_cleared(bo->resource);
> >>>     dma_resv_add_fence(&bo->base._resv, fence,
> >> DMA_RESV_USAGE_KERNEL);
> >>>     dma_fence_put(fence);
> >>> +   amdgpu_userq_remove_all_eviction_fences(abo);
> >>>
> >>>  out:
> >>>     dma_resv_unlock(&bo->base._resv);



More information about the amd-gfx mailing list