[PATCH 4/4] drm/amdgpu: free the evf when the attached bo release
Liang, Prike
Prike.Liang at amd.com
Tue Apr 22 13:09:46 UTC 2025
[AMD Official Use Only - AMD Internal Distribution Only]
> From: Yadav, Arvind <Arvind.Yadav at amd.com>
> Sent: Tuesday, April 22, 2025 8:40 PM
> To: Koenig, Christian <Christian.Koenig at amd.com>; Liang, Prike
> <Prike.Liang at amd.com>
> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; amd-
> gfx at lists.freedesktop.org; Christian König <ckoenig.leichtzumerken at gmail.com>
> Subject: Re: [PATCH 4/4] drm/amdgpu: free the evf when the attached bo release
>
>
> On 4/22/2025 2:57 PM, Christian König wrote:
> > Am 22.04.25 um 11:14 schrieb Liang, Prike:
> >> [Public]
> >>
> >>> -----Original Message-----
> >>> From: Christian König <ckoenig.leichtzumerken at gmail.com>
> >>> Sent: Thursday, April 17, 2025 3:40 PM
> >>> To: Liang, Prike <Prike.Liang at amd.com>; Koenig, Christian
> >>> <Christian.Koenig at amd.com>; amd-gfx at lists.freedesktop.org
> >>> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>
> >>> Subject: Re: [PATCH 4/4] drm/amdgpu: free the evf when the attached
> >>> bo release
> >>>
> >>> Am 16.04.25 um 16:47 schrieb Liang, Prike:
> >>>> [Public]
> >>>>
> >>>>> From: Koenig, Christian <Christian.Koenig at amd.com>
> >>>>> Sent: Wednesday, April 16, 2025 7:07 PM
> >>>>> To: Liang, Prike <Prike.Liang at amd.com>;
> >>>>> amd-gfx at lists.freedesktop.org
> >>>>> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>
> >>>>> Subject: Re: [PATCH 4/4] drm/amdgpu: free the evf when the
> >>>>> attached bo release
> >>>>>
> >>>>> Am 16.04.25 um 10:50 schrieb Prike Liang:
> >>>>>> Free the evf when the attached bo released. The evf still be
> >>>>>> dependent on and referred to by the attached bo that is scheduled
> >>>>>> by the kernel queue SDMA or gfx after the evf signalled.
> >>>>>>
> >>>>>> Signed-off-by: Prike Liang <Prike.Liang at amd.com>
> >>>>>> ---
> >>>>>> .../drm/amd/amdgpu/amdgpu_eviction_fence.c | 31
> ++++++++++++++++--
> >>> -
> >>>>>> .../drm/amd/amdgpu/amdgpu_eviction_fence.h | 1 +
> >>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 1 +
> >>>>>> 3 files changed, 28 insertions(+), 5 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> >>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> >>>>>> index b34225bbd85d..60be1ac5047d 100644
> >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> >>>>>> @@ -27,6 +27,7 @@
> >>>>>>
> >>>>>> #define work_to_evf_mgr(w, name) container_of(w, struct
> >>>>>> amdgpu_eviction_fence_mgr, name) #define evf_mgr_to_fpriv(e)
> >>>>>> container_of(e, struct amdgpu_fpriv, evf_mgr)
> >>>>>> +#define fence_to_evf(f) container_of(f, struct
> >>>>>> +amdgpu_eviction_fence, base)
> >>>>>>
> >>>>>> static const char *
> >>>>>> amdgpu_eviction_fence_get_driver_name(struct dma_fence *fence)
> >>>>>> @@
> >>>>>> -47,7 +48,7 @@ int amdgpu_eviction_fence_replace_fence(struct
> >>>>>> amdgpu_eviction_fence_mgr *evf_mgr,
> >>>>>> struct drm_exec *exec) {
> >>>>>> - struct amdgpu_eviction_fence *old_ef, *new_ef;
> >>>>>> + struct amdgpu_eviction_fence *new_ef;
> >>>>>> struct drm_gem_object *obj;
> >>>>>> unsigned long index;
> >>>>>> int ret;
> >>>>>> @@ -72,7 +73,6 @@ amdgpu_eviction_fence_replace_fence(struct
> >>>>>> amdgpu_eviction_fence_mgr *evf_mgr,
> >>>>>>
> >>>>>> /* Update the eviction fence now */
> >>>>>> spin_lock(&evf_mgr->ev_fence_lock);
> >>>>>> - old_ef = evf_mgr->ev_fence;
> >>>>>> evf_mgr->ev_fence = new_ef;
> >>>>>> spin_unlock(&evf_mgr->ev_fence_lock);
> >>>>>>
> >>>>>> @@ -102,9 +102,6 @@ amdgpu_eviction_fence_replace_fence(struct
> >>>>> amdgpu_eviction_fence_mgr *evf_mgr,
> >>>>>> }
> >>>>>> }
> >>>>>>
> >>>>>> - /* Free old fence */
> >>>>>> - if (old_ef)
> >>>>>> - dma_fence_put(&old_ef->base);
> >>>>> That change looks completely incorrect to me, you will now leak the old
> fence.
> >>>> The eviction fence is attached and shared by all the restored
> >>>> validated VM BOs
> >>> during UQ restore, and at this placement the eviction fence is only
> >>> detached from one of the BOs. Using
> >>> amdgpu_userq_remove_all_eviction_fences() will walk over the resv objects
> and detach the fence from the resv objs when freeing the BO.
> >>>
> >>> Yeah, but that doesn't justify this change here. See you're
> >>> completely messing up the fence reference count with that.
> >>>
> >>>> But there's a problem: even though dropping all the evf attached to
> >>>> VM BOs with
> >>> this patch, the evf still referred to by the SDMA and GFX kernel
> >>> queue jobs at the case when enabling the kq and uq at the same time.
> Thoughts?
> >>>
> >>> Mhm, the eviction fence is always added as bookmark isn't it? As
> >>> long as the GFX and SDMA jobs are not for evicting something then
> >>> they should only depend on fences with usage < bookmark.
> >>>
> >>> Can you dig up when they are added to the dependencies of the job?
> >> When the eviction fence was added to the user queue VM BOs
> >> reservation and then updated the BO page table, which will add the
> >> eviction fence to the VM sync at amdgpu_sync_resv()
> > Stop, wait a second. That shouldn't happen at the first place. Why is the eviction
> fence considered a dependency for page table updates?
> >
> > When it is added only as bookkeep then we should never consider that here.
> Looks like something in the sync obj is messed up.
> It is like this. Here, amdgpu_sync_resv is using
> DMA_RESV_USAGE_BOOKKEEP.
>
> int amdgpu_sync_resv() {
>
> ..
>
> /* TODO: Use DMA_RESV_USAGE_READ here */
> dma_resv_for_each_fence(&cursor, resv,
> DMA_RESV_USAGE_BOOKKEEP, f) {
> dma_fence_chain_for_each(f, f) {
>
> ..
>
> }
> during PT update amdgpu_vm_bo_update() is using sync to moving
> fences(Eviction fence) before mapping anything. Because of this Eviction fence will
> act as dependency.
Yes, since the amdgpu_sync_resv() uses the bookkeep usage, then all the BOs reservation fences along with the eviction fence will be returned and added to the sync.
And with the attached patch, the eviction fence can be released properly when the kq and uq are enabled.
Thanks,
Prike
> ~arvind
>
> > Regards,
> > Christian.
> >
> >> , and then the eviction fence will be added as a dependent fence by
> propagating with amdgpu_sync_push_to_job(). With removing the eviction fence
> from the VM sync at amdgpu_sync_resv(), then the eviction fence can be released
> properly.
> >>
> >> Thanks,
> >> Prike
> >>> Thanks,
> >>> Christian.
> >>>
> >>> PS: Please stop calling the eviction fence evf.
> >>>
> >>>>>> return 0;
> >>>>>>
> >>>>>> free_err:
> >>>>>> @@ -237,6 +234,30 @@ void amdgpu_eviction_fence_detach(struct
> >>>>> amdgpu_eviction_fence_mgr *evf_mgr,
> >>>>>> dma_fence_put(stub);
> >>>>>> }
> >>>>>>
> >>>>>> +void amdgpu_userq_remove_all_eviction_fences(struct amdgpu_bo
> >>>>>> +*bo)
> >>>>> Please name that amdgpu_eviction_fence_remove_all().
> >>>> Noted.
> >>>>
> >>>>> Regards,
> >>>>> Christian.
> >>>>>
> >>>>>> +{
> >>>>>> + struct dma_resv *resv = &bo->tbo.base._resv;
> >>>>>> + struct dma_fence *fence, *stub;
> >>>>>> + struct dma_resv_iter cursor;
> >>>>>> +
> >>>>>> + dma_resv_assert_held(resv);
> >>>>>> +
> >>>>>> + stub = dma_fence_get_stub();
> >>>>>> + dma_resv_for_each_fence(&cursor, resv,
> >>>>> DMA_RESV_USAGE_BOOKKEEP, fence) {
> >>>>>> + struct amdgpu_eviction_fence *ev_fence;
> >>>>>> +
> >>>>>> + ev_fence = fence_to_evf(fence);
> >>>>>> + if (!ev_fence || !dma_fence_is_signaled(&ev_fence->base))
> >>>>>> + continue;
> >>>>>> +
> >>>>>> + dma_resv_replace_fences(resv, fence->context, stub,
> >>>>>> + DMA_RESV_USAGE_BOOKKEEP);
> >>>>>> +
> >>>>>> + }
> >>>>>> +
> >>>>>> + dma_fence_put(stub);
> >>>>>> +}
> >>>>>> +
> >>>>>> int amdgpu_eviction_fence_init(struct amdgpu_eviction_fence_mgr
> >>>>>> *evf_mgr) {
> >>>>>> /* This needs to be done one time per open */ diff --git
> >>>>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
> >>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
> >>>>>> index fcd867b7147d..da99ac322a2e 100644
> >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
> >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
> >>>>>> @@ -66,4 +66,5 @@ amdgpu_eviction_fence_signal(struct
> >>>>>> amdgpu_eviction_fence_mgr *evf_mgr, int
> >>>>>> amdgpu_eviction_fence_replace_fence(struct
> >>>>>> amdgpu_eviction_fence_mgr
> >>>>> *evf_mgr,
> >>>>>> struct drm_exec *exec);
> >>>>>> +void amdgpu_userq_remove_all_eviction_fences(struct amdgpu_bo
> >>>>>> +*bo);
> >>>>>> #endif
> >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> >>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> >>>>>> index 1e73ce30d4d7..f001018a01eb 100644
> >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> >>>>>> @@ -1392,6 +1392,7 @@ void amdgpu_bo_release_notify(struct
> >>>>> ttm_buffer_object *bo)
> >>>>>> amdgpu_vram_mgr_set_cleared(bo->resource);
> >>>>>> dma_resv_add_fence(&bo->base._resv, fence,
> >>>>> DMA_RESV_USAGE_KERNEL);
> >>>>>> dma_fence_put(fence);
> >>>>>> + amdgpu_userq_remove_all_eviction_fences(abo);
> >>>>>>
> >>>>>> out:
> >>>>>> dma_resv_unlock(&bo->base._resv);
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-drm-amdgpu-free-the-evf-when-the-attached-bo-release.patch
Type: application/octet-stream
Size: 6072 bytes
Desc: 0003-drm-amdgpu-free-the-evf-when-the-attached-bo-release.patch
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20250422/75ad7790/attachment-0001.obj>
More information about the amd-gfx
mailing list