[PATCH 4/4] drm/amdgpu: free the evf when the attached bo release

Liang, Prike Prike.Liang at amd.com
Wed Apr 23 03:25:45 UTC 2025


[Public]

From: Koenig, Christian <Christian.Koenig at amd.com>
Sent: Tuesday, April 22, 2025 9:26 PM
To: Liang, Prike <Prike.Liang at amd.com>; Yadav, Arvind <Arvind.Yadav at amd.com>
Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; amd-gfx at lists.freedesktop.org; Christian König <ckoenig.leichtzumerken at gmail.com>
Subject: Re: [PATCH 4/4] drm/amdgpu: free the evf when the attached bo release

Am 22.04.25 um 15:09 schrieb Liang, Prike:


Stop, wait a second. That shouldn't happen at the first place. Why is the eviction

fence considered a dependency for page table updates?



When it is added only as bookkeep then we should never consider that here.

Looks like something in the sync obj is messed up.

It is like this. Here, amdgpu_sync_resv is using

DMA_RESV_USAGE_BOOKKEEP.



         int amdgpu_sync_resv() {



                 ..



                 /* TODO: Use DMA_RESV_USAGE_READ here */

That here is the core of the problem.

I've added this TODO item 4 years ago to switch over to DMA_RESV_USAGE_READ here when moved all TTM use cases to using drm_sched_job_add_resv_dependencies().

That was done, but this TODO here forgotten.

So here the user space submission sync only requires syncing the fences with less than DMA_RESV_USAGE_READ usage in the amdgpu_sync_resv(). If so, then the eviction fence will not be added to the sync with kernel queue submission. I can submit a patch for this change.

Thanks,
Prike



                 dma_resv_for_each_fence(&cursor, resv,

DMA_RESV_USAGE_BOOKKEEP, f) {

                     dma_fence_chain_for_each(f, f) {



                 ..



}

during PT update amdgpu_vm_bo_update() is using sync to moving

fences(Eviction fence) before mapping anything. Because of this Eviction fence will

act as dependency.



Yes, since the amdgpu_sync_resv() uses the bookkeep usage, then all the BOs reservation fences along with the eviction fence will be returned and added to the sync.

Yeah, but that is incorrect.



And with the attached patch, the eviction fence can be released properly when the kq and uq are enabled.

We need to fix the underlying bug first before we can work on the next step.

Regards,
Christian.







Thanks,

Prike



~arvind



Regards,

Christian.



, and then the eviction fence will be added as a dependent fence by

propagating with amdgpu_sync_push_to_job(). With removing the eviction fence

from the VM sync at amdgpu_sync_resv(), then the eviction fence can be released

properly.



Thanks,

Prike

Thanks,

Christian.



PS: Please stop calling the eviction fence evf.



     return 0;



  free_err:

@@ -237,6 +234,30 @@ void amdgpu_eviction_fence_detach(struct

amdgpu_eviction_fence_mgr *evf_mgr,

     dma_fence_put(stub);

  }



+void amdgpu_userq_remove_all_eviction_fences(struct amdgpu_bo

+*bo)

Please name that amdgpu_eviction_fence_remove_all().

Noted.



Regards,

Christian.



+{

+   struct dma_resv *resv = &bo->tbo.base._resv;

+   struct dma_fence *fence, *stub;

+   struct dma_resv_iter cursor;

+

+   dma_resv_assert_held(resv);

+

+   stub = dma_fence_get_stub();

+   dma_resv_for_each_fence(&cursor, resv,

DMA_RESV_USAGE_BOOKKEEP, fence) {

+           struct amdgpu_eviction_fence *ev_fence;

+

+           ev_fence = fence_to_evf(fence);

+           if (!ev_fence || !dma_fence_is_signaled(&ev_fence->base))

+                   continue;

+

+           dma_resv_replace_fences(resv, fence->context, stub,

+                           DMA_RESV_USAGE_BOOKKEEP);

+

+   }

+

+   dma_fence_put(stub);

+}

+

  int amdgpu_eviction_fence_init(struct amdgpu_eviction_fence_mgr

*evf_mgr)  {

     /* This needs to be done one time per open */ diff --git

a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h

b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h

index fcd867b7147d..da99ac322a2e 100644

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h

+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h

@@ -66,4 +66,5 @@ amdgpu_eviction_fence_signal(struct

amdgpu_eviction_fence_mgr *evf_mgr,  int

amdgpu_eviction_fence_replace_fence(struct

amdgpu_eviction_fence_mgr

*evf_mgr,

                                 struct drm_exec *exec);

+void amdgpu_userq_remove_all_eviction_fences(struct amdgpu_bo

+*bo);

  #endif

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c

b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c

index 1e73ce30d4d7..f001018a01eb 100644

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c

+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c

@@ -1392,6 +1392,7 @@ void amdgpu_bo_release_notify(struct

ttm_buffer_object *bo)

     amdgpu_vram_mgr_set_cleared(bo->resource);

     dma_resv_add_fence(&bo->base._resv, fence,

DMA_RESV_USAGE_KERNEL);

     dma_fence_put(fence);

+   amdgpu_userq_remove_all_eviction_fences(abo);



  out:

     dma_resv_unlock(&bo->base._resv);

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20250423/1792c294/attachment-0001.htm>


More information about the amd-gfx mailing list