[PATCH 4/4] drm/amdgpu: free the evf when the attached bo release
Christian König
christian.koenig at amd.com
Tue Apr 22 13:26:07 UTC 2025
Am 22.04.25 um 15:09 schrieb Liang, Prike:
>>> Stop, wait a second. That shouldn't happen at the first place. Why is the eviction
>> fence considered a dependency for page table updates?
>>> When it is added only as bookkeep then we should never consider that here.
>> Looks like something in the sync obj is messed up.
>> It is like this. Here, amdgpu_sync_resv is using
>> DMA_RESV_USAGE_BOOKKEEP.
>>
>> int amdgpu_sync_resv() {
>>
>> ..
>>
>> /* TODO: Use DMA_RESV_USAGE_READ here */
That here is the core of the problem.
I've added this TODO item 4 years ago to switch over to DMA_RESV_USAGE_READ here when moved all TTM use cases to using drm_sched_job_add_resv_dependencies().
That was done, but this TODO here forgotten.
>> dma_resv_for_each_fence(&cursor, resv,
>> DMA_RESV_USAGE_BOOKKEEP, f) {
>> dma_fence_chain_for_each(f, f) {
>>
>> ..
>>
>> }
>> during PT update amdgpu_vm_bo_update() is using sync to moving
>> fences(Eviction fence) before mapping anything. Because of this Eviction fence will
>> act as dependency.
> Yes, since the amdgpu_sync_resv() uses the bookkeep usage, then all the BOs reservation fences along with the eviction fence will be returned and added to the sync.
Yeah, but that is incorrect.
> And with the attached patch, the eviction fence can be released properly when the kq and uq are enabled.
We need to fix the underlying bug first before we can work on the next step.
Regards,
Christian.
>
> Thanks,
> Prike
>
>> ~arvind
>>
>>> Regards,
>>> Christian.
>>>
>>>> , and then the eviction fence will be added as a dependent fence by
>> propagating with amdgpu_sync_push_to_job(). With removing the eviction fence
>> from the VM sync at amdgpu_sync_resv(), then the eviction fence can be released
>> properly.
>>>> Thanks,
>>>> Prike
>>>>> Thanks,
>>>>> Christian.
>>>>>
>>>>> PS: Please stop calling the eviction fence evf.
>>>>>
>>>>>>>> return 0;
>>>>>>>>
>>>>>>>> free_err:
>>>>>>>> @@ -237,6 +234,30 @@ void amdgpu_eviction_fence_detach(struct
>>>>>>> amdgpu_eviction_fence_mgr *evf_mgr,
>>>>>>>> dma_fence_put(stub);
>>>>>>>> }
>>>>>>>>
>>>>>>>> +void amdgpu_userq_remove_all_eviction_fences(struct amdgpu_bo
>>>>>>>> +*bo)
>>>>>>> Please name that amdgpu_eviction_fence_remove_all().
>>>>>> Noted.
>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>>> +{
>>>>>>>> + struct dma_resv *resv = &bo->tbo.base._resv;
>>>>>>>> + struct dma_fence *fence, *stub;
>>>>>>>> + struct dma_resv_iter cursor;
>>>>>>>> +
>>>>>>>> + dma_resv_assert_held(resv);
>>>>>>>> +
>>>>>>>> + stub = dma_fence_get_stub();
>>>>>>>> + dma_resv_for_each_fence(&cursor, resv,
>>>>>>> DMA_RESV_USAGE_BOOKKEEP, fence) {
>>>>>>>> + struct amdgpu_eviction_fence *ev_fence;
>>>>>>>> +
>>>>>>>> + ev_fence = fence_to_evf(fence);
>>>>>>>> + if (!ev_fence || !dma_fence_is_signaled(&ev_fence->base))
>>>>>>>> + continue;
>>>>>>>> +
>>>>>>>> + dma_resv_replace_fences(resv, fence->context, stub,
>>>>>>>> + DMA_RESV_USAGE_BOOKKEEP);
>>>>>>>> +
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + dma_fence_put(stub);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> int amdgpu_eviction_fence_init(struct amdgpu_eviction_fence_mgr
>>>>>>>> *evf_mgr) {
>>>>>>>> /* This needs to be done one time per open */ diff --git
>>>>>>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
>>>>>>>> index fcd867b7147d..da99ac322a2e 100644
>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
>>>>>>>> @@ -66,4 +66,5 @@ amdgpu_eviction_fence_signal(struct
>>>>>>>> amdgpu_eviction_fence_mgr *evf_mgr, int
>>>>>>>> amdgpu_eviction_fence_replace_fence(struct
>>>>>>>> amdgpu_eviction_fence_mgr
>>>>>>> *evf_mgr,
>>>>>>>> struct drm_exec *exec);
>>>>>>>> +void amdgpu_userq_remove_all_eviction_fences(struct amdgpu_bo
>>>>>>>> +*bo);
>>>>>>>> #endif
>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>>>>> index 1e73ce30d4d7..f001018a01eb 100644
>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>>>>> @@ -1392,6 +1392,7 @@ void amdgpu_bo_release_notify(struct
>>>>>>> ttm_buffer_object *bo)
>>>>>>>> amdgpu_vram_mgr_set_cleared(bo->resource);
>>>>>>>> dma_resv_add_fence(&bo->base._resv, fence,
>>>>>>> DMA_RESV_USAGE_KERNEL);
>>>>>>>> dma_fence_put(fence);
>>>>>>>> + amdgpu_userq_remove_all_eviction_fences(abo);
>>>>>>>>
>>>>>>>> out:
>>>>>>>> dma_resv_unlock(&bo->base._resv);
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20250422/b12b3052/attachment-0001.htm>
More information about the amd-gfx
mailing list