[PATCH v3 4/5] drm/amdgpu: validate the eviction fence before attaching/detaching
Christian König
christian.koenig at amd.com
Thu May 8 09:40:01 UTC 2025
On 5/8/25 09:08, Liang, Prike wrote:
> [Public]
>
>> From: Koenig, Christian <Christian.Koenig at amd.com>
>> Sent: Tuesday, May 6, 2025 4:39 PM
>> To: Liang, Prike <Prike.Liang at amd.com>; amd-gfx at lists.freedesktop.org
>> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>
>> Subject: Re: [PATCH v3 4/5] drm/amdgpu: validate the eviction fence before
>> attaching/detaching
>>
>> On 5/6/25 10:22, Liang, Prike wrote:
>>>>> - /* attach gfx eviction fence */
>>>>> + /* attach gfx the validated eviction fence */
>>>>> r = amdgpu_eviction_fence_attach(&fpriv->evf_mgr, abo);
>>>>> if (r) {
>>>>> DRM_DEBUG_DRIVER("Failed to attach eviction fence to BO\n");
>>>>> + amdgpu_bo_unreserve(abo);
>>>> Adding this here looks like the only valid fix in the patch.
>>> As the eviction fence is invalid until the user queue is created from
>>> user space, the eviction fence needs to be validated before trying to
>>> attach it to or detach it from the reservation.
>>> I will try to draft a patch for validating the eviction fence at attach/detach
>>> separately with this attach error handler change.
>>
>>
>> No, that is clearly incorrect.
>>
>> See, the eviction fence works like this:
>>
>> Validating thread:
>> * Create new eviction fence
>> * Publish eviction fence
>> * Lock all BOs
>> * Replace eviction fence
>>
>> Attaching:
>> * Lock BO
>> * Attach current eviction fence
>> * Unlock BO
>>
>> Detaching:
>> * Lock BO
>> * Unconditionally detach all possible eviction fences, no matter if new or old.
>> * Unlock BO
>>
>> This order is necessary or otherwise you break the logic here.
>>
>> Any additional check will completely mess that up because it makes the operation
>> racy.
> As the user queue eviction fence isn't created until user queue submission, the eviction fence will be NULL without a userq submission. So do we still try to attach/detach the NULL eviction fence in the kernel queue case?
Yes, the problem is that we can't check the eviction fence before we have taken the reservation lock.
Otherwise an eviction fence can always be created between the check and the attach.
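
The window between an unlocked check and the attach can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct fence, struct evf_mgr, struct bo and every helper below are hypothetical stand-ins, not the amdgpu code, and the reservation lock is modelled as a plain flag because the example is single-threaded. The window() callback plays the validating thread publishing a new fence at the worst possible moment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the amdgpu structures. */
struct fence { int seqno; };

struct evf_mgr {
	struct fence *ev_fence;		/* currently published eviction fence */
};

struct bo {
	bool resv_held;			/* models the reservation lock */
	struct fence *attached;		/* models the fence slot in the reservation */
};

static void bo_reserve(struct bo *bo)
{
	assert(!bo->resv_held);
	bo->resv_held = true;
}

static void bo_unreserve(struct bo *bo)
{
	assert(bo->resv_held);
	bo->resv_held = false;
}

static struct fence new_fence = { .seqno = 1 };

/* Plays the validating thread: publish a new eviction fence. */
static void publish_new_fence(struct evf_mgr *mgr)
{
	mgr->ev_fence = &new_fence;
}

typedef void (*window_fn)(struct evf_mgr *mgr);

/* Racy: the decision is made before the reservation lock is taken. */
static bool attach_checked_early(struct evf_mgr *mgr, struct bo *bo,
				 window_fn window)
{
	bool have_fence = mgr->ev_fence != NULL;	/* unlocked check */

	window(mgr);		/* another thread runs here */
	if (!have_fence)
		return false;	/* stale decision: the BO is skipped */

	bo_reserve(bo);
	bo->attached = mgr->ev_fence;
	bo_unreserve(bo);
	return true;
}

/* No early check: lock, attach whatever is current, unlock. */
static void attach_under_lock(struct evf_mgr *mgr, struct bo *bo,
			      window_fn window)
{
	window(mgr);		/* same interleaving as above */
	bo_reserve(bo);
	bo->attached = mgr->ev_fence;
	bo_unreserve(bo);
}
```

With publish_new_fence() passed as the window, attach_checked_early() leaves the BO without a fence even though one has been published, while attach_under_lock() picks it up.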
I also suggested before that the eviction fence should never be NULL; we just start with a dummy stub fence (see function dma_fence_get_stub()). This way we can avoid all the NULL checks.
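
A minimal userspace sketch of the stub idea, assuming a simplified refcounted fence. fence_get_stub() only mimics what dma_fence_get_stub() provides in the kernel (an already signaled fence with a reference held); struct fence, struct evf_mgr and the evf_mgr_*() helpers are hypothetical names for illustration, not the driver's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified fence: a signaled flag plus a reference count. */
struct fence {
	bool signaled;
	unsigned int refcount;
};

/* One global, already signaled fence, loosely modelled on the kernel stub. */
static struct fence stub_fence = { .signaled = true, .refcount = 1 };

static struct fence *fence_get_stub(void)
{
	stub_fence.refcount++;
	return &stub_fence;
}

static struct fence *fence_get(struct fence *f)
{
	f->refcount++;
	return f;
}

static void fence_put(struct fence *f)
{
	assert(f->refcount > 0);
	f->refcount--;
}

struct evf_mgr {
	struct fence *ev_fence;		/* never NULL after evf_mgr_init() */
};

/* Start with the stub so no caller ever has to handle a NULL fence. */
static void evf_mgr_init(struct evf_mgr *mgr)
{
	mgr->ev_fence = fence_get_stub();
}

/* Swap in a real eviction fence and drop the reference to the old one. */
static void evf_mgr_publish(struct evf_mgr *mgr, struct fence *f)
{
	struct fence *old = mgr->ev_fence;

	mgr->ev_fence = fence_get(f);
	fence_put(old);
}

/* Callers can dereference unconditionally; the stub simply reads as done. */
static bool evf_mgr_fence_pending(const struct evf_mgr *mgr)
{
	return !mgr->ev_fence->signaled;
}
```

Because the stub is already signaled, attaching it is harmless, and the attach and detach paths stay the same whether or not a user queue has been created yet.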
> It's OK to skip validating the eviction fence or user queue work before attaching/detaching the eviction fence, but it will cost cycles walking over the reservation fence array in dma_resv_reserve_fences() and dma_resv_replace_fences().
That's completely irrelevant. What matters is that we have the right sequence so that we don't create a race condition.
Regards,
Christian.
>
>> Regards,
>> Christian.
>>
>>>
>>> Thanks,
>>> Prike
>>>
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> return r;
>>>>> }
>>>>>
>