[PATCH] drm/syncobj: Deal with signalled fences in transfer.

Lionel Landwerlin lionel.g.landwerlin at intel.com
Tue Dec 7 11:28:40 UTC 2021


On 07/12/2021 13:00, Christian König wrote:
> Am 07.12.21 um 11:40 schrieb Bas Nieuwenhuizen:
>> On Tue, Dec 7, 2021 at 8:21 AM Christian König 
>> <christian.koenig at amd.com> wrote:
>>> Am 07.12.21 um 08:10 schrieb Lionel Landwerlin:
>>>> On 07/12/2021 03:32, Bas Nieuwenhuizen wrote:
>>>>> See the comments in the code. Basically if the seqno is already
>>>>> signalled then we get a NULL fence. If we then put the NULL fence
>>>>> in a binary syncobj it counts as unsignalled, making that syncobj
>>>>> pretty much useless for all expected uses.
>>>>>
>>>>> Not 100% sure about the transfer to a timeline syncobj but I
>>>>> believe it is needed there too, as AFAICT the add_point function
>>>>> assumes the fence isn't NULL.
>>>>>
>>>>> Fixes: ea569910cbab ("drm/syncobj: add transition iotcls between
>>>>> binary and timeline v2")
>>>>> Cc: stable at vger.kernel.org
>>>>> Signed-off-by: Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl>
>>>>> ---
>>>>>    drivers/gpu/drm/drm_syncobj.c | 26 ++++++++++++++++++++++++++
>>>>>    1 file changed, 26 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/drm_syncobj.c
>>>>> b/drivers/gpu/drm/drm_syncobj.c
>>>>> index fdd2ec87cdd1..eb28a40400d2 100644
>>>>> --- a/drivers/gpu/drm/drm_syncobj.c
>>>>> +++ b/drivers/gpu/drm/drm_syncobj.c
>>>>> @@ -861,6 +861,19 @@ static int
>>>>> drm_syncobj_transfer_to_timeline(struct drm_file *file_private,
>>>>>                         &fence);
>>>>>        if (ret)
>>>>>            goto err;
>>>>> +
>>>>> +    /* If the requested seqno is already signaled
>>>>> drm_syncobj_find_fence may
>>>>> +     * return a NULL fence. To make sure the recipient gets
>>>>> signalled, use
>>>>> +     * a new fence instead.
>>>>> +     */
>>>>> +    if (!fence) {
>>>>> +        fence = dma_fence_allocate_private_stub();
>>>>> +        if (!fence) {
>>>>> +            ret = -ENOMEM;
>>>>> +            goto err;
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>
>>>> Shouldn't we fix drm_syncobj_find_fence() instead?
>>> Mhm, now that you mention it. Bas, why do you think that
>>> dma_fence_chain_find_seqno() may return NULL when the fence is already
>>> signaled?
>>>
>>> Double checking the code that should never ever happen.
>> Well, I tested the patch with
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fmesa%2Fmesa%2F-%2Fmerge_requests%2F14097%2Fdiffs%3Fcommit_id%3Dd4c5c840f4e3839f9f5c1747a9034eb2b565f5c0&data=04%7C01%7Cchristian.koenig%40amd.com%7Cc1ab29fc100842826f5d08d9b96e102a%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637744705383763833%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=sXkTJWm%2FWm2xwgLGdepVWAOlqj%2FeArnvmMvnJpQ9YEs%3D&reserved=0 
>>
>> so I'm pretty sure it happens, and this patch fixes  it, though I may
>> have misidentified what the code should do.
>>
>> My reading is that the dma_fence_chain_for_each in
>> dma_fence_chain_find_seqno will never visit a signalled fence (unless
>> the top one is signalled), as dma_fence_chain_walk will never return a
>> signalled fence (it only returns on NULL or !signalled).
>
> Ah, yes that suddenly makes more sense.
>
>> Happy to move this to drm_syncobj_find_fence.
>
> No, I think that your current patch is fine.
>
> That drm_syncobj_find_fence() only returns NULL when it can't find 
> anything !signaled is correct behavior I think.


We should probably update the docs then :


  * Returns 0 on success or a negative error value on failure. On 
success @fence
  * contains a reference to the fence, which must be released by calling
  * dma_fence_put().


Looking at some of the kernel drivers, it looks like they don't all 
protect themselves against NULL pointers :


https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/vc4/vc4_gem.c#L1195

https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c#L1020


-Lionel


>
> Going to push your original patch if nobody has any more objections.
>
> But somebody might want to take care of the IGT as well.
>
> Regards,
> Christian.
>
>>> Regards,
>>> Christian.
>>>
>>>> By returning a stub fence for the timeline case if there isn't one.
>>>>
>>>>
>>>> Because the same NULL fence check appears missing in amdgpu (and
>>>> probably other drivers).
>>>>
>>>>
>>>> Also we should have tests for this in IGT.
>>>>
>>>> AMD contributed some tests when this code was written but they never
>>>> got reviewed :(
>>>>
>>>>
>>>> -Lionel
>>>>
>>>>
>>>>>        chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
>>>>>        if (!chain) {
>>>>>            ret = -ENOMEM;
>>>>> @@ -890,6 +903,19 @@ drm_syncobj_transfer_to_binary(struct drm_file
>>>>> *file_private,
>>>>>                         args->src_point, args->flags, &fence);
>>>>>        if (ret)
>>>>>            goto err;
>>>>> +
>>>>> +    /* If the requested seqno is already signaled
>>>>> drm_syncobj_find_fence may
>>>>> +     * return a NULL fence. To make sure the recipient gets
>>>>> signalled, use
>>>>> +     * a new fence instead.
>>>>> +     */
>>>>> +    if (!fence) {
>>>>> +        fence = dma_fence_allocate_private_stub();
>>>>> +        if (!fence) {
>>>>> +            ret = -ENOMEM;
>>>>> +            goto err;
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>>        drm_syncobj_replace_fence(binary_syncobj, fence);
>>>>>        dma_fence_put(fence);
>>>>>    err:
>>>>
>



More information about the dri-devel mailing list