[PATCH] drm/syncobj: Deal with signalled fences in transfer.

Christian König christian.koenig at amd.com
Tue Dec 7 11:51:16 UTC 2021


Am 07.12.21 um 12:35 schrieb Bas Nieuwenhuizen:
> On Tue, Dec 7, 2021 at 12:28 PM Lionel Landwerlin
> <lionel.g.landwerlin at intel.com> wrote:
>> On 07/12/2021 13:00, Christian König wrote:
>>> Am 07.12.21 um 11:40 schrieb Bas Nieuwenhuizen:
>>>> On Tue, Dec 7, 2021 at 8:21 AM Christian König
>>>> <christian.koenig at amd.com> wrote:
>>>>> Am 07.12.21 um 08:10 schrieb Lionel Landwerlin:
>>>>>> On 07/12/2021 03:32, Bas Nieuwenhuizen wrote:
>>>>>>> See the comments in the code. Basically if the seqno is already
>>>>>>> signalled then we get a NULL fence. If we then put the NULL fence
>>>>>>> in a binary syncobj it counts as unsignalled, making that syncobj
>>>>>>> pretty much useless for all expected uses.
>>>>>>>
>>>>>>> Not 100% sure about the transfer to a timeline syncobj but I
>>>>>>> believe it is needed there too, as AFAICT the add_point function
>>>>>>> assumes the fence isn't NULL.
>>>>>>>
>>>>>>> Fixes: ea569910cbab ("drm/syncobj: add transition iotcls between
>>>>>>> binary and timeline v2")
>>>>>>> Cc: stable at vger.kernel.org
>>>>>>> Signed-off-by: Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl>
>>>>>>> ---
>>>>>>>     drivers/gpu/drm/drm_syncobj.c | 26 ++++++++++++++++++++++++++
>>>>>>>     1 file changed, 26 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/drm_syncobj.c
>>>>>>> b/drivers/gpu/drm/drm_syncobj.c
>>>>>>> index fdd2ec87cdd1..eb28a40400d2 100644
>>>>>>> --- a/drivers/gpu/drm/drm_syncobj.c
>>>>>>> +++ b/drivers/gpu/drm/drm_syncobj.c
>>>>>>> @@ -861,6 +861,19 @@ static int
>>>>>>> drm_syncobj_transfer_to_timeline(struct drm_file *file_private,
>>>>>>>                          &fence);
>>>>>>>         if (ret)
>>>>>>>             goto err;
>>>>>>> +
>>>>>>> +    /* If the requested seqno is already signaled
>>>>>>> drm_syncobj_find_fence may
>>>>>>> +     * return a NULL fence. To make sure the recipient gets
>>>>>>> signalled, use
>>>>>>> +     * a new fence instead.
>>>>>>> +     */
>>>>>>> +    if (!fence) {
>>>>>>> +        fence = dma_fence_allocate_private_stub();
>>>>>>> +        if (!fence) {
>>>>>>> +            ret = -ENOMEM;
>>>>>>> +            goto err;
>>>>>>> +        }
>>>>>>> +    }
>>>>>>> +
>>>>>> Shouldn't we fix drm_syncobj_find_fence() instead?
>>>>> Mhm, now that you mention it. Bas, why do you think that
>>>>> dma_fence_chain_find_seqno() may return NULL when the fence is already
>>>>> signaled?
>>>>>
>>>>> Double checking the code that should never ever happen.
>>>> Well, I tested the patch with
>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fmesa%2Fmesa%2F-%2Fmerge_requests%2F14097%2Fdiffs%3Fcommit_id%3Dd4c5c840f4e3839f9f5c1747a9034eb2b565f5c0&data=04%7C01%7Cchristian.koenig%40amd.com%7C342b0cbdff7d487630ae08d9b975abd9%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637744738372115823%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=5SiZYL1TgLq3ldGy1COOkSasklZWQN%2BxWGXJ1j%2BHSOQ%3D&reserved=0
>>>>
>>>> so I'm pretty sure it happens, and this patch fixes  it, though I may
>>>> have misidentified what the code should do.
>>>>
>>>> My reading is that the dma_fence_chain_for_each in
>>>> dma_fence_chain_find_seqno will never visit a signalled fence (unless
>>>> the top one is signalled), as dma_fence_chain_walk will never return a
>>>> signalled fence (it only returns on NULL or !signalled).
>>> Ah, yes that suddenly makes more sense.
>>>
>>>> Happy to move this to drm_syncobj_find_fence.
>>> No, I think that your current patch is fine.
>>>
>>> That drm_syncobj_find_fence() only returns NULL when it can't find
>>> anything !signaled is correct behavior I think.
>>
>> We should probably update the docs then :
>>
>>
>>    * Returns 0 on success or a negative error value on failure. On
>> success @fence
>>    * contains a reference to the fence, which must be released by calling
>>    * dma_fence_put().
>>
>>
>> Looking at some of the kernel drivers, it looks like they don't all
>> protect themselves against NULL pointers :
>>
>>
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Flatest%2Fsource%2Fdrivers%2Fgpu%2Fdrm%2Fvc4%2Fvc4_gem.c%23L1195&data=04%7C01%7Cchristian.koenig%40amd.com%7C342b0cbdff7d487630ae08d9b975abd9%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637744738372115823%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=LjyWQpIUqqAGgR7ak3CJOJTqf%2FG8QB9BZX542vL25RA%3D&reserved=0
>>
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Flatest%2Fsource%2Fdrivers%2Fgpu%2Fdrm%2Famd%2Famdgpu%2Famdgpu_cs.c%23L1020&data=04%7C01%7Cchristian.koenig%40amd.com%7C342b0cbdff7d487630ae08d9b975abd9%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637744738372115823%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=kk9k8sDNLiOFTIyul79FbhfQ4Y2MdwFA6rT0h46xM40%3D&reserved=0
> amdgpu handles it here (amdgpu_sync_fence checks for a NULL fence).
> But yeah I think it is a bit treacherous, especially as this only
> occurs with timeline semaphores.

Mhm, that's a good point.

While I still think it makes more sense from the design perspective to 
return NULL to distinct that there is nothing to wait for, I see as well 
that this is it not the most defensive approach.

Ok in that case let's move it into drm_syncobj_find_fence() instead.

Just one more thing, the timestamp is now busted anyway (cause the 
original fence is already garbage collected). So we can probably use 
dma_fence_get_stub() instead of dma_fence_allocate_private_stub().

Regards,
Christian.

>
>>
>> -Lionel
>>
>>
>>> Going to push your original patch if nobody has any more objections.
>>>
>>> But somebody might want to take care of the IGT as well.
>>>
>>> Regards,
>>> Christian.
>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>> By returning a stub fence for the timeline case if there isn't one.
>>>>>>
>>>>>>
>>>>>> Because the same NULL fence check appears missing in amdgpu (and
>>>>>> probably other drivers).
>>>>>>
>>>>>>
>>>>>> Also we should have tests for this in IGT.
>>>>>>
>>>>>> AMD contributed some tests when this code was written but they never
>>>>>> got reviewed :(
>>>>>>
>>>>>>
>>>>>> -Lionel
>>>>>>
>>>>>>
>>>>>>>         chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
>>>>>>>         if (!chain) {
>>>>>>>             ret = -ENOMEM;
>>>>>>> @@ -890,6 +903,19 @@ drm_syncobj_transfer_to_binary(struct drm_file
>>>>>>> *file_private,
>>>>>>>                          args->src_point, args->flags, &fence);
>>>>>>>         if (ret)
>>>>>>>             goto err;
>>>>>>> +
>>>>>>> +    /* If the requested seqno is already signaled
>>>>>>> drm_syncobj_find_fence may
>>>>>>> +     * return a NULL fence. To make sure the recipient gets
>>>>>>> signalled, use
>>>>>>> +     * a new fence instead.
>>>>>>> +     */
>>>>>>> +    if (!fence) {
>>>>>>> +        fence = dma_fence_allocate_private_stub();
>>>>>>> +        if (!fence) {
>>>>>>> +            ret = -ENOMEM;
>>>>>>> +            goto err;
>>>>>>> +        }
>>>>>>> +    }
>>>>>>> +
>>>>>>>         drm_syncobj_replace_fence(binary_syncobj, fence);
>>>>>>>         dma_fence_put(fence);
>>>>>>>     err:



More information about the dri-devel mailing list