[PATCH] dma-buf/sw_sync: Add a reference when adding fence to timeline list

Thadeu Lima de Souza Cascardo cascardo at igalia.com
Thu Jun 20 21:28:42 UTC 2024


On Thu, Jun 20, 2024 at 03:50:58PM -0300, Thadeu Lima de Souza Cascardo wrote:
> On Wed, Jun 19, 2024 at 01:14:38PM +0100, Tvrtko Ursulin wrote:
> > 
> > On 14/06/2024 19:00, Thadeu Lima de Souza Cascardo wrote:
> > > On Fri, Jun 14, 2024 at 11:52:03AM +0100, Tvrtko Ursulin wrote:
> > > > 
> > > > On 24/03/2024 10:15, Thadeu Lima de Souza Cascardo wrote:
> > > > > One is that it takes care of the error case when sync_file_create fails.
> > > > > 
> > > > > The extra reference is put, while the fence is still held on the list, so
> > > > > its last reference will be put when it is removed from the list either in
> > > > > sync_timeline_signal or sw_sync_debugfs_release.
> > > > 
> > > > So any fences where sync_file_create failed linger around until
> > > > sw_sync_debugfs_release? Okay-ish I guess since it is a pathological case.
> > > > 
> > > 
> > > The challenge here is to determine which one of the multiple cases we are
> > > dealing with. Since we don't hold the lock while sync_file_create is
> > > called, we are left with this situation. An alternative would be to fold
> > > sync_pt_create into sw_sync_ioctl_create_fence, so at least we can
> > > determine which case is which. That would also fix the case where we handle
> > > userspace a file descriptor with a fence that is not even on the list.
> > 
> > Since sync_pt_create is local and has only this single caller it could be
> > worth exploring this option to see if it could simplify things and get rid
> > of this lingering objects corner case.
> > 
> 
> So when I went back looking into this, I actually needed to make
> sw_sync_ioctl_create_fence be able to allocate the sync_file without
> assigning a fence. That's when I realized it wouldn't buy us much as we
> could check for the signaled case and return NULL. Let me look at this
> again and get back to you.

Okay, so trying to close this subthread down:

we also might fail when writing down the filedescriptor to userspace (the
copy_to_user after sync_file_create).

Neither of the following options appeal to me:

1) try to write to userspace before actually creating the fence, that breaks
ABI. Despite the ioctl returning an error, the fd has been written to, even
though it has not been installed.

2) try to rollback the insertion of the fence into the list. Imagine two
threads:

one tries to create fence with value A but has data RO, so fence will be
created and inserted into the list, but when copy_to_user fails, it will try to
rollback.

second thread also tries to create fence with value A, but now data is legit.
We might race with the rollback and that won't go well.

Also, as I mentioned in one of the other responses, leaving the fence on
the list is not different from userspace creating the fence then closing
the file descriptor. That will also leave the fence on the list until it is
signaled.

So perhaps this is the crux of the design here. Before this patch, putting
the last reference would remove it from the list. But keeping with this
design leads to all sorts of issues that I mentioned. The change that I am
proposing is to keep fences on the timeline until either they are signaled
or the timeline is destroyed.

Thoughts?
Cascardo.


More information about the dri-devel mailing list