[RFC] dma-buf: Import/export the implicit fences on the dma-buf

Tue Jul 12 16:16:28 UTC 2016

On Tue, Jul 12, 2016 at 05:36:37PM +0200, Christian König wrote:
> On 12.07.2016 at 16:31, Daniel Vetter wrote:
> > On Tue, Jul 12, 2016 at 01:14:41PM +0100, Chris Wilson wrote:
> > > On Tue, Jul 12, 2016 at 01:53:56PM +0200, Christian König wrote:
> > > > On 11.07.2016 at 23:59, Chris Wilson wrote:
> > > > > When dealing with user interfaces that utilize explicit fences, it is
> > > > > convenient to sometimes create those fences after the fact, i.e. to
> > > > > query the dma-buf for the current set of implicit fences, encapsulate
> > > > > those into a sync_file and hand that fd back to userspace.
> > > > > Correspondingly, being able to add an explicit fence back into the mix
> > > > > of fences being tracked by the dma-buf allows that userspace fence to be
> > > > > included in any implicit tracking.
> > > > Well I think that this is a rather questionable interface.
> > > > 
> > > > For example how do you deal with race conditions? E.g. between
> > > > querying the fences from the reservation object and adding a new one
> > > > we could gain new fences because of the kernel swapping things out
> > > > or another thread making some submission with this buffer.
> > > > 
> > > > In addition to that, IIRC reservation_object_add_excl_fence()
> > > > currently replaces all shared fences with the exclusive one. A
> > > > malicious application could use this to trick the kernel driver into
> > > > thinking that this buffer object is idle while it is still accessed
> > > > by some operation. At least with GPU operations you can easily take
> > > > over the system when you manage to get access to a page table with
> > > > this.
> > > The only difference here is that we believe the GPU drivers to enforce
> > > the ordering between each other. So we can either insert a wait before
> > > adding the exclusive fence, or we can just not expose an import ioctl.
> > > Extracting the set of fences isn't an issue? (That's the part that has a
> > > more legitimate usecase.)
> > If we change the kernel to just merge everything together when importing a
> > new fence I think it should be perfectly safe. I.e.
> > 1) grab reservation lock
> > 2) assemble a fence_array with the current fences + the new one passed in
> > through sync_file.
> > 3) put that into the right slot
> > 4) unlock
> 
> Yeah that sounds like this should work.
> 
> > If we switch reservations over to fence_array it might even be somewhat
> > pretty.
> 
> Actually I rather wanted to suggest that we use something like the
> amdgpu_sync object for the reservation object instead.
> 
> E.g. a collection-like interface where fences can be added later on and only
> the newest one is kept around for each context.

Yeah, imo fence_array should automatically reduce the set of fences to the
minimal set needed and merge all the ones on the same timeline. And we
could rebase reservations on top of fence-array.

Or at least something along those lines.
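
As a rough illustration of the "keep only the newest fence per timeline"
reduction, here is a userspace toy model. It is not kernel code: the
toy_fence and toy_fence_set types, the fixed capacity and the function
names are all made up for this sketch, and the caller is assumed to hold
whatever lock protects the set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a fence: one point on one timeline. */
struct toy_fence {
	uint64_t context;	/* timeline the fence belongs to */
	uint32_t seqno;		/* position on that timeline */
};

#define TOY_MAX_FENCES 16	/* arbitrary capacity for the sketch */

/* Hypothetical collection that keeps at most one fence per context. */
struct toy_fence_set {
	struct toy_fence fences[TOY_MAX_FENCES];
	size_t count;
};

/* Wrap-safe check that seqno a is later than seqno b on one timeline. */
static bool toy_seqno_later(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}

/*
 * Merge a fence into the set. If the set already holds a fence from the
 * same context, only the later of the two is kept. Returns false when a
 * new context does not fit into the fixed-size array.
 */
static bool toy_fence_set_add(struct toy_fence_set *set, struct toy_fence f)
{
	assert(set != NULL);

	for (size_t i = 0; i < set->count; i++) {
		if (set->fences[i].context != f.context)
			continue;
		if (toy_seqno_later(f.seqno, set->fences[i].seqno))
			set->fences[i] = f;
		return true;
	}

	if (set->count == TOY_MAX_FENCES)
		return false;

	set->fences[set->count++] = f;
	return true;
}
```

Adding an older fence from a context that is already present leaves the
set unchanged, so importing a fence this way can only extend what
waiters have to wait for, never shorten it.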

> The only problem with that approach is that it is a bit tricky to do without
> locking, e.g. only RCU.

I thought RCU is only for reading/waiting on fences, and that for changing
them you always need to acquire the ww mutex behind the reservation?
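
To spell out the split I have in mind, a userspace sketch follows. It is
only a model under my own assumptions: a pthread mutex plays the role of
the ww mutex, a C11 atomic pointer plays the role of the RCU-protected
pointer, and every toy_* name is invented for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical immutable snapshot of the fences attached to a buffer. */
struct toy_snapshot {
	uint32_t last_seqno;	/* seqno of the most recently added fence */
	unsigned int count;	/* number of fences added so far */
};

/* Hypothetical stand-in for a reservation object. */
struct toy_resv {
	pthread_mutex_t lock;			/* stands in for the ww mutex */
	_Atomic(struct toy_snapshot *) current;	/* published snapshot */
};

static int toy_resv_init(struct toy_resv *resv)
{
	struct toy_snapshot *empty = calloc(1, sizeof(*empty));

	if (!empty)
		return -1;
	pthread_mutex_init(&resv->lock, NULL);
	atomic_init(&resv->current, empty);
	return 0;
}

/* Reader side: no lock taken, just an acquire load of the snapshot. */
static unsigned int toy_resv_count(struct toy_resv *resv)
{
	struct toy_snapshot *snap =
		atomic_load_explicit(&resv->current, memory_order_acquire);

	return snap->count;
}

/* Writer side: copy, modify and publish, all under the lock. */
static int toy_resv_add(struct toy_resv *resv, uint32_t seqno)
{
	struct toy_snapshot *next, *old;

	assert(resv != NULL);
	next = malloc(sizeof(*next));
	if (!next)
		return -1;

	pthread_mutex_lock(&resv->lock);
	old = atomic_load_explicit(&resv->current, memory_order_relaxed);
	*next = *old;
	next->last_seqno = seqno;
	next->count++;
	atomic_store_explicit(&resv->current, next, memory_order_release);
	pthread_mutex_unlock(&resv->lock);

	/*
	 * Real RCU would free 'old' only after a grace period. This sketch
	 * leaks it on purpose so that concurrent readers stay valid.
	 */
	return 0;
}
```

Readers never block on the writer here, and two writers can never race
with each other because every modification happens under the lock.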

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch