[PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.

Fri Sep 23 12:09:54 UTC 2016

On Fri, Sep 23, 2016 at 07:00:25PM +0900, Michel Dänzer wrote:
> On 22/09/16 10:22 PM, Christian König wrote:
> > Am 22.09.2016 um 15:05 schrieb Daniel Vetter:
> >> On Thu, Sep 22, 2016 at 2:44 PM, Christian König
> >> <deathsimple at vodafone.de> wrote:
> >>>> - explicit fencing: Userspace passes around distinct fence objects for
> >>>> any work going on on the gpu. The kernel doesn't insert any stall of
> >>>> it's own (except for moving buffer objects around ofc). This is what
> >>>> Android. This also seems to be what amdgpu is doing within one
> >>>> process/owner.
> >>>
> >>> No, that is clearly not my understanding of explicit fencing.
> >>>
> >>> Userspace doesn't necessarily need to pass around distinct fence objects
> >>> with all of it's protocols and the kernel is still responsible for
> >>> inserting
> >>> stalls whenever an userspace protocol or application requires this
> >>> semantics.
> >>>
> >>> Otherwise you will never be able to use explicit fencing on the Linux
> >>> desktop with protocols like DRI2/DRI3.
> >> This is about mixing them. Explicit fencing still means userspace has
> >> an explicit piece, separate from buffers, (either sync_file fd, or a
> >> driver-specific cookie, or similar).
> >>
> >>> I would expect that every driver in the system waits for all fences of a
> >>> reservation object as long as it isn't told otherwise by providing a
> >>> distinct fence object with the IOCTL in question.
> >> Yup agreed. This way if your explicitly-fencing driver reads a shared
> >> buffer passed over a protocol that does implicit fencing (like
> >> DRI2/3), then it will work.
> >>
> >> The other interop direction is explicitly-fencing driver passes a
> >> buffer to a consumer which expects implicit fencing. In that case you
> >> must attach the right fence to the exclusive slot, but _only_ in that
> >> case.
> > 
> > Ok well sounds like you are close to understand why I can't do exactly
> > this: There simply is no right fence I could attach.
> > 
> > When amdgpu makes the command submissions it doesn't necessarily know
> > that the buffer will be exported and shared with another device later on.
> > 
> > So when the buffer is exported and given to the other device you might
> > have a whole bunch of fences which run concurrently and not in any
> > serial order.
> 
> I feel like you're thinking too much of buffers shared between GPUs as
> being short-lived and only shared late. In the use-cases I know about,
> shared buffers are created separately and shared ahead of time, the
> actual rendering work is done to non-shared buffers and then just copied
> to the shared buffers for transfer between GPUs. These copies are always
> performed by the same context in such a way that they should always be
> performed by the same HW engine and thus implicitly serialized.
> 
> Do you have any specific use-cases in mind where buffers are only shared
> between GPUs after the rendering operations creating the buffer contents
> to be shared have already been submitted?

Yeah, it should be known which buffer you use (at least in userspace,
maybe not in the kernel) for which you need implicit fencing. At least
with DRI2/3 it's really obvious which buffers are shared. Same holds for
external images and other imported buffers.

Yes that means you need to keep track of a few things in userspace, and
you need to add a special flag to CS to make sure the kernel does set the
exclusive fence.

> >> Otherwise you end up stalling your explicitly-fencing userspace,
> >> since implicit fencing doesn't allow more than 1 writer. For amdgpu
> >> one possible way to implement this might be to count how many users a
> >> dma-buf has, and if it's more than just the current context set the
> >> exclusive fence. Or do an uabi revision and let userspace decide (or
> >> at least overwrite it).
> > 
> > I mean I can pick one fence and wait for the rest to finish manually,
> > but that would certainly defeat the whole effort, doesn't it?
> 
> I'm afraid it's not clear to me why it would. Can you elaborate?
> 
> 
> >> But the current approach in amdgpu_sync.c of declaring a fence as
> >> exclusive after the fact (if owners don't match) just isn't how
> >> reservation_object works. You can of course change that, but that
> >> means you must change all drivers implementing support for implicit
> >> fencing of dma-buf. Fixing amdgpu will be easier ;-)
> > 
> > Well as far as I can see there is no way I can fix amdgpu in this case.
> > 
> > The handling clearly needs to be changed on the receiving side of the
> > reservation objects if I don't completely want to disable concurrent
> > access to BOs in amdgpu.
> 
> Anyway, we need a solution for this between radeon and amdgpu, and I
> don't think a solution which involves those drivers using reservation
> object semantics between them which are different from all other drivers
> is a good idea.

Afaik there's also amd+intel machines out there, so really the only option
is to either fix amdgpu to correctly set exclusive fences on shared
buffers (with the help of userspace hints). Or change all the existing
drivers. No idea what's simpler to do, since I don't know about amdgpu
userspace.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch