[PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.

Chris Wilson chris at chris-wilson.co.uk
Tue Sep 13 09:39:45 UTC 2016


On Tue, Sep 13, 2016 at 10:44:11AM +0200, Christian König wrote:
> Am 09.09.2016 um 03:15 schrieb Michel Dänzer:
> >On 09/09/16 01:23 AM, Chris Wilson wrote:
> >>On Thu, Sep 08, 2016 at 05:21:42PM +0200, Mario Kleiner wrote:
> >>>On 09/08/2016 08:30 AM, Chris Wilson wrote:
> >>>>On Thu, Sep 08, 2016 at 02:14:43AM +0200, Mario Kleiner wrote:
> >>>>>amdgpu-kms uses shared fences for its prime exported dmabufs,
> >>>>>instead of an exclusive fence. Therefore we need to wait for
> >>>>>all fences of the dmabuf reservation object to prevent
> >>>>>unsynchronized rendering and flipping.
> >>>>No. Fix the root cause as this affects not just flips but copies -
> >>>>this implies that everybody using the resv object must wait for all
> >>>>fences. The resv object is not just used for prime, but for all fencing, so
> >>>>this breaks the ability to schedule parallel operations across engines.
> >>>>-Chris
> >>>>
> >>>Ok. I think I now understand the difference, but let's check: the
> >>>exclusive fence is essentially acting a bit like a write lock, and
> >>>the shared fences like reader locks? So you can have multiple readers
> >>>but only one writer at a time?
> >>That's how we (i915.ko and I hope the rest of the world) are using them.
> >>In the model where there is just one reservation object on the GEM
> >>object, that reservation object is then shared between internal driver
> >>scheduling and external. We are reliant on being able to use buffers on
> >>multiple engines through the virtue of the shared fences, and to serialise
> >>after writes by waiting on the exclusive fence. (So we can have concurrent
> >>reads on the display engine, render engines and on the CPU - or
> >>alternatively an exclusive writer.)
> >>
> >>In the near future, i915 flips will wait on the common reservation object
> >>not only for dma-bufs, but also its own GEM objects.
> >>>Ie.:
> >>>
> >>>Writer must wait for all fences before starting write access to a
> >>>buffer, then attach the exclusive fence and signal it on end of
> >>>write access. E.g., write to renderbuffer, write to texture etc.
> >>Yes.
> >>>Readers must wait for the exclusive fence, then attach a shared fence
> >>>per reader and signal it on end of read access? E.g., read from
> >>>texture, fb, scanout?
> >>Yes.
> >>>Is that correct? In that case we'd have a missing exclusive fence in
> >>>amdgpu for the linear target dmabuf? Probably beyond my level of
> >>>knowledge to fix this?
> >>i915.ko requires the client to mark which buffers are written to.
> >>
> >>In ttm, there are ttm_validate_buffer objects which mark whether they
> >>should be using shared or exclusive fences. Afaict, in amdgpu they are
> >>all set to shared, the relevant user interface seems to be
> >>amdgpu_bo_list_set().
> >This all makes sense to me.
> >
> >Christian, why is amdgpu setting only shared fences? Can we fix that?
> 
> No, amdgpu relies on the fact that we allow concurrent write
> accesses by userspace.
> 
> E.g. one part of the buffer can be rendered by one engine while
> another part is rendered by another engine.
> 
> Just imagine X composing a buffer with both the 3D engine
> and the DMA engine.
> 
> All engines need to run in parallel and you need to wait for all of
> them to finish before scanout.
> 
> Everyone who needs exclusive access to the reservation object
> (like scanout does) needs to wait for all fences, not just the
> exclusive one.
> 
> The Intel driver clearly needs to be fixed here.
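
To make the convention under discussion concrete, here is a rough sketch
against the 2016-era struct fence / reservation_object API (before the
dma_fence rename). The helper names are invented for illustration and the
locking and error handling are simplified; this is not code from any
driver or from the patch under discussion:

#include <linux/fence.h>
#include <linux/reservation.h>
#include <linux/sched.h>

/* Illustrative sketch only -- not code from i915, amdgpu or the patch. */

/*
 * Writer (or anyone needing exclusive access, e.g. scanout in the model
 * argued for above): wait for the exclusive fence *and* all shared
 * fences, then publish a new exclusive fence for the write.
 */
static int mark_buffer_write(struct reservation_object *resv,
                             struct fence *fence)
{
        long ret;

        ret = reservation_object_wait_timeout_rcu(resv,
                                                  true, /* wait_all */
                                                  true, /* interruptible */
                                                  MAX_SCHEDULE_TIMEOUT);
        if (ret < 0)
                return ret;

        ww_mutex_lock(&resv->lock, NULL);
        reservation_object_add_excl_fence(resv, fence);
        ww_mutex_unlock(&resv->lock);
        return 0;
}

/*
 * Reader: wait only for the last writer (the exclusive fence), then add
 * a shared fence so that later exclusive users wait for this read.
 */
static int mark_buffer_read(struct reservation_object *resv,
                            struct fence *fence)
{
        long wret;
        int ret;

        wret = reservation_object_wait_timeout_rcu(resv,
                                                   false, /* wait_all */
                                                   true,  /* interruptible */
                                                   MAX_SCHEDULE_TIMEOUT);
        if (wret < 0)
                return wret;

        ww_mutex_lock(&resv->lock, NULL);
        ret = reservation_object_reserve_shared(resv);
        if (!ret)
                reservation_object_add_shared_fence(resv, fence);
        ww_mutex_unlock(&resv->lock);
        return ret;
}

The whole disagreement hinges on the wait_all flag: a flip that follows
the quoted argument takes the writer path and waits for everything,
while the implicit-fencing model i915 assumes only has to wait for the
exclusive fence before reading.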

If you are not using implicit fencing, you have to pass explicit fences
instead.
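
For reference, the ttm_validate_buffer marking mentioned earlier in the
thread works roughly as sketched below, assuming the 2016-era TTM
execbuf helpers; the surrounding submission function is invented for
illustration:

#include <linux/fence.h>
#include <linux/list.h>
#include <drm/ttm/ttm_bo_api.h>
#include <drm/ttm/ttm_execbuf_util.h>

/* Illustrative sketch only -- not code from amdgpu or any other driver. */
static void fence_one_buffer(struct ttm_buffer_object *bo,
                             struct fence *fence, bool wrote)
{
        struct ttm_validate_buffer tv;
        struct ww_acquire_ctx ticket;
        LIST_HEAD(list);

        tv.bo = bo;
        /*
         * shared == true makes ttm_eu_fence_buffer_objects() add the
         * fence as a shared (read) fence; shared == false adds it as
         * the exclusive (write) fence.  Per the discussion above,
         * amdgpu currently marks its buffers shared unconditionally.
         */
        tv.shared = !wrote;
        list_add(&tv.head, &list);

        if (ttm_eu_reserve_buffers(&ticket, &list, true, NULL))
                return;

        /* ... submit the work that will signal @fence ... */

        ttm_eu_fence_buffer_objects(&ticket, &list, fence);
}
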
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

