[Intel-gfx] [RFC v2] drm/i915: Android native sync support

Chris Wilson chris at chris-wilson.co.uk
Sat Jan 24 08:08:32 PST 2015


On Sat, Jan 24, 2015 at 10:41:46AM +0100, Daniel Vetter wrote:
> On Fri, Jan 23, 2015 at 6:30 PM, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> > On Fri, Jan 23, 2015 at 04:53:48PM +0100, Daniel Vetter wrote:
> >> Yeah, that's kind of the big behaviour difference (at least as I see it)
> >> between explicit sync and implicit sync:
> >> - with implicit sync the kernel attaches sync points/requests to buffers
> >>   and userspace just asks about the idle/busy state of buffers. Synchronization
> >>   between different users is all handled behind userspace's back in the
> >>   kernel.
> >>
> >> - explicit sync attaches sync points to individual bits of work and makes
> >>   them explicit objects userspace can get at and pass around. Userspace
> >>   uses these separate things to inquire about when something is
> >>   done/idle/busy and has its own mapping between explicit sync objects and
> >>   the different pieces of memory affected by each. Synchronization between
> >>   different clients is handled explicitly by passing sync objects around
> >>   each time some rendering is done.
> >>
> >> The bigger driver for explicit sync (besides "nvidia likes it sooooo much
> >> that everyone uses it a lot") seems to be a) shitty gpu drivers without
> >> proper bo managers (*cough*android*cough*) and svm, where there are simply
> >> no buffer objects any more to attach sync information to.
> >
> > Actually, mesa would really like much finer granularity than at batch
> > boundaries. Having a sync object for a batch boundary itself is very meh
> > and not a substantive improvement on what is possible today, but being
> > able to convert the implicit sync into an explicit fence object is
> > interesting and lends a layer of abstraction that could make it more
> > versatile. Most importantly, it allows me to defer the overhead of fence
> > creation until I actually want to sleep on a completion. Also, Jesse
> > originally supported inserting fences inside a batch, which looked
> > interesting if impractical.
> 
> If we want to allow the kernel to stall on fences (e.g. in the scheduler),
> only the kernel should be allowed to create fences imo. At least
> current fences assume that they _will_ signal eventually, and for i915
> fences we have the hangcheck to ensure this is the case. In-batch
> fences and lazy fence creation (beyond just delaying the fd allocation
> to avoid too many fds flying around) are therefore a no-go.
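
The explicit model described in the bullets above boils down to userspace
shuttling fence fds between clients. As a rough illustration only, in terms
of the sync_file uAPI that later landed upstream (<linux/sync_file.h>, which
post-dates this discussion), combining two fences before handing the result
to the next client looks something like:

  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/sync_file.h>

  /* Merge two fence fds; the result signals once both inputs have. */
  static int merge_fences(int fd1, int fd2)
  {
          struct sync_merge_data merge;

          memset(&merge, 0, sizeof(merge)); /* flags/pad must be zero */
          strcpy(merge.name, "merged-fence");
          merge.fd2 = fd2;

          if (ioctl(fd1, SYNC_IOC_MERGE, &merge) < 0)
                  return -1;

          return merge.fence; /* caller owns the new fd */
  }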

Lazy fence creation (i.e. attaching a fence to a buffer) just means
creating the fd for an existing request (which is derived from the
fence). Or if the buffer is read or write idle, then you just create the
fence as already-signaled. And yes, it is just to avoid death by a
thousand file descriptors, and especially creating one for every batch.
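
As a purely illustrative sketch of that lazy path, expressed in terms of the
dma_fence/sync_file kernel interfaces that exist today rather than anything
in this patch: the fd is only materialised when userspace asks for one, and
an idle buffer simply gets an already-signalled stub fence.

  #include <linux/dma-fence.h>
  #include <linux/sync_file.h>
  #include <linux/file.h>
  #include <linux/fcntl.h>

  /*
   * Turn the last request tracked for a buffer (or NULL, if the buffer is
   * idle) into a fence fd. Sketch only; error paths kept minimal.
   */
  static int request_fence_to_fd(struct dma_fence *fence)
  {
          struct sync_file *sync;
          int fd;

          if (fence)
                  dma_fence_get(fence);
          else
                  fence = dma_fence_get_stub(); /* idle: already signalled */

          fd = get_unused_fd_flags(O_CLOEXEC);
          if (fd < 0) {
                  dma_fence_put(fence);
                  return fd;
          }

          sync = sync_file_create(fence);
          dma_fence_put(fence); /* sync_file holds its own reference */
          if (!sync) {
                  put_unused_fd(fd);
                  return -ENOMEM;
          }

          fd_install(fd, sync->file);
          return fd;
  }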

> For that kind of fine-grained sync between gpu and cpu workloads the
> solution thus far (at least what I've seen) is just busy-looping.
> Usually those workloads have a few orders of magnitude more sync points
> than frames we tend to render, so blocking isn't terribly efficient anyway.
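
(For comparison, the blocking alternative with an explicit fence fd is just a
poll(); a sync_file-style fd reports POLLIN once its fence has signalled. A
minimal userspace sketch, illustrative only:

  #include <errno.h>
  #include <poll.h>

  /* Returns 0 on timeout, 1 once the fence has signalled, <0 on error. */
  static int wait_fence_fd(int fd, int timeout_ms)
  {
          struct pollfd pfd = { .fd = fd, .events = POLLIN };
          int ret;

          do {
                  ret = poll(&pfd, 1, timeout_ms);
          } while (ret < 0 && errno == EINTR);

          return ret;
  }

Whether that is worth a syscall and an fd per sync point is exactly the
overhead question below.)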

Heck, I think a full-fledged fence fd per batch is still more overhead
than I want.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

