[Intel-gfx] [PATCH v3 13/14] drm/i915: Enable userspace to opt-out of implicit fencing

Thu Jan 26 10:32:40 UTC 2017

On Wed, Jan 25, 2017 at 12:38:32PM -0800, Chad Versace wrote:
> On Mon 14 Nov 2016, Chris Wilson wrote:
> > Userspace is faced with a dilemma. The kernel requires implicit fencing
> > to manage resource usage (we always must wait for the GPU to finish
> > before releasing its PTE) and for third parties. However, userspace may
> > wish to avoid this serialisation if it is either using explicit fencing
> > between parties and wants more fine-grained access to buffers (e.g. it
> > may partition the buffer between uses and track fences on ranges rather
> > than the implicit fences tracking the whole object). It follows that
> > userspace needs a mechanism to avoid the kernel's serialisation on its
> > implicit fences before execbuf execution.
> > 
> > The next question is whether this is an object, execbuf or context flag.
> > Hybrid users (such as using explicit EGL_ANDROID_native_sync fencing on
> > shared winsys buffers, but implicit fencing on internal surfaces)
> > require a per-object level flag. Given that this flag need to be only
> > set once for the lifetime of the object, this reduces the convenience of
> > having an execbuf or context level flag (and avoids having multiple
> > pieces of uABI controlling the same feature).
> > 
> > Incorrect use of this flag will result in rendering corruption and GPU
> > hangs - but will not result in use-after-free or similar resource
> > tracking issues.
> > 
> > Serious caveat: write ordering is not strictly correct after setting
> > this flag on a render target on multiple engines. This affects all
> > subsequent GEM operations (execbuf, set-domain, pread) and shared
> > dma-buf operations. A fix is possible - but costly (both in terms of
> > further ABI changes and runtime overhead).
> > 
> > Testcase: igt/gem_exec_async
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Reviewed-by: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.c            |  1 +
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  3 +++
> >  include/uapi/drm/i915_drm.h                | 29 ++++++++++++++++++++++++++++-
> >  3 files changed, 32 insertions(+), 1 deletion(-)
> 
> I'm neutral about this patch. I believe patch 14/14 is useful with or
> without this patch, and I want to see patch 14 land regardless of what
> happens with this one.

I don't like the patch, it opens up a big wart in the GEM api (incorrect
write tracking on GEM/dma-buf across multiple timelines). Otoh, being
able to discard the implicit fence tracking seems to be an important
feature request - if we go forward witout it, we will then be lacking a
feature that is common across other drivers and in particular seems to
be commonplace in the Android ecosystem.

Daniel, what's your feeling? One problem will be that the
synchronisation issue may be hard to track down in future (proving that
the cause of a stall is an avoidable implicit fence).

> I'm not opposed to this patch. It's just that I don't yet understand
> exactly if Mesa's EGL/GL code could effectively use this feature for
> Android winsys buffers. The amount of information loss between the
> EGL/GL apis and the eventual execbuffer submission may prevent Mesa from
> annotating the Android winsys buffers with this.  I'm unsure.  I'm still
> thinking about it.
> 
> But, if Chris, or anyone, already has plans to use this somehow, perhaps
> in the DDX, then don't let my hesitation block the patch.

Actually, the example I have would be for mesa. It can use this on its
own scratch buffers to just discard writes and prevent ordering on
a single scratch shared between contexts, and for its fence tracking using
a single page for multiple rings.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre