[Intel-gfx] [PATCH v3 13/14] drm/i915: Enable userspace to opt-out of implicit fencing

Fri Jan 27 00:07:12 UTC 2017

On Thu 26 Jan 2017, Chris Wilson wrote:
> On Wed, Jan 25, 2017 at 12:38:32PM -0800, Chad Versace wrote:
> > On Mon 14 Nov 2016, Chris Wilson wrote:
> > > Userspace is faced with a dilemma. The kernel requires implicit fencing
> > > to manage resource usage (we always must wait for the GPU to finish
> > > before releasing its PTE) and for third parties. However, userspace may
> > > wish to avoid this serialisation if it is either using explicit fencing
> > > between parties and wants more fine-grained access to buffers (e.g. it
> > > may partition the buffer between uses and track fences on ranges rather
> > > than the implicit fences tracking the whole object). It follows that
> > > userspace needs a mechanism to avoid the kernel's serialisation on its
> > > implicit fences before execbuf execution.
> > > 
> > > The next question is whether this is an object, execbuf or context flag.
> > > Hybrid users (such as using explicit EGL_ANDROID_native_sync fencing on
> > > shared winsys buffers, but implicit fencing on internal surfaces)
> > > require a per-object level flag. Given that this flag need to be only
> > > set once for the lifetime of the object, this reduces the convenience of
> > > having an execbuf or context level flag (and avoids having multiple
> > > pieces of uABI controlling the same feature).
> > > 
> > > Incorrect use of this flag will result in rendering corruption and GPU
> > > hangs - but will not result in use-after-free or similar resource
> > > tracking issues.
> > > 
> > > Serious caveat: write ordering is not strictly correct after setting
> > > this flag on a render target on multiple engines. This affects all
> > > subsequent GEM operations (execbuf, set-domain, pread) and shared
> > > dma-buf operations. A fix is possible - but costly (both in terms of
> > > further ABI changes and runtime overhead).
> > > 
> > > Testcase: igt/gem_exec_async
> > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > Reviewed-by: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/i915_drv.c            |  1 +
> > >  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  3 +++
> > >  include/uapi/drm/i915_drm.h                | 29 ++++++++++++++++++++++++++++-
> > >  3 files changed, 32 insertions(+), 1 deletion(-)
> > 
> > I'm neutral about this patch. I believe patch 14/14 is useful with or
> > without this patch, and I want to see patch 14 land regardless of what
> > happens with this one.
> 
> I don't like the patch, it opens up a big wart in the GEM api (incorrect
> write tracking on GEM/dma-buf across multiple timelines). Otoh, being
> able to discard the implicit fence tracking seems to be an important
> feature request - if we go forward witout it, we will then be lacking a
> feature that is common across other drivers and in particular seems to
> be commonplace in the Android ecosystem.

I agree. The explicit fence fds provide more benefit (that is, less
blocking and, in general, more *explicitness*) when implicit fencing is
disabled. Userspace should have some API to disable the implicit
fencing, and this patch seems like an ok approach. I certainly can think
of nothing better.

> Daniel, what's your feeling? One problem will be that the
> synchronisation issue may be hard to track down in future (proving that
> the cause of a stall is an avoidable implicit fence).
>  
> > I'm not opposed to this patch. It's just that I don't yet understand
> > exactly if Mesa's EGL/GL code could effectively use this feature for
> > Android winsys buffers. The amount of information loss between the
> > EGL/GL apis and the eventual execbuffer submission may prevent Mesa from
> > annotating the Android winsys buffers with this.  I'm unsure.  I'm still
> > thinking about it.
> > 
> > But, if Chris, or anyone, already has plans to use this somehow, perhaps
> > in the DDX, then don't let my hesitation block the patch.
> 
> Actually, the example I have would be for mesa. It can use this on its
> own scratch buffers to just discard writes and prevent ordering on
> a single scratch shared between contexts, and for its fence tracking using
> a single page for multiple rings.

Those use cases sound good to me. This patch is
Acked-by: Chad Versace <chadversary at chromium.org>