[Intel-gfx] [PATCH] drm/i915: Embrace the race in busy-ioctl

Mon Aug 15 13:06:23 UTC 2016

Chris Wilson <chris at chris-wilson.co.uk> writes:

> Daniel Vetter proposed a new challenge to the serialisation inside the
> busy-ioctl that exposed a flaw that could result in us reporting the
> wrong engine as being busy. If the request is reallocated as we test
> its busyness and then reassigned to this object by another thread, we
> would not notice that the test itself was incorrect.
>
> We are faced with a choice of using __i915_gem_active_get_request_rcu()
> to first acquire a reference to the request preventing the race, or to
> acknowledge the race and accept the limitations upon the accuracy of the
> busy flags. Note that we guarantee that we never falsely report the
> object as idle (providing userspace itself doesn't race), and so the
> most important use of the busy-ioctl and its guarantees are fulfilled.
>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter at intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 87 ++++++++++++++++++++++-------------------
>  include/uapi/drm/i915_drm.h     | 15 ++++++-
>  2 files changed, 60 insertions(+), 42 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 5566916870eb..c77915378768 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3791,49 +3791,54 @@ static __always_inline unsigned int
>  __busy_set_if_active(const struct i915_gem_active *active,
>  		     unsigned int (*flag)(unsigned int id))
>  {
> -	/* For more discussion about the barriers and locking concerns,
> -	 * see __i915_gem_active_get_rcu().
> -	 */
> -	do {
> -		struct drm_i915_gem_request *request;
> -		unsigned int id;
> -
> -		request = rcu_dereference(active->request);
> -		if (!request || i915_gem_request_completed(request))
> -			return 0;
> +	struct drm_i915_gem_request *request;
>  
> -		id = request->engine->exec_id;
> +	request = rcu_dereference(active->request);
> +	if (!request || i915_gem_request_completed(request))
> +		return 0;
>  
> -		/* Check that the pointer wasn't reassigned and overwritten.
> -		 *
> -		 * In __i915_gem_active_get_rcu(), we enforce ordering between
> -		 * the first rcu pointer dereference (imposing a
> -		 * read-dependency only on access through the pointer) and
> -		 * the second lockless access through the memory barrier
> -		 * following a successful atomic_inc_not_zero(). Here there
> -		 * is no such barrier, and so we must manually insert an
> -		 * explicit read barrier to ensure that the following
> -		 * access occurs after all the loads through the first
> -		 * pointer.
> -		 *
> -		 * It is worth comparing this sequence with
> -		 * raw_write_seqcount_latch() which operates very similarly.
> -		 * The challenge here is the visibility of the other CPU
> -		 * writes to the reallocated request vs the local CPU ordering.
> -		 * Before the other CPU can overwrite the request, it will
> -		 * have updated our active->request and gone through a wmb.
> -		 * During the read here, we want to make sure that the values
> -		 * we see have not been overwritten as we do so - and we do
> -		 * that by serialising the second pointer check with the writes
> -		 * on other other CPUs.
> -		 *
> -		 * The corresponding write barrier is part of
> -		 * rcu_assign_pointer().
> -		 */
> -		smp_rmb();
> -		if (request == rcu_access_pointer(active->request))
> -			return flag(id);
> -	} while (1);
> +	/* This is racy. See __i915_gem_active_get_rcu() for a in detail
> +	 * discussion of how to handle the race correctly, but for reporting
> +	 * the busy state we err on the side of potentially reporting the
> +	 * wrong engine as being busy (but we guarantee that the result
> +	 * is at least self-consistent).
> +	 *
> +	 * As we use SLAB_DESTROY_BY_RCU, the request may be reallocated
> +	 * whilst we are inspecting it, even under the RCU read lock as we are.
> +	 * This means that there is a small window for the engine and/or the
> +	 * seqno to have been overwritten. The seqno will always be in the
> +	 * future compared to the intended, and so we know that if that
> +	 * seqno is idle (on whatever engine) our request is idle and the
> +	 * return 0 above is correct.
> +	 *
> +	 * The issue is that if the engine is switched, it is just as likely
> +	 * to report that it is busy (but since the switch happened, we know
> +	 * the request should be idle). So there is a small chance that a busy
> +	 * result is actually the wrong engine.
> +	 *
> +	 * So why don't we care?
> +	 *
> +	 * For starters, the busy ioctl is a heuristic that is by definition
> +	 * racy. Even with perfect serialisation in the driver, the hardware
> +	 * state is constantly advancing - the state we report to the user
> +	 * is stale.
> +	 *
> +	 * The critical information for the busy-ioctl is whether the object
> +	 * is idle as userspace relies on that to detect whether its next
> +	 * access will stall, or if it has missed submitting commands to
> +	 * the hardware allowing the GPU to stall. We never generate a
> +	 * false-positive for idleness, thus busy-ioctl is reliable at the
> +	 * most fundamental level, and we maintain the guarantee that a
> +	 * busy object left to itself will eventually become idle (and stay
> +	 * idle!).
> +	 *
> +	 * We allow ourselves the leeway of potentially misreporting the busy
> +	 * state because that is an optimisation heuristic that is constantly
> +	 * in flux. Being quickly able to detect the busy/idle state is more
> +	 * much important than accurate logging of which engines exactly were
> +	 * busy.
> +	 */
> +	return flag(request->engine->exec_id);
>  }
>  
>  static __always_inline unsigned int
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 452629de7a57..bfd6d59d5099 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -855,7 +855,15 @@ struct drm_i915_gem_busy {
>  	 * having flushed any pending activity), and a non-zero return that
>  	 * the object is still in-flight on the GPU. (The GPU has not yet
>  	 * signaled completion for all pending requests that reference the
> -	 * object.)
> +	 * object.) An object is guaranteed to become idle eventually (so
> +	 * long as new GPU commands are executed upon it). Due to the
> +	 * asynchronous natutre of the hardware, an object reported

s/natutre/nature.

> +	 * as busy may become idle before the ioctl is completed.
> +	 *
> +	 * Furthermore, if the object is busy, which engine is busy is only
> +	 * provided as a guide. There are race conditions which prevent the
> +	 * report from being always accurate. However, the converse is not
> +	 * true, if the object is idle the result is accurate.
>  	 *

In here I am thinking accurate in busyness wise or accurate in busyness
and engine wise. Perhaps explicitly state which values can be trusted in
which case of busyness.

Overall the comments are sterling as this is very hard to undestand
area. Thanks.

Joonas already stamped but you can also add
Reviewed-by: Mika Kuoppala <mika.kuoppala at intel.com>

>  	 * The returned dword is split into two fields to indicate both
>  	 * the engines on which the object is being read, and the
> @@ -878,6 +886,11 @@ struct drm_i915_gem_busy {
>  	 * execution engines, e.g. multiple media engines, which are
>  	 * mapped to the same identifier in the EXECBUFFER2 ioctl and
>  	 * so are not separately reported for busyness.
> +	 *
> +	 * Caveat emptor:
> +	 * Only the boolean result of this query is reliable; that is whether
> +	 * the object is idle or busy. The report of which engines are busy
> +	 * should be only used as a heuristic.
>  	 */
>  	__u32 busy;
>  };
> -- 
> 2.8.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx