[Intel-gfx] [PATCH v7 6/8] drm/i915: Interrupt driven fences
Maarten Lankhorst
maarten.lankhorst at linux.intel.com
Thu Apr 21 07:38:06 UTC 2016
On 20-04-16 at 19:09, John.C.Harrison at Intel.com wrote:
> From: John Harrison <John.C.Harrison at Intel.com>
>
> The intended usage model for struct fence is that the signalled status
> should be set on demand rather than polled. That is, there should not
> be a need for a 'signaled' function to be called every time the status
> is queried. Instead, 'something' should be done to enable a signal
> callback from the hardware which will update the state directly. In
> the case of requests, this is the seqno update interrupt. The idea is
> that this callback will only be enabled on demand when something
> actually tries to wait on the fence.
>
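For anyone following along, the on-demand model described above can be illustrated with a small userspace sketch. All names here are made up for illustration; this is not the real struct fence / i915 API. The key point is that the signaled state is pushed by an armed callback, never pulled by polling, and that checking immediately after arming closes the enable-vs-already-completed race mentioned further down:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model only: 'signaled' is set by an armed callback,
 * never by polling from the query side. */
struct model_fence {
	bool signaled;
	bool irq_enabled;	/* has the "interrupt" been armed? */
	unsigned int seqno;	/* value this fence completes at */
};

static unsigned int hw_seqno;	/* stands in for the HW seqno write */

/* What the seqno interrupt would do: update fence state directly. */
static void fake_irq_handler(struct model_fence *f)
{
	if (f->irq_enabled && (int)(hw_seqno - f->seqno) >= 0)
		f->signaled = true;
}

/* Armed only when someone actually waits. Running the handler once
 * immediately covers a request that completed before arming. */
static void model_enable_signaling(struct model_fence *f)
{
	f->irq_enabled = true;
	fake_irq_handler(f);
}
```

Until model_enable_signaling() is called, nothing touches the fence state at all, which is the whole point of the on-demand scheme.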
> This change removes the polling test and replaces it with the callback
> scheme. Each fence is added to a 'please poke me' list at the start of
> i915_add_request(). The interrupt handler then scans through the 'poke
> me' list when a new seqno pops out and signals any matching
> fence/request. The fence is then removed from the list so the entire
> request stack does not need to be scanned every time. Note that the
> fence is added to the list before the commands to generate the seqno
> interrupt are added to the ring. Thus the sequence is guaranteed to be
> race free if the interrupt is already enabled.
>
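The 'please poke me' list handling reduces to a scan-signal-unlink loop. A userspace sketch of the shape (names invented; the real code naturally holds fence_lock and uses i915_seqno_passed() for wraparound-safe comparison, which the (int) cast below mimics):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_req {
	unsigned int seqno;
	bool signaled;
	struct model_req *next;
};

static struct model_req *signal_list;	/* the 'please poke me' list */

/* i915_add_request() analogue: queued before the seqno-interrupt
 * commands hit the ring, so the handler cannot miss it. */
static void model_add_request(struct model_req *r)
{
	r->next = signal_list;
	signal_list = r;
}

/* Interrupt handler analogue: signal and unlink every completed
 * request so later interrupts do not rescan the whole stack. */
static void model_notify(unsigned int hw_seqno)
{
	struct model_req **p = &signal_list;

	while (*p) {
		if ((int)(hw_seqno - (*p)->seqno) >= 0) {
			(*p)->signaled = true;
			*p = (*p)->next;	/* drop from list */
		} else {
			p = &(*p)->next;
		}
	}
}
```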
> Note that the interrupt is only enabled on demand (i.e. when
> __wait_request() is called). Thus there is still a potential race when
> enabling the interrupt as the request may already have completed.
> However, this is simply solved by calling the interrupt processing
> code immediately after enabling the interrupt and thereby checking for
> already completed requests.
>
> Lastly, the ring clean up code has the possibility to cancel
> outstanding requests (e.g. because TDR has reset the ring). These
> requests will never get signalled and so must be removed from the
> signal list manually. This is done by setting a 'cancelled' flag and
> then calling the regular notify/retire code path rather than
> attempting to duplicate the list manipulation and clean up code in
> multiple places. This also avoids any race condition where the
> cancellation request might occur after/during the completion interrupt
> actually arriving.
>
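So the cancellation path amounts to: mark the request cancelled, record a failure status, then push it through the one regular signal path so the list unlink happens in exactly one place. A minimal sketch of that design choice (illustrative names; -EIO spelled as a local constant):

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_EIO 5

struct model_creq {
	bool cancelled;
	bool signaled;
	int status;		/* 0 = ok, -MODEL_EIO = failed */
	bool on_signal_list;
};

/* Regular completion path: the single place list state is cleaned up. */
static void model_signal(struct model_creq *r)
{
	r->signaled = true;
	r->on_signal_list = false;
}

/* TDR/reset analogue: the seqno interrupt will never arrive, so fail
 * the fence and reuse the regular path rather than duplicating the
 * list manipulation here. */
static void model_cancel(struct model_creq *r)
{
	r->cancelled = true;
	r->status = -MODEL_EIO;
	model_signal(r);
}
```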
> v2: Updated to take advantage of the request unreference no longer
> requiring the mutex lock.
>
> v3: Move the signal list processing around to prevent unsubmitted
> requests being added to the list. This was occurring on Android
> because the native sync implementation calls the
> fence->enable_signaling API immediately on fence creation.
>
> Updated after review comments by Tvrtko Ursulin. Renamed list nodes to
> 'link' instead of 'list'. Added support for returning an error code on
> a cancelled fence. Updated list processing to be more efficient/safer
> with respect to spinlocks.
>
> v5: Made i915_gem_request_submit() static as it is only ever called
> from one place.
>
> Fixed up the low latency wait optimisation. The time delay between the
> seqno value being written to memory and the driver's ISR running can be
> significant, at least for the wait request micro-benchmark. This can
> be greatly improved by explicitly checking for seqno updates in the
> pre-wait busy poll loop. Also added some documentation comments to the
> busy poll code.
>
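The pre-wait busy poll with the explicit seqno check could be sketched like this (illustrative; the real code bounds the spin by time rather than iterations and issues cpu_relax() between checks):

```c
#include <assert.h>

static unsigned int model_hw_seqno;	/* stands in for the HW write */

/* Spin for at most 'budget' iterations checking the seqno directly,
 * hiding the latency between the HW seqno write and the ISR running.
 * Returns 0 on completion, -1 to fall back to the interrupt wait. */
static int model_spin_request(unsigned int target, int budget)
{
	while (budget--) {
		if ((int)(model_hw_seqno - target) >= 0)
			return 0;
		/* real code: cpu_relax() while the HW makes progress */
	}
	return -1;
}
```

Only when the spin budget is exhausted does the wait path pay for enabling the interrupt and sleeping.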
> Fixed up support for the faking of lost interrupts
> (test_irq_rings/missed_irq_rings). That is, there is an IGT test that
> tells the driver to lose interrupts deliberately and then checks that
> everything still works as expected (albeit much slower).
>
> Updates from review comments: use non IRQ-save spinlocking, early exit
> on WARN and improved comments (Tvrtko Ursulin).
>
> v6: Updated to newer nightly and resolved conflicts around the
> wait_request busy spin optimisation. Also fixed a race condition
> between this early exit path and the regular completion path.
>
> v7: Updated to newer nightly - lots of ring -> engine renaming plus an
> interface change on get_seqno(). Also added a list_empty() check
> before acquiring spinlocks and doing list processing.
>
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> ---
> drivers/gpu/drm/i915/i915_drv.h | 8 +
> drivers/gpu/drm/i915/i915_gem.c | 249 +++++++++++++++++++++++++++++---
> drivers/gpu/drm/i915/i915_irq.c | 2 +
> drivers/gpu/drm/i915/intel_lrc.c | 2 +
> drivers/gpu/drm/i915/intel_ringbuffer.c | 2 +
> drivers/gpu/drm/i915/intel_ringbuffer.h | 2 +
> 6 files changed, 241 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 81324ba..9519b11 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2255,7 +2255,12 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
> struct drm_i915_gem_request {
> /** Underlying object for implementing the signal/wait stuff. */
> struct fence fence;
> + struct list_head signal_link;
> + struct list_head unsignal_link;
> struct list_head delayed_free_link;
> + bool cancelled;
> + bool irq_enabled;
> + bool signal_requested;
>
> /** On Which ring this request was generated */
> struct drm_i915_private *i915;
> @@ -2341,6 +2346,9 @@ struct drm_i915_gem_request * __must_check
> i915_gem_request_alloc(struct intel_engine_cs *engine,
> struct intel_context *ctx);
> void i915_gem_request_cancel(struct drm_i915_gem_request *req);
> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
> + bool fence_locked);
> +void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked);
>
> int i915_create_fence_timeline(struct drm_device *dev,
> struct intel_context *ctx,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 9071058..9b1de5f 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -40,6 +40,8 @@
>
> #define RQ_BUG_ON(expr)
>
> +static void i915_gem_request_submit(struct drm_i915_gem_request *req);
> +
> static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
> static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
> static void
> @@ -1253,9 +1255,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
> struct intel_engine_cs *engine = i915_gem_request_get_engine(req);
> struct drm_device *dev = engine->dev;
> struct drm_i915_private *dev_priv = dev->dev_private;
> - const bool irq_test_in_progress =
> - ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_engine_flag(engine);
> int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
> + uint32_t seqno;
> DEFINE_WAIT(wait);
> unsigned long timeout_expire;
> s64 before = 0; /* Only to silence a compiler warning. */
> @@ -1263,9 +1264,6 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>
> WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
>
> - if (list_empty(&req->list))
> - return 0;
> -
> if (i915_gem_request_completed(req))
> return 0;
>
> @@ -1291,15 +1289,17 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
> trace_i915_gem_request_wait_begin(req);
>
> /* Optimistic spin for the next jiffie before touching IRQs */
> - ret = __i915_spin_request(req, state);
> - if (ret == 0)
> - goto out;
> -
> - if (!irq_test_in_progress && WARN_ON(!engine->irq_get(engine))) {
> - ret = -ENODEV;
> - goto out;
> + if (req->seqno) {
> + ret = __i915_spin_request(req, state);
> + if (ret == 0)
> + goto out;
> }
>
> + /*
> + * Enable interrupt completion of the request.
> + */
> + fence_enable_sw_signaling(&req->fence);
> +
> for (;;) {
> struct timer_list timer;
>
> @@ -1321,6 +1321,21 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
> break;
> }
>
> + if (req->seqno) {
> + /*
> + * There is quite a lot of latency in the user interrupt
> + * path. So do an explicit seqno check and potentially
> + * remove all that delay.
> + */
> + if (req->engine->irq_seqno_barrier)
> + req->engine->irq_seqno_barrier(req->engine);
> + seqno = engine->get_seqno(engine);
You're using this barrier + get_seqno pattern multiple times, maybe write a helper function for this?
> + if (i915_seqno_passed(seqno, req->seqno)) {
> + ret = 0;
> + break;
> + }
> + }
> +
> if (signal_pending_state(state, current)) {
> ret = -ERESTARTSYS;
> break;
> @@ -1347,14 +1362,32 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
> destroy_timer_on_stack(&timer);
> }
> }
> - if (!irq_test_in_progress)
> - engine->irq_put(engine);
>
> finish_wait(&engine->irq_queue, &wait);
>
> out:
> trace_i915_gem_request_wait_end(req);
>
> + if ((ret == 0) && (req->seqno)) {
> + if (req->engine->irq_seqno_barrier)
> + req->engine->irq_seqno_barrier(req->engine);
> + seqno = engine->get_seqno(engine);
> + if (i915_seqno_passed(seqno, req->seqno) &&
> + !i915_gem_request_completed(req)) {
> + /*
> + * Make sure the request is marked as completed before
> + * returning. NB: The spinlock must be held around the
> + * whole call to avoid a race where the interrupt
> + * handler runs concurrently and causes this invocation
> + * to exit early even though the request has not
> + * actually been fully processed yet.
> + */
> + spin_lock_irq(&req->engine->fence_lock);
> + i915_gem_request_notify(req->engine, true);
> + spin_unlock_irq(&req->engine->fence_lock);
> + }
> + }
> +
> if (timeout) {
> s64 tres = *timeout - (ktime_get_raw_ns() - before);
>
> @@ -1432,6 +1465,22 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
> list_del_init(&request->list);
> i915_gem_request_remove_from_client(request);
>
> + /*
> + * In case the request is still in the signal pending list,
> + * e.g. due to being cancelled by TDR, preemption, etc.
> + */
> + if (!list_empty(&request->signal_link)) {
> + /*
> + * The request must be marked as cancelled and the underlying
> + * fence as failed. NB: There is no explicit fence fail API,
> + * there is only a manual poke and signal.
> + */
> + request->cancelled = true;
> + /* How to propagate to any associated sync_fence??? */
> + request->fence.status = -EIO;
> + fence_signal_locked(&request->fence);
> + }
>
Can this still happen with
commit aa9b78104fe3210758fa9e6c644e9a108d371e8b
Author: Chris Wilson <chris at chris-wilson.co.uk>
Date: Wed Apr 13 17:35:15 2016 +0100
drm/i915: Late request cancellations are harmful
?