[Intel-gfx] [PATCH v8 4/6] drm/i915: Interrupt driven fences
Maarten Lankhorst
maarten.lankhorst at linux.intel.com
Thu May 19 09:24:17 UTC 2016
On 12-05-16 at 23:06, John.C.Harrison at Intel.com wrote:
> From: John Harrison <John.C.Harrison at Intel.com>
>
> The intended usage model for struct fence is that the signalled status
> should be set on demand rather than polled. That is, there should not
> be a need for a 'signaled' function to be called every time the status
> is queried. Instead, 'something' should be done to enable a signal
> callback from the hardware which will update the state directly. In
> the case of requests, this is the seqno update interrupt. The idea is
> that this callback will only be enabled on demand when something
> actually tries to wait on the fence.
>
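For reference, the struct fence contract being relied on here looks roughly
like this; a simplified, hypothetical example rather than anything from i915:

#include <linux/fence.h>

/* Hypothetical callbacks, only here to illustrate the contract. */
static const char *example_get_driver_name(struct fence *f)
{
	return "example";
}

static const char *example_get_timeline_name(struct fence *f)
{
	return "example-ring";
}

/*
 * Called once by the fence core (with fence->lock held) the first time
 * somebody actually waits, e.g. via fence_enable_sw_signaling(). From
 * that point on the driver is responsible for calling fence_signal() /
 * fence_signal_locked() when the hardware completes.
 */
static bool example_enable_signaling(struct fence *f)
{
	/* Unmask the relevant completion interrupt here. */
	return true;	/* not signalled yet, keep the fence pending */
}

static const struct fence_ops example_fence_ops = {
	.get_driver_name	= example_get_driver_name,
	.get_timeline_name	= example_get_timeline_name,
	.enable_signaling	= example_enable_signaling,
	/* no .signaled hook: completion is pushed from the interrupt handler */
	.wait			= fence_default_wait,
	.release		= fence_free,
};
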
> This change removes the polling test and replaces it with the callback
> scheme. Each fence is added to a 'please poke me' list at the start of
> i915_add_request(). The interrupt handler then scans through the 'poke
> me' list when a new seqno pops out and signals any matching
> fence/request. The fence is then removed from the list so the entire
> request stack does not need to be scanned every time. Note that the
> fence is added to the list before the commands to generate the seqno
> interrupt are added to the ring. Thus the sequence is guaranteed to be
> race free if the interrupt is already enabled.
>
> Note that the interrupt is only enabled on demand (i.e. when
> __wait_request() is called). Thus there is still a potential race when
> enabling the interrupt as the request may already have completed.
> However, this is simply solved by calling the interrupt processing
> code immediately after enabling the interrupt and thereby checking for
> already completed requests.
>
> Lastly, the ring clean up code has the possibility to cancel
> outstanding requests (e.g. because TDR has reset the ring). These
> requests will never get signalled and so must be removed from the
> signal list manually. This is done by setting a 'cancelled' flag and
> then calling the regular notify/retire code path rather than
> attempting to duplicate the list manipulation and clean up code in
> multiple places. This also avoids any race condition where the
> cancellation request might occur after/during the arrival of the
> completion interrupt.
>
> v2: Updated to take advantage of the request unreference no longer
> requiring the mutex lock.
>
> v3: Move the signal list processing around to prevent unsubmitted
> requests being added to the list. This was occurring on Android
> because the native sync implementation calls the
> fence->enable_signaling API immediately on fence creation.
>
> Updated after review comments by Tvrtko Ursulin. Renamed list nodes to
> 'link' instead of 'list'. Added support for returning an error code on
> a cancelled fence. Update list processing to be more efficient/safer
> with respect to spinlocks.
>
> v5: Made i915_gem_request_submit a static as it is only ever called
> from one place.
>
> Fixed up the low latency wait optimisation. The time delay between the
> seqno value being written to memory and the driver's ISR running can be
> significant, at least for the wait request micro-benchmark. This can
> be greatly improved by explicitly checking for seqno updates in the
> pre-wait busy poll loop. Also added some documentation comments to the
> busy poll code.
>
> Fixed up support for the faking of lost interrupts
> (test_irq_rings/missed_irq_rings). That is, there is an IGT test that
> tells the driver to lose interrupts deliberately and then checks that
> everything still works as expected (albeit much slower).
>
> Updates from review comments: use non IRQ-save spinlocking, early exit
> on WARN and improved comments (Tvrtko Ursulin).
>
> v6: Updated to newer nightly and resolved conflicts around the
> wait_request busy spin optimisation. Also fixed a race condition
> between this early exit path and the regular completion path.
>
> v7: Updated to newer nightly - lots of ring -> engine renaming plus an
> interface change on get_seqno(). Also added a list_empty() check
> before acquiring spinlocks and doing list processing.
>
> v8: Updated to newer nightly - changes to request clean up code mean
> none of the deferred free mess is needed any more.
>
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Maarten Lankhorst <maarten.lankhorst at linux.intel.com>
> ---
> drivers/gpu/drm/i915/i915_drv.h | 7 +
> drivers/gpu/drm/i915/i915_gem.c | 235 +++++++++++++++++++++++++++++---
> drivers/gpu/drm/i915/i915_irq.c | 2 +
> drivers/gpu/drm/i915/intel_lrc.c | 1 +
> drivers/gpu/drm/i915/intel_ringbuffer.c | 1 +
> drivers/gpu/drm/i915/intel_ringbuffer.h | 1 +
> 6 files changed, 225 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 433d99a..3adac59 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2278,6 +2278,10 @@ struct drm_i915_gem_request {
> */
> struct fence fence;
> struct rcu_head rcu_head;
> + struct list_head signal_link;
> + bool cancelled;
> + bool irq_enabled;
> + bool signal_requested;
>
> /** On Which ring this request was generated */
> struct drm_i915_private *i915;
> @@ -2379,6 +2383,9 @@ struct drm_i915_gem_request {
> struct drm_i915_gem_request * __must_check
> i915_gem_request_alloc(struct intel_engine_cs *engine,
> struct intel_context *ctx);
> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
> + bool fence_locked);
> +void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked);
>
> static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
> {
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 958edcb..6669508 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -39,6 +39,8 @@
> #include <linux/pci.h>
> #include <linux/dma-buf.h>
>
> +static void i915_gem_request_submit(struct drm_i915_gem_request *req);
> +
> static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
> static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
> static void
> @@ -1238,9 +1240,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
> struct intel_engine_cs *engine = i915_gem_request_get_engine(req);
> struct drm_device *dev = engine->dev;
> struct drm_i915_private *dev_priv = dev->dev_private;
> - const bool irq_test_in_progress =
> - ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_engine_flag(engine);
> int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
> + uint32_t seqno;
> DEFINE_WAIT(wait);
> unsigned long timeout_expire;
> s64 before = 0; /* Only to silence a compiler warning. */
> @@ -1248,9 +1249,6 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>
> WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
>
> - if (list_empty(&req->list))
> - return 0;
> -
> if (i915_gem_request_completed(req))
> return 0;
>
> @@ -1276,15 +1274,17 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
> trace_i915_gem_request_wait_begin(req);
>
> /* Optimistic spin for the next jiffie before touching IRQs */
> - ret = __i915_spin_request(req, state);
> - if (ret == 0)
> - goto out;
> -
> - if (!irq_test_in_progress && WARN_ON(!engine->irq_get(engine))) {
> - ret = -ENODEV;
> - goto out;
> + if (req->seqno) {
> + ret = __i915_spin_request(req, state);
> + if (ret == 0)
> + goto out;
> }
>
> + /*
> + * Enable interrupt completion of the request.
> + */
> + fence_enable_sw_signaling(&req->fence);
> +
> for (;;) {
> struct timer_list timer;
>
> @@ -1307,6 +1307,21 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
> break;
> }
>
> + if (req->seqno) {
> + /*
> + * There is quite a lot of latency in the user interrupt
> + * path. So do an explicit seqno check and potentially
> + * remove all that delay.
> + */
> + if (req->engine->irq_seqno_barrier)
> + req->engine->irq_seqno_barrier(req->engine);
> + seqno = engine->get_seqno(engine);
> + if (i915_seqno_passed(seqno, req->seqno)) {
> + ret = 0;
> + break;
> + }
> + }
> +
> if (signal_pending_state(state, current)) {
> ret = -ERESTARTSYS;
> break;
> @@ -1333,14 +1348,32 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
> destroy_timer_on_stack(&timer);
> }
> }
> - if (!irq_test_in_progress)
> - engine->irq_put(engine);
>
> finish_wait(&engine->irq_queue, &wait);
>
> out:
> trace_i915_gem_request_wait_end(req);
>
> + if ((ret == 0) && (req->seqno)) {
> + if (req->engine->irq_seqno_barrier)
> + req->engine->irq_seqno_barrier(req->engine);
> + seqno = engine->get_seqno(engine);
> + if (i915_seqno_passed(seqno, req->seqno) &&
> + !i915_gem_request_completed(req)) {
> + /*
> + * Make sure the request is marked as completed before
> + * returning. NB: Need to acquire the spinlock around
> + * the whole call to avoid a race condition where the
> + * interrupt handler, running concurrently, could
> + * cause this invocation to exit early even though the
> + * request has not actually been fully processed yet.
> + */
> + spin_lock_irq(&req->engine->fence_lock);
> + i915_gem_request_notify(req->engine, true);
> + spin_unlock_irq(&req->engine->fence_lock);
> + }
> + }
> +
> if (timeout) {
> s64 tres = *timeout - (ktime_get_raw_ns() - before);
>
> @@ -1406,6 +1439,11 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
> {
> trace_i915_gem_request_retire(request);
>
> + if (request->irq_enabled) {
> + request->engine->irq_put(request->engine);
> + request->irq_enabled = false;
> + }
> +
> /* We know the GPU must have read the request to have
> * sent us the seqno + interrupt, so use the position
> * of tail of the request to update the last known position
> @@ -1419,6 +1457,22 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
> list_del_init(&request->list);
> i915_gem_request_remove_from_client(request);
>
> + /*
> + * In case the request is still in the signal pending list,
> + * e.g. due to being cancelled by TDR, preemption, etc.
> + */
> + if (!list_empty(&request->signal_link)) {
> + /*
> + * The request must be marked as cancelled and the underlying
> + * fence as failed. NB: There is no explicit fence fail API,
> + * there is only a manual poke and signal.
> + */
> + request->cancelled = true;
> + /* How to propagate to any associated sync_fence??? */
> + request->fence.status = -EIO;
> + fence_signal_locked(&request->fence);
> + }
> +
> if (request->previous_context) {
> if (i915.enable_execlists)
> intel_lr_context_unpin(request->previous_context,
> @@ -2650,6 +2704,12 @@ void __i915_add_request(struct drm_i915_gem_request *request,
> */
> request->postfix = intel_ring_get_tail(ringbuf);
>
> + /*
> + * Add the fence to the pending list before emitting the commands to
> + * generate a seqno notification interrupt.
> + */
> + i915_gem_request_submit(request);
> +
> if (i915.enable_execlists)
> ret = engine->emit_request(request);
> else {
> @@ -2735,25 +2795,141 @@ static void i915_gem_request_free(struct fence *req_fence)
> struct drm_i915_gem_request *req;
>
> req = container_of(req_fence, typeof(*req), fence);
> +
> + WARN_ON(req->irq_enabled);
> +
> call_rcu(&req->rcu_head, i915_gem_request_free_rcu);
> }
>
> -static bool i915_gem_request_enable_signaling(struct fence *req_fence)
> +/*
> + * The request is about to be submitted to the hardware so add the fence to
> + * the list of signalable fences.
> + *
> + * NB: This does not necessarily enable interrupts yet. That only occurs on
> + * demand when the request is actually waited on. However, adding it to the
> + * list early ensures that there is no race condition where the interrupt
> + * could pop out prematurely and thus be completely lost. The race is merely
> + * that the interrupt must be manually checked for after being enabled.
> + */
> +static void i915_gem_request_submit(struct drm_i915_gem_request *req)
> {
> - /* Interrupt driven fences are not implemented yet.*/
> - WARN(true, "This should not be called!");
> - return true;
> + /*
> + * Always enable signal processing for the request's fence object
> + * before that request is submitted to the hardware. Thus there is no
> + * race condition whereby the interrupt could pop out before the
> + * request has been added to the signal list. Hence no need to check
> + * for completion, undo the list add and return false.
> + */
> + i915_gem_request_reference(req);
> + spin_lock_irq(&req->engine->fence_lock);
> + WARN_ON(!list_empty(&req->signal_link));
> + list_add_tail(&req->signal_link, &req->engine->fence_signal_list);
> + spin_unlock_irq(&req->engine->fence_lock);
> +
> + /*
> + * NB: Interrupts are only enabled on demand. Thus there is still a
> + * race where the request could complete before the interrupt has
> + * been enabled. Thus care must be taken at that point.
> + */
> +
> + /* Have interrupts already been requested? */
> + if (req->signal_requested)
> + i915_gem_request_enable_interrupt(req, false);
> +}
> +
> +/*
> + * The request is being actively waited on, so enable interrupt based
> + * completion signalling.
> + */
> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
> + bool fence_locked)
> +{
> + struct drm_i915_private *dev_priv = to_i915(req->engine->dev);
> + const bool irq_test_in_progress =
> + ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) &
> + intel_engine_flag(req->engine);
> +
> + if (req->irq_enabled)
> + return;
> +
> + if (irq_test_in_progress)
> + return;
> +
> + if (!WARN_ON(!req->engine->irq_get(req->engine)))
> + req->irq_enabled = true;
> +
> + /*
> + * Because the interrupt is only enabled on demand, there is a race
> + * where the interrupt can fire before anyone is looking for it. So
> + * do an explicit check for missed interrupts.
> + */
> + i915_gem_request_notify(req->engine, fence_locked);
> }
>
> -static bool i915_gem_request_is_completed(struct fence *req_fence)
> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
> {
> struct drm_i915_gem_request *req = container_of(req_fence,
> typeof(*req), fence);
> +
> + /*
> + * No need to actually enable interrupt based processing until the
> + * request has been submitted to the hardware. At which point
> + * 'i915_gem_request_submit()' is called. So only really enable
> + * signalling in there. Just set a flag to say that interrupts are
> + * wanted when the request is eventually submitted. On the other hand
> + * if the request has already been submitted then interrupts do need
> + * to be enabled now.
> + */
> +
> + req->signal_requested = true;
> +
> + if (!list_empty(&req->signal_link))
> + i915_gem_request_enable_interrupt(req, true);
> +
> + return true;
> +}
> +
> +void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
> +{
> + struct drm_i915_gem_request *req, *req_next;
> + unsigned long flags;
> u32 seqno;
>
> - seqno = req->engine->get_seqno(req->engine);
> + if (list_empty(&engine->fence_signal_list))
> + return;
> +
If you don't hold the lock then you should probably use list_empty_careful.
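I.e. something like this (untested sketch, keeping the unlocked early-out):

-	if (list_empty(&engine->fence_signal_list))
+	/*
+	 * list_empty_careful() looks at both next and prev, so this
+	 * unlocked early-out is safe against a concurrent
+	 * list_del_init() on another CPU.
+	 */
+	if (list_empty_careful(&engine->fence_signal_list))
 		return;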
~Maarten