[Intel-gfx] [RFC] drm/i915/bdw+: Do not emit user interrupts when not needed

Fri Dec 18 04:28:36 PST 2015

On Fri, Dec 18, 2015 at 11:59:41AM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> 
> We can rely on the context complete interrupt to wake up the
> waiters, except in the case where requests are merged into a single ELSP
> submission. In this case we inject MI_USER_INTERRUPTS in the
> ring buffer to ensure prompt wake-ups.
> 
> With this optimization, for example, the GLBenchmark Egypt
> off-screen test sees the number of generated interrupts per
> second decrease by a factor of two, and the number of context
> switches by a factor of five to six.

I half like it. Are the interrupts a limiting factor in this case though?
This should be ~100 waits/second with ~1000 batches/second, right? What
is the delay between request completion and client wakeup - difficult to
measure after you remove the user interrupt though! But I estimate it
should be on the order of just a few GPU cycles.

> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 27f06198a51e..d9be878dbde7 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -359,6 +359,13 @@ static void execlists_elsp_write(struct drm_i915_gem_request *rq0,
>  	spin_unlock(&dev_priv->uncore.lock);
>  }
>  
> +static void execlists_emit_user_interrupt(struct drm_i915_gem_request *req)
> +{
> +	struct intel_ringbuffer *ringbuf = req->ringbuf;
> +
> +	iowrite32(MI_USER_INTERRUPT, ringbuf->virtual_start + req->tail - 8);
> +}
> +
>  static int execlists_update_context(struct drm_i915_gem_request *rq)
>  {
>  	struct intel_engine_cs *ring = rq->ring;
> @@ -433,6 +440,12 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
>  			cursor->elsp_submitted = req0->elsp_submitted;
>  			list_move_tail(&req0->execlist_link,
>  				       &ring->execlist_retired_req_list);
> +			/*
> +			 * When merging requests make sure there is still
> +			 * something after each batch buffer to wake up waiters.
> +			 */
> +			if (cursor != req0)
> +				execlists_emit_user_interrupt(req0);

The GPU may already have executed past this instruction by the time you
patch it, and you will keep missing it for as long as the context is
resubmitted. I think to be safe, you need to patch cursor as well. You
could then MI_NOOP out the MI_USER_INTERRUPT on the terminal request.
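
One possible reading of the suggestion above, as a standalone sketch rather
than driver code: every request in a merged submission gets its slot
patched, with MI_USER_INTERRUPT on the merged requests and MI_NOOP on the
terminal one (which is covered by the context complete interrupt). The
helpers patch_slot() and patch_merged_requests() are hypothetical, the ring
is a plain byte array instead of an iomapped ringbuffer, and the "tail - 8"
slot offset is simply taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Command encodings as in i915_reg.h: opcode lives in bits 28:23. */
#define MI_INSTR(opcode, flags)	(((opcode) << 23) | (flags))
#define MI_NOOP			MI_INSTR(0, 0)
#define MI_USER_INTERRUPT	MI_INSTR(0x02, 0)

/* Hypothetical stand-in for the iowrite32() in the patch. */
static void patch_slot(uint8_t *ring, uint32_t tail, uint32_t cmd)
{
	memcpy(ring + tail - 8, &cmd, sizeof(cmd));
}

/*
 * tails[] holds the tail offset of each request merged into one
 * submission, oldest first; tails[count - 1] is the terminal request.
 * Every slot is rewritten on each call, so a resubmission never sees a
 * stale command left over from an earlier merge.
 */
static void patch_merged_requests(uint8_t *ring, const uint32_t *tails,
				  size_t count)
{
	size_t i;

	for (i = 0; i < count; i++) {
		int terminal = (i == count - 1);

		patch_slot(ring, tails[i],
			   terminal ? MI_NOOP : MI_USER_INTERRUPT);
	}
}
```

This only shows which slot gets which command; it does not address the
race against the GPU having already read the slot.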

An interesting igt experiment I think would be:

thread A: keeps queuing batches, each with just a single
MI_STORE_DWORD_IMM to *addr
thread B: waits on a batch from A, reads *addr (asynchronously), and
measures latency (actual value - expected(batch))

Run for 10s, report min/max/median latency.
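
The reporting half of that experiment could look roughly like the sketch
below. It assumes batch N stores the value N, so the latency of one wait is
the value read back minus the value the waited-on batch was expected to
store, measured in batches. None of this uses real igt helpers:
struct latency_summary, compute_latencies() and summarize() are
hypothetical names, and the submission and wait side is left out entirely.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct latency_summary {
	uint32_t min;
	uint32_t max;
	uint32_t median;
};

static int cmp_u32(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a;
	uint32_t y = *(const uint32_t *)b;

	return (x > y) - (x < y);
}

/*
 * latency[i] = actual[i] - expected[i]: how far thread A had advanced
 * past the waited-on batch by the time thread B read *addr.
 */
static void compute_latencies(const uint32_t *actual,
			      const uint32_t *expected,
			      uint32_t *latency, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		latency[i] = actual[i] - expected[i];
}

/*
 * Sorts samples[] in place and returns min/max/median. For an even
 * number of samples the upper of the two middle values is reported.
 * Returns all zeroes when n == 0.
 */
static struct latency_summary summarize(uint32_t *samples, size_t n)
{
	struct latency_summary s = { 0, 0, 0 };

	if (n == 0)
		return s;

	qsort(samples, n, sizeof(*samples), cmp_u32);
	s.min = samples[0];
	s.max = samples[n - 1];
	s.median = samples[n / 2];
	return s;
}
```

Thread B would append one (actual, expected) pair per wait over the 10s
run and call summarize() once at the end.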

Repeat for more threads/contexts and more waiters. Ah, that may be the
demonstration for the thundering herd I've been looking for!
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre