[Intel-gfx] [PATCH 1/4] drm/i915: Unify execlist and legacy request life-cycles

Chris Wilson chris at chris-wilson.co.uk
Thu Oct 8 05:32:07 PDT 2015


On Tue, Oct 06, 2015 at 03:52:01PM +0100, Nick Hoath wrote:
> There is a desire to simplify the i915 driver by reducing the number of
> different code paths introduced by the LRC / execlists support.  As the
> execlists request is now part of the gem request it is possible and
> desirable to unify the request life-cycles for execlist and legacy
> requests.
> 
> Added a context complete flag to a request which gets set during the
> context switch interrupt.

Wrong approach. Legacy uses the request itself as the indicator for when
the context is complete, if you were to take the same approach for LRC
we would not need to add yet more execlist specific state.

Lines of code are reduced by keeping the GEM semantics the same and just
having execlists hold an independent reference on the request for its
async submission approach.
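The lifecycle argued for above can be sketched in a few lines. This is a toy refcount model, not the driver's API: all `sk_*` names are illustrative stand-ins. GEM owns one reference to the request; execlists takes an independent reference at submission and drops it when the hardware finishes with the context; the request is only freed when both owners have let go.

```c
#include <stdlib.h>

/* Toy stand-in for a refcounted request; sk_* names are illustrative. */
struct sk_request {
	int refcount;
};

static struct sk_request *sk_request_create(void)
{
	struct sk_request *req = calloc(1, sizeof(*req));
	req->refcount = 1;		/* GEM's reference */
	return req;
}

static void sk_request_get(struct sk_request *req)
{
	req->refcount++;
}

/* Returns nonzero when the final reference was dropped and the
 * request freed. */
static int sk_request_put(struct sk_request *req)
{
	if (--req->refcount)
		return 0;
	free(req);
	return 1;
}

/* Execlists submission pins the request independently of GEM. */
static void sk_execlists_submit(struct sk_request *req)
{
	sk_request_get(req);
}

/* Context-complete interrupt: execlists drops its reference. */
static int sk_execlists_complete(struct sk_request *req)
{
	return sk_request_put(req);
}
```

With this shape, GEM can retire the request on seqno completion as it always has, and no execlist-specific state needs to live in the request itself.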
 
> Added a function i915_gem_request_retireable().  A request is considered
> retireable if its seqno passed (i.e. the request has completed) and either
> it was never submitted to the ELSP or its context completed.  This ensures
> that context save is carried out before the last request for a context is
> considered retireable.  retire_requests_ring() now uses
> i915_gem_request_retireable() rather than request_complete() when deciding
> which requests to retire. Requests that were not waiting for a context
> switch interrupt (either as a result of being merged into a following
> request or by being a legacy request) will be considered retireable as
> soon as their seqno has passed.
> 
> Removed the extra request reference held for the execlist request.
> 
> Removed intel_execlists_retire_requests() and all references to
> intel_engine_cs.execlist_retired_req_list.
> 
> Moved context unpinning into retire_requests_ring() for now.  Further work
> is pending for the context pinning - this patch should allow us to use the
> active list to track context and ring buffer objects later.
> 
> Changed gen8_cs_irq_handler() so that notify_ring() is called when
> contexts complete as well as when a user interrupt occurs so that
> notification happens when a request is complete and context save has
> finished.
> 
> v2: Rebase over the read-read optimisation changes
> 
> v3: Reworked IRQ handler after removing IRQ handler cleanup patch
> 
> v4: Fixed various pin leaks
> 
> Issue: VIZ-4277
> Signed-off-by: Thomas Daniel <thomas.daniel at intel.com>
> Signed-off-by: Nick Hoath <nicholas.hoath at intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h         |  6 +++
>  drivers/gpu/drm/i915/i915_gem.c         | 67 +++++++++++++++++++++------
>  drivers/gpu/drm/i915/i915_irq.c         | 81 +++++++++++++++++++++------------
>  drivers/gpu/drm/i915/intel_lrc.c        | 43 +++--------------
>  drivers/gpu/drm/i915/intel_lrc.h        |  2 +-
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  1 -
>  6 files changed, 118 insertions(+), 82 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index fbf0ae9..3d217f9 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2262,6 +2262,12 @@ struct drm_i915_gem_request {
>  	/** Execlists no. of times this request has been sent to the ELSP */
>  	int elsp_submitted;
>  
> +	/**
> +	 * Execlists: whether this requests's context has completed after
> +	 * submission to the ELSP
> +	 */
> +	bool ctx_complete;
> +
>  };
>  
>  int i915_gem_request_alloc(struct intel_engine_cs *ring,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 52642af..fc82171 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1386,6 +1386,24 @@ __i915_gem_request_retire__upto(struct drm_i915_gem_request *req)
>  				       typeof(*tmp), list);
>  
>  		i915_gem_request_retire(tmp);
> +
> +		if (i915.enable_execlists) {
> +			struct intel_context *ctx = tmp->ctx;
> +			struct drm_i915_private *dev_priv =
> +				engine->dev->dev_private;
> +			unsigned long flags;
> +			struct drm_i915_gem_object *ctx_obj =
> +				ctx->engine[engine->id].state;
> +
> +			spin_lock_irqsave(&engine->execlist_lock, flags);
> +
> +			if (ctx_obj && (ctx != engine->default_context))
> +				intel_lr_context_unpin(tmp);
> +
> +			intel_runtime_pm_put(dev_priv);
> +			spin_unlock_irqrestore(&engine->execlist_lock, flags);
> +		}

Why here? Surely you intended this to be called by
i915_gem_request_retire(). The runtime pm interaction is wrong; that is
handled by GPU busyness tracking. By doing this at this juncture you
also increase the frequency at which we have to recreate the iomapping
(most noticeable on bsw+) and take more spinlocks unnecessarily.

Also, since you no longer do reference tracking for the
execlists_queue, tmp has just been freed.
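The use-after-free hazard can be shown with a toy model (all `toy_*` names here are illustrative, not the driver's): once i915_gem_request_retire() drops the last reference, any later dereference of tmp touches freed memory. The safe pattern is to hold your own temporary reference across the span that still needs the request.

```c
#include <stdlib.h>

/* Toy refcounted request; toy_* names are illustrative only. */
struct toy_request {
	int refcount;
	int ctx_id;
};

static struct toy_request *toy_request_alloc(int ctx_id)
{
	struct toy_request *req = malloc(sizeof(*req));
	req->refcount = 1;
	req->ctx_id = ctx_id;
	return req;
}

static void toy_request_get(struct toy_request *req)
{
	req->refcount++;
}

/* Returns nonzero when the request was freed. */
static int toy_request_put(struct toy_request *req)
{
	if (--req->refcount)
		return 0;
	free(req);
	return 1;
}

/* Safe ordering: pin the request before retiring so the post-retire
 * dereference cannot hit freed memory. */
static int retire_and_read_ctx(struct toy_request *req)
{
	int ctx_id;

	toy_request_get(req);
	toy_request_put(req);	/* stands in for i915_gem_request_retire() */
	ctx_id = req->ctx_id;	/* still valid: we hold our own reference */
	toy_request_put(req);	/* drop our reference; request is freed */
	return ctx_id;
}
```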

>  	} while (tmp != req);
>  
>  	WARN_ON(i915_verify_lists(engine->dev));
> @@ -2359,6 +2377,12 @@ void i915_vma_move_to_active(struct i915_vma *vma,
>  	list_move_tail(&vma->mm_list, &vma->vm->active_list);
>  }
>  
> +static bool i915_gem_request_retireable(struct drm_i915_gem_request *req)
> +{
> +	return (i915_gem_request_completed(req, true) &&
> +		(!req->elsp_submitted || req->ctx_complete));

I disagree with this completely. A request must only be considered complete
when its seqno has passed. The context should be tracking the request
and not the other way around (as legacy does).
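The legacy-style completion test is just a wrap-safe seqno comparison. The signed comparison below mirrors the driver's i915_seqno_passed(); the function name and everything around it are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* A request is complete purely when the engine's current seqno has
 * passed the request's seqno. Casting the difference to a signed
 * type makes the comparison robust across 32-bit seqno wraparound. */
static bool sketch_seqno_passed(uint32_t hw_seqno, uint32_t req_seqno)
{
	return (int32_t)(hw_seqno - req_seqno) >= 0;
}
```

No per-request execlist state is needed for this check; context save ordering is the context's problem, tracked against the request, not folded into request completion.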

The approach is just wrong. The lifecycle of the request is not the
bloat in execlists, and keeping an extra reference on the request struct
itself until execlists is able to reap the requests amortizes the cost
of the spinlock and atomic operations.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

