[Intel-gfx] [PATCH 1/4] drm/i915: Unify execlist and legacy request life-cycles

Daniel Vetter daniel at ffwll.ch
Fri Oct 9 00:58:51 PDT 2015


On Thu, Oct 08, 2015 at 01:32:07PM +0100, Chris Wilson wrote:
> On Tue, Oct 06, 2015 at 03:52:01PM +0100, Nick Hoath wrote:
> > There is a desire to simplify the i915 driver by reducing the number of
> > different code paths introduced by the LRC / execlists support.  As the
> > execlists request is now part of the gem request it is possible and
> > desirable to unify the request life-cycles for execlist and legacy
> > requests.
> > 
> > Added a context complete flag to a request which gets set during the
> > context switch interrupt.
> 
> Wrong approach. Legacy uses the request itself as the indicator for when
> the context is complete; if you were to take the same approach for LRC
> we would not need to add yet more execlist-specific state.
> 
> Lines of code are reduced by keeping the GEM semantics the same and just
> having execlists hold an independent ref on the request for its async
> submission approach.
>  
> > Added a function i915_gem_request_retireable().  A request is considered
> > retireable if its seqno passed (i.e. the request has completed) and either
> > it was never submitted to the ELSP or its context completed.  This ensures
> > that context save is carried out before the last request for a context is
> > considered retireable.  retire_requests_ring() now uses
> > i915_gem_request_retireable() rather than request_complete() when deciding
> > which requests to retire. Requests that were not waiting for a context
> > switch interrupt (either as a result of being merged into a following
> > request or by being a legacy request) will be considered retireable as
> > soon as their seqno has passed.
> > 
> > Removed the extra request reference held for the execlist request.
> > 
> > Removed intel_execlists_retire_requests() and all references to
> > intel_engine_cs.execlist_retired_req_list.
> > 
> > Moved context unpinning into retire_requests_ring() for now.  Further work
> > is pending for the context pinning - this patch should allow us to use the
> > active list to track context and ring buffer objects later.
> > 
> > Changed gen8_cs_irq_handler() so that notify_ring() is called when
> > contexts complete as well as when a user interrupt occurs so that
> > notification happens when a request is complete and context save has
> > finished.
> > 
> > v2: Rebase over the read-read optimisation changes
> > 
> > v3: Reworked IRQ handler after removing IRQ handler cleanup patch
> > 
> > v4: Fixed various pin leaks
> > 
> > Issue: VIZ-4277
> > Signed-off-by: Thomas Daniel <thomas.daniel at intel.com>
> > Signed-off-by: Nick Hoath <nicholas.hoath at intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h         |  6 +++
> >  drivers/gpu/drm/i915/i915_gem.c         | 67 +++++++++++++++++++++------
> >  drivers/gpu/drm/i915/i915_irq.c         | 81 +++++++++++++++++++++------------
> >  drivers/gpu/drm/i915/intel_lrc.c        | 43 +++--------------
> >  drivers/gpu/drm/i915/intel_lrc.h        |  2 +-
> >  drivers/gpu/drm/i915/intel_ringbuffer.h |  1 -
> >  6 files changed, 118 insertions(+), 82 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index fbf0ae9..3d217f9 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2262,6 +2262,12 @@ struct drm_i915_gem_request {
> >  	/** Execlists no. of times this request has been sent to the ELSP */
> >  	int elsp_submitted;
> >  
> > +	/**
> > +	 * Execlists: whether this request's context has completed after
> > +	 * submission to the ELSP
> > +	 */
> > +	bool ctx_complete;
> > +
> >  };
> >  
> >  int i915_gem_request_alloc(struct intel_engine_cs *ring,
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 52642af..fc82171 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -1386,6 +1386,24 @@ __i915_gem_request_retire__upto(struct drm_i915_gem_request *req)
> >  				       typeof(*tmp), list);
> >  
> >  		i915_gem_request_retire(tmp);
> > +
> > +		if (i915.enable_execlists) {
> > +			struct intel_context *ctx = tmp->ctx;
> > +			struct drm_i915_private *dev_priv =
> > +				engine->dev->dev_private;
> > +			unsigned long flags;
> > +			struct drm_i915_gem_object *ctx_obj =
> > +				ctx->engine[engine->id].state;
> > +
> > +			spin_lock_irqsave(&engine->execlist_lock, flags);
> > +
> > +			if (ctx_obj && (ctx != engine->default_context))
> > +				intel_lr_context_unpin(tmp);
> > +
> > +			intel_runtime_pm_put(dev_priv);
> > +			spin_unlock_irqrestore(&engine->execlist_lock, flags);
> > +		}
> 
> Why here? Surely you intended this to be called by
> i915_gem_request_retire(). The runtime pm interaction is wrong; that is
> handled by GPU busyness tracking. But by doing this at this juncture you
> now increase the frequency at which we have to recreate the iomapping,
> most noticeably on bsw+, and take more spinlocks unnecessarily.
> 
> Also since you no longer do reference tracking for the
> execlists_queue, tmp has just been freed.
> 
> >  	} while (tmp != req);
> >  
> >  	WARN_ON(i915_verify_lists(engine->dev));
> > @@ -2359,6 +2377,12 @@ void i915_vma_move_to_active(struct i915_vma *vma,
> >  	list_move_tail(&vma->mm_list, &vma->vm->active_list);
> >  }
> >  
> > +static bool i915_gem_request_retireable(struct drm_i915_gem_request *req)
> > +{
> > +	return (i915_gem_request_completed(req, true) &&
> > +		(!req->elsp_submitted || req->ctx_complete));
> 
> I disagree with this completely. A request must only be considered complete
> when its seqno has passed. The context should be tracking the request
> and not the other way around (like legacy does).
> 
> The approach is just wrong. The lifecycle of the request is not the
> bloat in execlists, and keeping an extra reference on the request
> structs themselves until execlists is able to reap them amortizes the
> cost of the spinlock and atomic operations.

Hm, that's a slightly different approach than what I suggested with just
creating special ctx-switch requests (since with the scheduler we'll
reorder, so we can't do the same trick as with legacy ctx switching any
more). This trick here otoh extends the lifetime of a request until it's
kicked out, so that the shrinker doesn't reap the request object too early.

How does holding an additional ref on the request prevent this? I mean I
don't see how that would prevent special casing in the eviction code ...
Or is your plan to put the final wait for the request/ctx to idle into
the ->vma_unbind hook? That again seems a bit of a quirk ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

