[Intel-gfx] [PATCH 1/4] drm/i915: Unify execlist and legacy request life-cycles
Daniel Vetter
daniel at ffwll.ch
Fri Oct 9 00:58:51 PDT 2015
On Thu, Oct 08, 2015 at 01:32:07PM +0100, Chris Wilson wrote:
> On Tue, Oct 06, 2015 at 03:52:01PM +0100, Nick Hoath wrote:
> > There is a desire to simplify the i915 driver by reducing the number of
> > different code paths introduced by the LRC / execlists support. As the
> > execlists request is now part of the gem request it is possible and
> > desirable to unify the request life-cycles for execlist and legacy
> > requests.
> >
> > Added a context complete flag to a request which gets set during the
> > context switch interrupt.
>
> Wrong approach. Legacy uses the request itself as the indicator for when
> the context is complete, if you were to take the same approach for LRC
> we would not need to add yet more execlist specific state.
>
> Lines of code is reduced by keeping the GEM semantics the same and just
> having execlists hold an independent ref on the request for its async
> submission approach.
>
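For illustration, a minimal sketch of that independent-reference scheme,
loosely following the existing execlists_context_queue(); the exact place
where the matching unreference happens is an assumption here, not the
posted patch:

static int execlists_context_queue(struct drm_i915_gem_request *request)
{
	struct intel_engine_cs *ring = request->ring;

	/* The execlist queue holds its own reference for the async
	 * submission; it is dropped again (under execlist_lock) once the
	 * context-switch interrupt has popped the request off the queue,
	 * so the GEM retire semantics stay identical to legacy. */
	i915_gem_request_reference(request);

	spin_lock_irq(&ring->execlist_lock);
	list_add_tail(&request->execlist_link, &ring->execlist_queue);
	execlists_context_unqueue(ring);
	spin_unlock_irq(&ring->execlist_lock);

	return 0;
}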
> > Added a function i915_gem_request_retireable(). A request is considered
> > retireable if its seqno passed (i.e. the request has completed) and either
> > it was never submitted to the ELSP or its context completed. This ensures
> > that context save is carried out before the last request for a context is
> > considered retireable. retire_requests_ring() now uses
> > i915_gem_request_retireable() rather than request_complete() when deciding
> > which requests to retire. Requests that were not waiting for a context
> > switch interrupt (either as a result of being merged into a following
> > request or by being a legacy request) will be considered retireable as
> > soon as their seqno has passed.
> >
> > Removed the extra request reference held for the execlist request.
> >
> > Removed intel_execlists_retire_requests() and all references to
> > intel_engine_cs.execlist_retired_req_list.
> >
> > Moved context unpinning into retire_requests_ring() for now. Further work
> > is pending for the context pinning - this patch should allow us to use the
> > active list to track context and ring buffer objects later.
> >
> > Changed gen8_cs_irq_handler() so that notify_ring() is called when
> > contexts complete as well as when a user interrupt occurs so that
> > notification happens when a request is complete and context save has
> > finished.
> >
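A minimal sketch of the described IRQ behaviour, assuming handler/shift
plumbing along the lines of the existing gen8 GT interrupt code (the
function shape is an assumption, not the posted diff):

static void gen8_cs_irq_handler(struct intel_engine_cs *ring, u32 iir,
				int test_shift)
{
	/* Process the context switch first so the request's completion
	 * state is updated ... */
	if (iir & (GT_CONTEXT_SWITCH_INTERRUPT << test_shift))
		intel_lrc_irq_handler(ring);

	/* ... then wake waiters on user interrupts and on context
	 * completion, so a request is only seen as done once its
	 * context save has finished. */
	if (iir & ((GT_RENDER_USER_INTERRUPT |
		    GT_CONTEXT_SWITCH_INTERRUPT) << test_shift))
		notify_ring(ring);
}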
> > v2: Rebase over the read-read optimisation changes
> >
> > v3: Reworked IRQ handler after removing IRQ handler cleanup patch
> >
> > v4: Fixed various pin leaks
> >
> > Issue: VIZ-4277
> > Signed-off-by: Thomas Daniel <thomas.daniel at intel.com>
> > Signed-off-by: Nick Hoath <nicholas.hoath at intel.com>
> > ---
> > drivers/gpu/drm/i915/i915_drv.h | 6 +++
> > drivers/gpu/drm/i915/i915_gem.c | 67 +++++++++++++++++++++------
> > drivers/gpu/drm/i915/i915_irq.c | 81 +++++++++++++++++++++------------
> > drivers/gpu/drm/i915/intel_lrc.c | 43 +++--------------
> > drivers/gpu/drm/i915/intel_lrc.h | 2 +-
> > drivers/gpu/drm/i915/intel_ringbuffer.h | 1 -
> > 6 files changed, 118 insertions(+), 82 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index fbf0ae9..3d217f9 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2262,6 +2262,12 @@ struct drm_i915_gem_request {
> > /** Execlists no. of times this request has been sent to the ELSP */
> > int elsp_submitted;
> >
> > + /**
> > + * Execlists: whether this request's context has completed after
> > + * submission to the ELSP
> > + */
> > + bool ctx_complete;
> > +
> > };
> >
> > int i915_gem_request_alloc(struct intel_engine_cs *ring,
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 52642af..fc82171 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -1386,6 +1386,24 @@ __i915_gem_request_retire__upto(struct drm_i915_gem_request *req)
> > typeof(*tmp), list);
> >
> > i915_gem_request_retire(tmp);
> > +
> > + if (i915.enable_execlists) {
> > + struct intel_context *ctx = tmp->ctx;
> > + struct drm_i915_private *dev_priv =
> > + engine->dev->dev_private;
> > + unsigned long flags;
> > + struct drm_i915_gem_object *ctx_obj =
> > + ctx->engine[engine->id].state;
> > +
> > + spin_lock_irqsave(&engine->execlist_lock, flags);
> > +
> > + if (ctx_obj && (ctx != engine->default_context))
> > + intel_lr_context_unpin(tmp);
> > +
> > + intel_runtime_pm_put(dev_priv);
> > + spin_unlock_irqrestore(&engine->execlist_lock, flags);
> > + }
>
> Why here? Surely you intended this to be called by
> i915_gem_request_retire(). The runtime pm interaction is wrong, that is
> handled by GPU busyness tracking. But by doing this at this juncture you
> now increase the frequency at which we have to recreate the iomapping,
> most noticeably on bsw+, and take more spinlocks unnecessarily.
>
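I.e., a sketch of that placement (the body follows the current
i915_gem_request_retire(); the unpin hunk is an assumption, not the
posted patch):

static void i915_gem_request_retire(struct drm_i915_gem_request *request)
{
	trace_i915_gem_request_retire(request);

	request->ringbuf->last_retired_head = request->postfix;

	list_del_init(&request->list);
	i915_gem_request_remove_from_client(request);

	/* Sketch: drop the LRC pin here, once per retired request, and
	 * leave runtime pm to the existing GPU busyness tracking. */
	if (i915.enable_execlists &&
	    request->ctx != request->ring->default_context)
		intel_lr_context_unpin(request);

	i915_gem_request_unreference(request);
}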
> Also since you no longer do reference tracking for the
> execlists_queue, tmp has just been freed.
>
> > } while (tmp != req);
> >
> > WARN_ON(i915_verify_lists(engine->dev));
> > @@ -2359,6 +2377,12 @@ void i915_vma_move_to_active(struct i915_vma *vma,
> > list_move_tail(&vma->mm_list, &vma->vm->active_list);
> > }
> >
> > +static bool i915_gem_request_retireable(struct drm_i915_gem_request *req)
> > +{
> > + return (i915_gem_request_completed(req, true) &&
> > + (!req->elsp_submitted || req->ctx_complete));
>
> I disagree with this completely. A request must only be considered complete
> when its seqno has passed. The context should be tracking the request
> and not the other way around (like legacy does).
>
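For comparison, the legacy flow, roughly as in do_switch() (ring and req
being the engine and the request doing the switch): the outgoing context
object is simply kept busy by the request that switches away from it, so
the normal active-list retirement covers the context with no extra state:

	struct intel_context *from = ring->last_context;

	if (from != NULL) {
		struct drm_i915_gem_object *from_obj =
			from->legacy_hw_ctx.rcs_state;

		/* The old context rides the active list of the request
		 * that switched away from it; once that request's seqno
		 * passes, the context is idle and can be unpinned. */
		from_obj->base.read_domains = I915_GEM_DOMAIN_INSTRUCTION;
		i915_vma_move_to_active(i915_gem_obj_to_ggtt(from_obj), req);
	}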
> The approach is just wrong. The lifecycle of the request is not the
> bloat in execlists, and keeping an extra reference on the request structs
> themselves until execlists is able to reap them amortizes the cost of
> the spinlock and atomic operations.
Hm, that's a slightly different approach than what I suggested with just
creating special ctx-switch requests (since with the scheduler we'll
reorder, so we can't do the same trick as with legacy ctx switching any more).
This trick here otoh extends the lifetime of a request until it's kicked
out, so that the shrinker doesn't hit the request object too early.
How does holding an additional ref on the request prevent this? I mean I
don't see how that would prevent special casing in the eviction code ...
Or is your plan to put the final wait for the request (the ctx) to idle
into the ->vma_unbind hook? That again seems a bit of a quirk ...
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch