[Intel-gfx] [PATCH 1/4] drm/i915: Unify execlist and legacy request life-cycles

Fri Oct 9 02:15:08 PDT 2015

On Fri, Oct 09, 2015 at 09:36:58AM +0100, Chris Wilson wrote:
> On Fri, Oct 09, 2015 at 09:58:51AM +0200, Daniel Vetter wrote:
> > > > +static bool i915_gem_request_retireable(struct drm_i915_gem_request *req)
> > > > +{
> > > > +	return (i915_gem_request_completed(req, true) &&
> > > > +		(!req->elsp_submitted || req->ctx_complete));
> > > 
> > > I disagree with this completely. A request must only be considered complete
> > > when it's seqno is passed. The context should be tracking the request
> > > and not the other way around (like legacy does).
> > > 
> > > The approach is just wrong. The lifecycle of the request is not the
> > > bloat in execlists, and keeping an extra reference on the request struct
> > > itself them until execlists is able to reap them amoritizes the cost of
> > > the spinlock and atomic operations.
> > 
> > Hm, that's slightly different approach than what I suggested with just
> > creating special ctx-switch requests (since with the scheduler we'll
> > reorder so can't do the same trick as with legacy ctx switching any more).
> > This trick here otoh extends the lifetime of a request until it's kicked
> > out, so that the shrinker doesn't it the request object too early.
> 
> Note what you are considering here is how to use requests to manage
> contexts. execlists is different enough from legacy in how contexts are
> only active for as long as the request is, that we don't need anything
> more than the current requests to manage their active cycles + lifetimes.
> 
> > How does holding an additional ref on the request prevent this? I mean I
> > don't see how that would prevent special casing in the eviction code ...
> > Or is your plan to put the final waiting for the request the ctx to idle
> > into the ->vma_unbind hook? That seems again a bit a quirk ...
> 
> The reference is simply that the submission is asynchronous and not
> under the purview of GEM, i.e. execlists has a list of requests with
> which to submit and so needs a reference - especially as the ordering
> between request completion and the interrupt is not guaranteed, and the
> requests can be completed asynchronously. The reference is not to
> prevent retiring, but simply to keep the request struct alive until
> removed from the execlists.  Similarly for anything more exotic than
> the FIFO scheduler.

Hm yeah I thought we've had that already ... at least there's an
i915_gem_request_unreference in intel_execlists_retire_requests, separate
from the core request retiring.

> The basic idea I have here is simply that requests are complete when the
> GPU finishes executing them, that is when we retire them from the GEM lists.
> I want to keep it that simple as then we can build upon requests for
> tracking GPU activity at an abstract GEM level. So context lifecycles
> depend on requests and not the other way around (which is how we do it
> for legacy, pin until switch (whilst the hw references the ctx) then active
> until the switch away is complete.)) I think underpining this is we do
> have to keep the idea of GPU activity (requests) and object life cycles
> as two seperate but interelated ideas/mechanisms.

Yup agreed on that concept. The difference is in how exactly we track when
the context is no longer in use by the hardware. With execlist we have

1. MI_BB_START
2. seqno write/irq, which completes the requests that we currently have.
3. hw/guc retires the hw context and signals that with another, dedicated
irq (CTX_SWITCH_IRQ).

Nick's approach is to dealy the request completion with ctx_completed from
2 to 3, but open-coded in places (which breaks the shrinker).

You seem to suggest something similar, but I'm not entirely clear.

My idea was to create a new request for 3. which gets signalled by the
scheduler in intel_lrc_irq_handler. My idea was that we'd only create
these when a ctx switch might occur to avoid overhead, but I guess if we
just outright delay all requests a notch if need that might work too. But
I'm really not sure on the implications of that (i.e. does the hardware
really unlod the ctx if it's idle?), and whether that would fly still with
the scheduler.

But figuring this one out here seems to be the cornestone of this reorg.
Without it we can't just throw contexts onto the active list.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch