[Intel-gfx] [PATCH v2] drm/i915: add schedule out notification of preempted but completed request

Mon Mar 5 11:18:22 UTC 2018

Quoting Tvrtko Ursulin (2018-03-05 11:06:23)
> 
> On 24/02/2018 02:59, Weinan Li wrote:
> > There is one corner case missing schedule out notification of the preempted
> > request. The preempted request is just completed when preemption happen,
> > then it will be canceled and won't be resubmitted later, GVT-g will lost
> > the schedule out notification.
> > 
> > Here add schedule out notification if found the preempted request has been
> > completed.
> > 
> > v2:
> > - refine description, add completed check and notification in
> >    execlists_cancel_port_requests. (Chris)
> > 
> > Cc: Chris Wilson <chris at chris-wilson.co.uk>
> > Signed-off-by: Weinan Li <weinan.z.li at intel.com>
> > Signed-off-by: Zhenyu Wang <zhenyuw at linux.intel.com>
> > ---
> >   drivers/gpu/drm/i915/intel_lrc.c | 8 +++++++-
> >   1 file changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index e781c91..24a6e68 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -657,10 +657,16 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >   
> >       while (num_ports-- && port_isset(port)) {
> >               struct i915_request *rq = port_request(port);
> > +             unsigned int notify;
> >   
> >               GEM_BUG_ON(!execlists->active);
> >               intel_engine_context_out(rq->engine);
> > -             execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_PREEMPTED);
> > +
> > +             notify = INTEL_CONTEXT_SCHEDULE_PREEMPTED;
> > +             if (i915_request_completed(rq))
> > +                     notify = INTEL_CONTEXT_SCHEDULE_OUT;
> > +             execlists_context_status_change(rq, notify);
> > +
> >               i915_request_put(rq);
> >   
> >               memset(port, 0, sizeof(*port));
> > 
> 
> I hope seqno in HWS cannot change between execlists_cancel_port_requests 
> and __unwind_incomplete_requests? Some sort of delay in memory 
> transaction vs the interrupt? No idea, could be talking nonsense.

Not nonsense, just the type of nightmare I have (irq_seqno_barrier ahem).
We have an assert following the completion event to try and detect this
type of error, which thankfully hasn't fired during normal usage (so far,
it has only tripped over our programming bugs). I fear iommu in this
regard as well as that introduces random latency around memory
transactions that we know breaks coherency wrt interrupts.

As far as this goes, we already rely on this being called under stable
HWS, or else preemption/reset-recovery is fubar.
-Chris