[Intel-gfx] [PATCH] drm/i915/execlists: Reset CSB pointers by mmio as well

Mon Nov 4 14:33:57 UTC 2019

Quoting Mika Kuoppala (2019-11-04 14:27:21)
> Chris Wilson <chris at chris-wilson.co.uk> writes:
> 
> > Sometimes Icelake forgets to reset the CSB pointers on a GPU reset,
> > leading to carry on updating the old tail of the buffer.
> >
> > <0>[  618.138490] i915_sel-5636    3d..1 673425465us : trace_ports: vecs0: submit { 14de2:504, 0:0 }
> > <0>[  618.138490] i915_sel-5636    3.... 673425493us : intel_engine_reset: vecs0 flags=100
> > <0>[  618.138490] i915_sel-5636    3.... 673425493us : execlists_reset_prepare: vecs0: depth<-0
> > <0>[  618.138490] i915_sel-5636    3.... 673425493us : intel_engine_stop_cs: vecs0
> > <0>[  618.138490] i915_sel-5636    3.... 673425523us : __intel_gt_reset: engine_mask=40
> > <0>[  618.138490] i915_sel-5636    3.... 673425568us : execlists_reset: vecs0
> > <0>[  618.138490] i915_sel-5636    3d..1 673425568us : process_csb: vecs0 cs-irq head=1, tail=2
> > <0>[  618.138490] i915_sel-5636    3d..1 673425568us : process_csb: vecs0 csb[2]: status=0x00000001:0x40000000
> > <0>[  618.138490] i915_sel-5636    3d..1 673425569us : trace_ports: vecs0: promote { 14de2:504*, 0:0 }
> > <0>[  618.138490] i915_sel-5636    3d..1 673425570us : __i915_request_reset: vecs0 rq=14de2:504, guilty? yes
> > <0>[  618.138490] i915_sel-5636    3d..1 673425571us : __execlists_reset: vecs0 replay {head:2de0, tail:2e48}
> > <0>[  618.138490] i915_sel-5636    3d..1 673425572us : __i915_request_unsubmit: vecs0 fence 14de2:504, current 503
> > <0>[  618.138490] i915_sel-5636    3.... 673435544us : intel_engine_cancel_stop_cs: vecs0
> > <0>[  618.138490] i915_sel-5636    3.... 673435544us : process_csb: vecs0 cs-irq head=11, tail=11
> > <0>[  618.138490] i915_sel-5636    3d..1 673435545us : __i915_request_submit: vecs0 fence 14de2:504, current 503
> > <0>[  618.138490] i915_sel-5636    3d..1 673435546us : __execlists_submission_tasklet: vecs0: queue_priority_hint:-2147483648, submit:yes
> > <0>[  618.138490] i915_sel-5636    3d..1 673435548us : trace_ports: vecs0: submit { 14de2:504*, 0:0 }
> > <0>[  618.138490] i915_sel-5636    3.... 673435549us : execlists_reset_finish: vecs0: depth->0
> > <0>[  618.138490] ksoftirq-21      2..s. 673435592us : process_csb: vecs0 cs-irq head=11, tail=3
> > <0>[  618.138490] ksoftirq-21      2..s. 673435593us : process_csb: vecs0 csb[0]: status=0x00000001:0x40000000
> > <0>[  618.138490] ksoftirq-21      2..s. 673435594us : trace_ports: vecs0: promote { 14de2:504*, 0:0 }
> > <0>[  618.138490] ksoftirq-21      2..s. 673435596us : process_csb: vecs0 csb[1]: status=0x00000018:0x40000040
> > <0>[  618.138490] ksoftirq-21      2..s. 673435597us : trace_ports: vecs0: completed { 14de2:504*, 0:0 }
> > <0>[  618.138490] ksoftirq-21      2..s. 673435612us : process_csb: process_csb:2188 GEM_BUG_ON(!i915_request_completed(*execlists->active) && !reset_in_progress(execlists))
> >
> > After the reset, we do another clflush before checking the CSB to be
> > sure we see whatever was left in the CSB prior to the reset. So it is
> > unlikely to be an incoherent view of the CSB, and more likely that
> > Icelake didn't reset its pointers.
> >
> > References: 582a6f90aa0d ("drm/i915/execlists: Add a paranoid flush of the CSB pointers upon reset")
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/intel_lrc.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > index 8d79a965f341..b869a9c7b3c6 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > @@ -2996,6 +2996,14 @@ static void reset_csb_pointers(struct intel_engine_cs *engine)
> >       WRITE_ONCE(*execlists->csb_write, reset_value);
> >       wmb(); /* Make sure this is visible to HW (paranoia?) */
> >  
> > +     /*
> > +      * Sometimes Icelakes forgets to reset its pointers on a GPU reset.
> > +      * Bludgeon them with a mmio update to be sure.
> > +      */
> > +     ENGINE_WRITE(engine, RING_CONTEXT_STATUS_PTR,
> > +                  reset_value << 8 | reset_value);
> > +     ENGINE_POSTING_READ(engine, RING_CONTEXT_STATUS_PTR);
> > +
> 
> Big hammer indeed. Next one is that we assert the reset value
> and check that first one after reset comes out with 0x0?

Yeah, I was thinking about that. Didn't seem trivial since it's not
always a single event. Plus then rare reset code is then spreading into
the hotpath :(
-Chris