[Intel-gfx] [PATCH] drm/i915/execlists: Reset CSB pointers by mmio as well

Mika Kuoppala mika.kuoppala at linux.intel.com
Mon Nov 4 14:27:21 UTC 2019


Chris Wilson <chris at chris-wilson.co.uk> writes:

> Sometimes Icelake forgets to reset the CSB pointers on a GPU reset,
> leading to carry on updating the old tail of the buffer.
>
> <0>[  618.138490] i915_sel-5636    3d..1 673425465us : trace_ports: vecs0: submit { 14de2:504, 0:0 }
> <0>[  618.138490] i915_sel-5636    3.... 673425493us : intel_engine_reset: vecs0 flags=100
> <0>[  618.138490] i915_sel-5636    3.... 673425493us : execlists_reset_prepare: vecs0: depth<-0
> <0>[  618.138490] i915_sel-5636    3.... 673425493us : intel_engine_stop_cs: vecs0
> <0>[  618.138490] i915_sel-5636    3.... 673425523us : __intel_gt_reset: engine_mask=40
> <0>[  618.138490] i915_sel-5636    3.... 673425568us : execlists_reset: vecs0
> <0>[  618.138490] i915_sel-5636    3d..1 673425568us : process_csb: vecs0 cs-irq head=1, tail=2
> <0>[  618.138490] i915_sel-5636    3d..1 673425568us : process_csb: vecs0 csb[2]: status=0x00000001:0x40000000
> <0>[  618.138490] i915_sel-5636    3d..1 673425569us : trace_ports: vecs0: promote { 14de2:504*, 0:0 }
> <0>[  618.138490] i915_sel-5636    3d..1 673425570us : __i915_request_reset: vecs0 rq=14de2:504, guilty? yes
> <0>[  618.138490] i915_sel-5636    3d..1 673425571us : __execlists_reset: vecs0 replay {head:2de0, tail:2e48}
> <0>[  618.138490] i915_sel-5636    3d..1 673425572us : __i915_request_unsubmit: vecs0 fence 14de2:504, current 503
> <0>[  618.138490] i915_sel-5636    3.... 673435544us : intel_engine_cancel_stop_cs: vecs0
> <0>[  618.138490] i915_sel-5636    3.... 673435544us : process_csb: vecs0 cs-irq head=11, tail=11
> <0>[  618.138490] i915_sel-5636    3d..1 673435545us : __i915_request_submit: vecs0 fence 14de2:504, current 503
> <0>[  618.138490] i915_sel-5636    3d..1 673435546us : __execlists_submission_tasklet: vecs0: queue_priority_hint:-2147483648, submit:yes
> <0>[  618.138490] i915_sel-5636    3d..1 673435548us : trace_ports: vecs0: submit { 14de2:504*, 0:0 }
> <0>[  618.138490] i915_sel-5636    3.... 673435549us : execlists_reset_finish: vecs0: depth->0
> <0>[  618.138490] ksoftirq-21      2..s. 673435592us : process_csb: vecs0 cs-irq head=11, tail=3
> <0>[  618.138490] ksoftirq-21      2..s. 673435593us : process_csb: vecs0 csb[0]: status=0x00000001:0x40000000
> <0>[  618.138490] ksoftirq-21      2..s. 673435594us : trace_ports: vecs0: promote { 14de2:504*, 0:0 }
> <0>[  618.138490] ksoftirq-21      2..s. 673435596us : process_csb: vecs0 csb[1]: status=0x00000018:0x40000040
> <0>[  618.138490] ksoftirq-21      2..s. 673435597us : trace_ports: vecs0: completed { 14de2:504*, 0:0 }
> <0>[  618.138490] ksoftirq-21      2..s. 673435612us : process_csb: process_csb:2188 GEM_BUG_ON(!i915_request_completed(*execlists->active) && !reset_in_progress(execlists))
>
> After the reset, we do another clflush before checking the CSB to be
> sure we see whatever was left in the CSB prior to the reset. So it is
> unlikely to be an incoherent view of the CSB, and more likely that
> Icelake didn't reset its pointers.
>
> References: 582a6f90aa0d ("drm/i915/execlists: Add a paranoid flush of the CSB pointers upon reset")
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_lrc.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 8d79a965f341..b869a9c7b3c6 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -2996,6 +2996,14 @@ static void reset_csb_pointers(struct intel_engine_cs *engine)
>  	WRITE_ONCE(*execlists->csb_write, reset_value);
>  	wmb(); /* Make sure this is visible to HW (paranoia?) */
>  
> +	/*
> +	 * Sometimes Icelakes forgets to reset its pointers on a GPU reset.
> +	 * Bludgeon them with a mmio update to be sure.
> +	 */
> +	ENGINE_WRITE(engine, RING_CONTEXT_STATUS_PTR,
> +		     reset_value << 8 | reset_value);
> +	ENGINE_POSTING_READ(engine, RING_CONTEXT_STATUS_PTR);
> +

Big hammer indeed. Next one is that we assert the reset value
and check that first one after reset comes out with 0x0?

Reviewed-by: Mika Kuoppala <mika.kuoppala at linux.intel.com>

>  	invalidate_csb_entries(&execlists->csb_status[0],
>  			       &execlists->csb_status[reset_value]);
>  }
> -- 
> 2.24.0.rc2


More information about the Intel-gfx mailing list