[Intel-gfx] [PATCH] drm/i915/hsw: Flush RING_IMR changes before changing the global GT IMR (vecs)

Mon Jan 7 11:32:04 UTC 2019

Quoting Mika Kuoppala (2019-01-07 11:21:32)
> Chris Wilson <chris at chris-wilson.co.uk> writes:
> 
> > Haswell also requires the RING_IMR flush for its unique vebox setup to
> > avoid losing interrupts, as per 476af9c26063 ("drm/i915/gen6: Flush
> > RING_IMR changes before changing the global GT IMR"):
> >
> > On Baytrail, notably, we can still detect missed interrupt syndrome
> > (where we never spot a completed request). In this case, it can be
> > alleviated by always keeping the interrupt unmasked, implying that the
> > interrupt is being lost in the window after modifying the IMR. (This is
> > the reason we still have the posting reads on enable_irq, if we remove
> > them we miss interrupts!) Having narrowed the issue down to the IMR,
> > rather than keeping it always enabled, applying the usual posting
> > read/flush of the RING_IMR before unmasking the GT IMR also seems to
> > prevent the missed interrupt. So be it.
> >
> > References: 476af9c26063 ("drm/i915/gen6: Flush RING_IMR changes before changing the global GT IMR")
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> 
> Reviewed-by: Mika Kuoppala <mika.kuoppala at linux.intel.com>

Ta. This appears to have been the last missing link for seqno/interrupt
stability.

Over the w/e, I found a machine that reproduced the issue and confirmed
that with the current gen7_xcs w/a it is very stable (no failures
noted), but with just gen6_xcs it would detect a missed breadcrumb
within 17 minutes.

So sadly, I'll have to drop the patch that removes gen7_xcs for now.
Hopefully, someone else can solve the issue (I think it's linked to the
simultaneous arrival of MI_USER_INTERRUPT from more than one engine...)
-Chris
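
For readers unfamiliar with the ordering discussed above, here is a toy
model of it: write RING_IMR, do a posting read to flush that write, and
only then unmask the GT IMR. It is a sketch under stated assumptions, not
the i915 code. The struct, the helper names, and the idea that an
unflushed RING_IMR write is simply "still in flight" when the interrupt
fires are all hypothetical, chosen only to illustrate the window described
in the commit message. The real hardware cause is not established in this
thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit for the user interrupt; a set IMR bit means "masked". */
#define USER_INTERRUPT (1u << 0)

/* Toy hardware model (assumption): the RING_IMR write is posted and only
 * reaches the "hardware" once something reads the register back. */
struct fake_hw {
    uint32_t ring_imr;         /* value the engine currently sees */
    uint32_t ring_imr_posted;  /* value still in flight */
    bool     ring_write_pending;
    uint32_t gt_imr;           /* global GT mask */
    unsigned delivered;        /* interrupts that reached the CPU */
    unsigned missed;           /* interrupts dropped by a mask */
};

static void write_ring_imr(struct fake_hw *hw, uint32_t val)
{
    hw->ring_imr_posted = val;
    hw->ring_write_pending = true;
}

/* Reading the register back forces the pending write to land. */
static uint32_t posting_read_ring_imr(struct fake_hw *hw)
{
    if (hw->ring_write_pending) {
        hw->ring_imr = hw->ring_imr_posted;
        hw->ring_write_pending = false;
    }
    return hw->ring_imr;
}

static void write_gt_imr(struct fake_hw *hw, uint32_t val)
{
    hw->gt_imr = val; /* assumed to take effect immediately in this model */
}

/* The engine signals; it is only seen if both levels are unmasked. */
static void raise_user_interrupt(struct fake_hw *hw)
{
    if ((hw->ring_imr & USER_INTERRUPT) || (hw->gt_imr & USER_INTERRUPT))
        hw->missed++;
    else
        hw->delivered++;
}

/* Enable the interrupt with or without the flush, then let one interrupt
 * arrive right after the GT IMR is unmasked. Returns how many were missed. */
unsigned simulate_enable_irq(bool flush_ring_imr)
{
    struct fake_hw hw = { .ring_imr = ~0u, .gt_imr = ~0u };

    write_ring_imr(&hw, ~USER_INTERRUPT);
    if (flush_ring_imr)
        (void)posting_read_ring_imr(&hw);
    write_gt_imr(&hw, ~USER_INTERRUPT);

    raise_user_interrupt(&hw);
    return hw.missed;
}
```

In this model, simulate_enable_irq(true) loses nothing, while
simulate_enable_irq(false) drops the interrupt that arrives inside the
window. That matches the shape of the symptom reported above, but it says
nothing about why the real hardware behaves that way.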