[Intel-gfx] [PATCH] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture

Chris Wilson chris at chris-wilson.co.uk
Wed Dec 6 14:48:36 UTC 2017


Quoting Daniel Vetter (2017-12-06 14:43:39)
> On Wed, Dec 06, 2017 at 02:19:03PM +0000, Chris Wilson wrote:
> > Since capturing the error state requires fiddling around with the GGTT
> > to read arbitrary buffers and is itself run under stop_machine(), it
> > deadlocks the machine (effectively a hard hang) when run in conjunction
> > with Broxton's VTd workaround to serialize GGTT access.
> > 
> > Fixes: 0ef34ad6222a ("drm/i915: Serialize GTT/Aperture accesses on BXT")
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Jon Bloomfield <jon.bloomfield at intel.com>
> > Cc: John Harrison <john.C.Harrison at intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_gpu_error.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> > index 48418fb81066..e6c7e8e53815 100644
> > --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> > +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> > @@ -1813,6 +1813,10 @@ void i915_capture_error_state(struct drm_i915_private *dev_priv,
> >       if (!i915_modparams.error_capture)
> >               return;
> >  
> > +     /* Prevent recursively calling stop_machine() and deadlocking. */
> > +     if (intel_ggtt_update_needs_vtd_wa(dev_priv))
> > +             return;
> 
> I'd put this closer to the stop machine, at the head of
> i915_capture_gpu_state(). If the bogus debug output annoys then we could
> switch that to an PTR_ERR return value I guess. But I guess this here is
> ok too, so either way:

I was considering doing some of the capture, skipping the buffers, but
nowadays those buffers tend to the crux of triaging. My only real concern
is how to explain to the user that the error state cannot exist, for 
which we could go and add -ENODEV to sysfs/debugfs just to be clear.
-Chris


More information about the Intel-gfx mailing list