[Intel-gfx] [PATCH 2/3] drm/i915: Add control flags to i915_handle_error()
Chris Wilson
chris at chris-wilson.co.uk
Mon Mar 19 17:08:13 UTC 2018
Quoting Michel Thierry (2018-03-19 16:48:01)
> On 16/03/18 14:50, Chris Wilson wrote:
> > Not all callers want the GPU error to handled in the same way, so expose
> > a control parameter. In the first instance, some callers do not want the
> > heavyweight error capture so add a bit to request the state to be
> > captured and saved.
> >
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Jeff McGee <jeff.mcgee at intel.com>
> > Cc: Mika Kuoppala <mika.kuoppala at intel.com>
> > ---
> > drivers/gpu/drm/i915/i915_debugfs.c | 4 ++--
> > drivers/gpu/drm/i915/i915_drv.h | 4 +++-
> > drivers/gpu/drm/i915/i915_irq.c | 22 ++++++++++++++--------
> > drivers/gpu/drm/i915/intel_hangcheck.c | 6 +++---
> > drivers/gpu/drm/i915/selftests/intel_hangcheck.c | 2 +-
> > 5 files changed, 23 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> > index 5378863e3238..03b74a92caed 100644
> > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> > @@ -3952,8 +3952,8 @@ i915_wedged_set(void *data, u64 val)
> > engine->hangcheck.stalled = true;
> > }
> >
> > - i915_handle_error(i915, val, "Manually set wedged engine mask = %llx",
> > - val);
> > + i915_handle_error(i915, val, I915_ERROR_CAPTURE,
> > + "Manually set wedged engine mask = %llx", val);
> >
> > wait_on_bit(&i915->gpu_error.flags,
> > I915_RESET_HANDOFF,
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index e27ba8fb64e6..53009ba50640 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2751,10 +2751,12 @@ static inline void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
> > &dev_priv->gpu_error.hangcheck_work, delay);
> > }
> >
> > -__printf(3, 4)
> > +__printf(4, 5)
> > void i915_handle_error(struct drm_i915_private *dev_priv,
> > u32 engine_mask,
> > + unsigned long flags,
> > const char *fmt, ...);
> > +#define I915_ERROR_CAPTURE BIT(0)
> >
> Should this be in the new i915_gpu_error.h?
And move it over from i915_irq.c to somewhere more useful? Probably
intel_hangcheck.c since i915_gpu_error.c is conditionally compiled.
> > extern void intel_irq_init(struct drm_i915_private *dev_priv);
> > extern void intel_irq_fini(struct drm_i915_private *dev_priv);
> > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > index 44eef355e12c..b3a4dc7cb26c 100644
> > --- a/drivers/gpu/drm/i915/i915_irq.c
> > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > @@ -2955,6 +2955,7 @@ static void i915_clear_error_registers(struct drm_i915_private *dev_priv)
> > * i915_handle_error - handle a gpu error
> > * @dev_priv: i915 device private
> > * @engine_mask: mask representing engines that are hung
> > + * @flags: control flags
> > * @fmt: Error message format string
> > *
> > * Do some basic checking of register state at error time and
> > @@ -2965,16 +2966,11 @@ static void i915_clear_error_registers(struct drm_i915_private *dev_priv)
> > */
> > void i915_handle_error(struct drm_i915_private *dev_priv,
> > u32 engine_mask,
> > + unsigned long flags,
> > const char *fmt, ...)
> > {
> > struct intel_engine_cs *engine;
> > unsigned int tmp;
> > - va_list args;
> > - char error_msg[80];
> > -
> > - va_start(args, fmt);
> > - vscnprintf(error_msg, sizeof(error_msg), fmt, args);
> > - va_end(args);
> >
> > /*
> > * In most cases it's guaranteed that we get here with an RPM
> > @@ -2986,8 +2982,18 @@ void i915_handle_error(struct drm_i915_private *dev_priv,
> > intel_runtime_pm_get(dev_priv);
> >
> > engine_mask &= INTEL_INFO(dev_priv)->ring_mask;
> > - i915_capture_error_state(dev_priv, engine_mask, error_msg);
> > - i915_clear_error_registers(dev_priv);
> > +
> > + if (flags & I915_ERROR_CAPTURE) {
> > + char error_msg[80];
> > + va_list args;
> > +
> > + va_start(args, fmt);
> > + vscnprintf(error_msg, sizeof(error_msg), fmt, args);
> > + va_end(args);
> > +
> > + i915_capture_error_state(dev_priv, engine_mask, error_msg);
> > + i915_clear_error_registers(dev_priv);
> > + }
> else
> DRM_INFO or DRM_NOTE the error_msg ?
>
> Otherwise the 'kicking wait/semaphore' text from below will never be
> printed.
I liked it disappearing ;)
So we should feed it to i915_reset(const char *reason). That could then
probably replace I915_RESET_QUIET.
-Chris
More information about the Intel-gfx
mailing list