[Intel-gfx] [PATCH 1/5] drm/i915: Add control flags to i915_handle_error()

Tue Mar 20 01:09:06 UTC 2018

Quoting Michel Thierry (2018-03-20 00:56:04)
> On 3/19/2018 5:44 PM, Chris Wilson wrote:
> > Quoting Michel Thierry (2018-03-20 00:39:35)
> >> On 3/19/2018 5:18 PM, Chris Wilson wrote:
> >>> Not all callers want the GPU error to be handled in the same way, so expose
> >>> a control parameter. In the first instance, some callers do not want the
> >>> heavyweight error capture so add a bit to request the state to be
> >>> captured and saved.
> >>>
> >>> v2: Pass msg down to i915_reset/i915_reset_engine so that we include the
> >>> reason for the reset in the dev_notice(), superseding the earlier option
> >>> to not print that notice.
> >>>
> >>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >>> Cc: Jeff McGee <jeff.mcgee at intel.com>
> >>> Cc: Mika Kuoppala <mika.kuoppala at intel.com>
> >>> Cc: Michel Thierry <michel.thierry at intel.com>
> >>> ---
> >>>    drivers/gpu/drm/i915/i915_debugfs.c              |  4 +--
> >>>    drivers/gpu/drm/i915/i915_drv.c                  | 17 +++++------
> >>>    drivers/gpu/drm/i915/i915_drv.h                  | 10 +++---
> >>>    drivers/gpu/drm/i915/i915_irq.c                  | 39 +++++++++++++-----------
> >>>    drivers/gpu/drm/i915/intel_hangcheck.c           | 13 ++++----
> >>>    drivers/gpu/drm/i915/selftests/intel_hangcheck.c | 13 +++-----
> >>>    6 files changed, 48 insertions(+), 48 deletions(-)
> >>>
> >> ...
> >>> diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
> >>> index 42e45ae87393..fd0ffb8328d0 100644
> >>> --- a/drivers/gpu/drm/i915/intel_hangcheck.c
> >>> +++ b/drivers/gpu/drm/i915/intel_hangcheck.c
> >>> @@ -246,9 +246,8 @@ engine_stuck(struct intel_engine_cs *engine, u64 acthd)
> >>>         */
> >>>        tmp = I915_READ_CTL(engine);
> >>>        if (tmp & RING_WAIT) {
> >>> -             i915_handle_error(dev_priv, 0,
> >>> -                               "Kicking stuck wait on %s",
> >>> -                               engine->name);
> >>> +             i915_handle_error(dev_priv, BIT(engine->id), 0,
> >>> +                               "stuck wait on %s", engine->name);
> >> Before, we were not resetting anything here; is this change on purpose?
> >> (if it is, it's worth adding it to the commit msg since it's changing
> >> behavior).
> >>
> >>>                I915_WRITE_CTL(engine, tmp);
> >>>                return ENGINE_WAIT_KICK;
> >>>        }
> >>> @@ -258,8 +257,8 @@ engine_stuck(struct intel_engine_cs *engine, u64 acthd)
> >>>                default:
> >>>                        return ENGINE_DEAD;
> >>>                case 1:
> >>> -                     i915_handle_error(dev_priv, 0,
> >>> -                                       "Kicking stuck semaphore on %s",
> >>> +                     i915_handle_error(dev_priv, ALL_ENGINES, 0,
> >> Same here,
> > 
> > Both are functionally no-op changes, as they are only for !per-engine
> > platforms (unless someone manages to send just the wrong type of garbage
> > to the GPU). I just thought it interesting to document that wait-event
> > needs a local kick and the wait-sema needs to kick the other engines.
> i915_handle_error has this before full reset:
> 
>         if (!engine_mask)
>                 goto out;
> 
> No reset at all was happening before.

We bugged out a while back then ;)
-Chris
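
A minimal, self-contained model of the control flow discussed above may help. It is a sketch only: the names `handle_error_sketch`, `ERROR_CAPTURE` and `struct error_outcome` are hypothetical and are not the real i915 API, and the real `i915_handle_error()` does considerably more than this. The sketch reproduces just two points from the thread: a flags bit that gates the heavyweight error capture, and the `if (!engine_mask) goto out;` early exit that Michel quotes, which is why a call with an engine mask of 0 never triggered a reset.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BIT(n) (1u << (n))
/* Hypothetical flag modelled on "a bit to request the state to be captured". */
#define ERROR_CAPTURE BIT(0)

/* What the model reports back instead of touching any hardware. */
struct error_outcome {
	int captured;  /* error state capture was requested and performed */
	int reset;     /* some reset would have been attempted */
	char msg[128]; /* formatted reason, as passed down to the reset path */
};

static struct error_outcome
handle_error_sketch(uint32_t engine_mask, unsigned long flags,
		    const char *fmt, ...)
{
	struct error_outcome out = {0};
	va_list args;

	/* Format the reason once so it can accompany any reset notice. */
	va_start(args, fmt);
	vsnprintf(out.msg, sizeof(out.msg), fmt, args);
	va_end(args);

	/* Only callers that ask for it pay for the heavyweight capture. */
	if (flags & ERROR_CAPTURE)
		out.captured = 1;

	/* Mirrors the check quoted in the thread: no engines, no reset. */
	if (!engine_mask)
		return out;

	out.reset = 1;
	return out;
}
```

In this model, the pre-patch call sites passing an engine mask of 0 take the early exit, while the patched calls passing `BIT(engine->id)` or `ALL_ENGINES` would go on to a reset, which is the behavioral change Michel points out.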