[Intel-gfx] ✗ Fi.CI.IGT: failure for series starting with [CI,1/4] drm/i915/guc: Tidy guc_log_control
Chris Wilson
chris at chris-wilson.co.uk
Sat Mar 10 11:24:17 UTC 2018
Quoting Michał Winiarski (2018-03-10 11:07:03)
> [ 59.708020] [drm:error_state_write [i915]] Resetting error state
> [ 59.708508] [IGT] gem_exec_capture: starting subtest capture-vebox
> [ 59.718849] [drm] GPU HANG: ecode 9:0:0xfff7fffe, reason: Manually set
> wedged engine mask = ffffffffffffffff, action: reset
> [ 59.719421] i915 0000:00:02.0: Resetting vecs0 after gpu hang
> [ 59.720276] [drm:i915_gem_reset_engine [i915]] resetting vecs0 to restart
> from tail of request 0x1
> [ 59.721008] [drm:i915_reset_device [i915]] resetting chip
> [ 59.721226] i915 0000:00:02.0: Resetting chip after gpu hang
> [ 59.721575] i915 0000:00:02.0: GPU recovery failed
Full device reset doesn't handle being called from a failed per-engine
reset. Whoops. It doesn't look like there's any reason for the
per-engine reset to have failed either,
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 828f3104488c..44eef355e12c 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2985,6 +2985,7 @@ void i915_handle_error(struct drm_i915_private *dev_priv,
 	 */
 	intel_runtime_pm_get(dev_priv);
+	engine_mask &= INTEL_INFO(dev_priv)->ring_mask;
 	i915_capture_error_state(dev_priv, engine_mask, error_msg);
 	i915_clear_error_registers(dev_priv);
should fix the immediate problem; but there's no reason afaict for this
to vary between test runs. As to how to properly ignore left-over state
from per-engine reset when doing the full-reset fallback... ugh.
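
For illustration only, here is a minimal userspace sketch of what that one-line mask does. The log above shows the wedged engine mask arriving as ffffffffffffffff, so it names engines the device doesn't have; ANDing with the device's ring_mask drops those bits. The engine bit positions, ENGINE_BIT() and clamp_engine_mask() are hypothetical stand-ins, not the driver's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical engine ids; the real driver's numbering may differ. */
enum engine_id { RCS, BCS, VCS, VECS };

#define ENGINE_BIT(id) (UINT32_C(1) << (id))

/*
 * Keep only the requested engines that the device actually has.
 * This mirrors "engine_mask &= INTEL_INFO(dev_priv)->ring_mask"
 * with plain integers standing in for the driver state.
 */
static uint32_t clamp_engine_mask(uint64_t requested, uint32_t ring_mask)
{
	return (uint32_t)(requested & ring_mask);
}

/* Small self-check of the three interesting cases. */
static void clamp_engine_mask_selftest(void)
{
	const uint32_t ring_mask = ENGINE_BIT(RCS) | ENGINE_BIT(BCS) |
				   ENGINE_BIT(VCS) | ENGINE_BIT(VECS);

	/* An all-ones mask collapses to exactly the engines present. */
	assert(clamp_engine_mask(UINT64_MAX, ring_mask) == ring_mask);

	/* A valid single-engine request passes through unchanged. */
	assert(clamp_engine_mask(ENGINE_BIT(VECS), ring_mask) == ENGINE_BIT(VECS));

	/* A request for an engine that doesn't exist becomes empty. */
	assert(clamp_engine_mask(ENGINE_BIT(5), ring_mask) == 0);
}
```

With the mask clamped, anything downstream that walks engine_mask only ever sees engines that exist. It does nothing for the left-over per-engine reset state, which is the part that still needs a proper answer.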
-Chris