[Intel-gfx] [PATCH v2] drm/i915: Taint (TAINT_DIE) the kernel if the GPU reset fails

Joonas Lahtinen joonas.lahtinen at linux.intel.com
Mon Dec 4 13:41:11 UTC 2017


On Wed, 2017-11-29 at 14:05 +0000, Chris Wilson wrote:
> History tells us that if we cannot reset the GPU now, we never will. This
> then impacts everything that is run subsequently. On failing the reset,
> we mark the driver as wedged, trying to prevent further execution on the
> GPU, forcing userspace to fall back to using the CPU to update its
> framebuffers and let the user know what happened.
> 
> We also want to go one step further and add a taint to the kernel so that
> any subsequent faults can be traced back to this failure. This is
> important for igt, where if the GPU/driver fails we want to reboot and
> restart testing rather than continue on into oblivion.
> 
> TAINT_DIE is colloquially known as "system on fire", which seems
> appropriate for unresponsive hardware.
> 
> v2: Also taint if the recovery fails (again history shows us that is
> typically fatal).
> 
> References: https://bugs.freedesktop.org/show_bug.cgi?id=103514
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> Cc: Michał Winiarski <michal.winiarski at intel.com>

<SNIP>

> @@ -1951,6 +1954,19 @@ void i915_reset(struct drm_i915_private *i915, unsigned int flags)
>  	wake_up_bit(&error->flags, I915_RESET_HANDOFF);
>  	return;
>  
> +taint:
> +	/*
> +	 * History tells us that if we cannot reset the GPU now, we
> +	 * never will. This then impacts everything that is run
> +	 * subsequently. On failing the reset, we mark the driver
> +	 * as wedged, preventing further execution on the GPU.
> +	 * We also want to go one step further and add a taint to the
> +	 * kernel so that any subsequent faults can be traced back to
> +	 * this failure. This is important for igt, where if the
> +	 * GPU/driver fails we want to reboot and restart testing
> +	 * rather than continue on into oblivion.
> +	 */

As Marta mentioned too, how igt works on a given day is a bit too
volatile to document in the kernel comments.

With that dropped:

Reviewed-by: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
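
For illustration only, and not part of the patch: a minimal userspace
sketch of how a test runner could notice the taint and decide to reboot.
It assumes the taint mask is read from /proc/sys/kernel/tainted and that
TAINT_DIE is bit 7 of that mask (value 128, flag 'D'). The function names
are hypothetical and this is not how igt itself is implemented.

```c
#include <stdio.h>

/* Assumption: TAINT_DIE is bit 7 of the taint mask (value 128, flag 'D'). */
#define TAINT_DIE_BIT 7

/* Hypothetical helper: returns 1 if the DIE bit is set in mask, else 0. */
static int taint_mask_has_die(unsigned long mask)
{
	return (mask >> TAINT_DIE_BIT) & 1UL;
}

/*
 * Hypothetical helper: reads the decimal taint mask from path
 * (normally /proc/sys/kernel/tainted).
 * Returns 1 if TAINT_DIE is set, 0 if it is not, and -1 if the
 * file cannot be opened or parsed.
 */
static int kernel_has_die_taint(const char *path)
{
	unsigned long mask;
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;

	ok = fscanf(f, "%lu", &mask) == 1;
	fclose(f);

	return ok ? taint_mask_has_die(mask) : -1;
}
```

A runner could call kernel_has_die_taint("/proc/sys/kernel/tainted")
between tests and stop to reboot once it returns 1.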

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

