[Intel-gfx] [PATCH] drm/i915: Fix up -EIO ABI per igt/gem_eio

Chris Wilson chris at chris-wilson.co.uk
Wed Nov 25 09:52:57 PST 2015


On Wed, Nov 25, 2015 at 06:45:26PM +0100, Daniel Vetter wrote:
> My apologies to Chris Wilson for being dense on this topic for
> essentially for years.
> 
> This patch doesn't do any big redesign of the overall reset flow, but
> instead just applies changes where it's needed to obey gem_eio. We can
> redesign later on, but for now I prefer to make the testcase happy
> with some minimally invasive changes:
> 
> - Ensure that set_domain_ioctl never returns -EIO. Tricky part there
>   is that we might race with the reset handler when doing the lockless
>   wait.
> 
> - Make sure debugfs/i915_wedged is actually useful to reset a wedged
>   gpu - atm it bails out with -EAGAIN for a terminally wedged gpu.
>   That's because reset_in_progress actually includes that terminal
>   state, which is super-confusing (and most callers got this wrong).
>   Fix this by changing the semantics of reset_in_progress to do what
>   it says on the tin (and fixup all other callers).
> 
>   Also make sure that reset_in_progress is cleared when we reach the
>   terminal "wedged" state. Without this we kept returning -EAGAIN in
>   some places forever.
> 
> - Second problem with debugfs/i915_wedge was that we never got out of
>   the terminal wedged state. Hence manually set the reset counter out
>   of that state before starting the hung gpu recovery.
> 
>   For safety also make sure that we are consintent with resetting the
>   gpu between i915_reset_and_wakeup and i915_handler_error by just
>   passing the boolean "wedged" around instead of trying to recompute
>   it wrongly. This isn't an issue for gem_eio with the debugfs fix,
>   but just general robustness of the code.
> 
> - Finally make sure that when gpu reset is disabled through the module
>   paramter the kernel doesn't generate dmesg noise that would upset
>   our CI/piglit. Simplest solution for that was to lift the i915.reset
>   check up into i915_reset().
> 
> With all these changes, plus the igt fixes I've posted, gem_eio is now
> happy on many snb.
> 
> v2: Don't upset lockdep with my set_domain_ioctl changes.
> 
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Testcase: igt/gem_eio/*
> Signed-off-by: Daniel Vetter <daniel.vetter at intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c |  4 ++++
>  drivers/gpu/drm/i915/i915_drv.c     |  6 ++++++
>  drivers/gpu/drm/i915/i915_drv.h     |  2 +-
>  drivers/gpu/drm/i915/i915_gem.c     | 23 +++++++++++++++--------
>  drivers/gpu/drm/i915/i915_irq.c     | 11 ++++++-----
>  drivers/gpu/drm/i915/intel_uncore.c |  3 ---
>  6 files changed, 32 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 411a9c68b4ee..c4006c09ef1b 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -4744,6 +4744,10 @@ i915_wedged_set(void *data, u64 val)
>  	if (i915_reset_in_progress(&dev_priv->gpu_error))
>  		return -EAGAIN;
>  
> +	/* Already wedged, force us out of this terminal state. */
> +	if (i915_terminally_wedged(&dev_priv->gpu_error))
> +		atomic_set(&dev_priv->gpu_error.reset_counter, 0);

What?

> +
>  	intel_runtime_pm_get(dev_priv);
>  
>  	i915_handle_error(dev, val,
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index ec1e877c4a36..1f5386bb78f4 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -909,6 +909,12 @@ int i915_reset(struct drm_device *dev)
>  
>  	simulated = dev_priv->gpu_error.stop_rings != 0;
>  
> +	if (!i915.reset) {
> +		DRM_INFO("GPU reset disabled in module option, not resetting\n");
> +		mutex_unlock(&dev->struct_mutex);
> +		return -ENODEV;
> +	}
> +
>  	ret = intel_gpu_reset(dev);

Please don't break intel_gpu_reset() like that.

I think this patch missed the point entirely.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


More information about the Intel-gfx mailing list