[Intel-gfx] [PATCH] drm/i915: Fix up -EIO ABI per igt/gem_eio
Chris Wilson
chris at chris-wilson.co.uk
Wed Nov 25 09:52:57 PST 2015
On Wed, Nov 25, 2015 at 06:45:26PM +0100, Daniel Vetter wrote:
> My apologies to Chris Wilson for being dense on this topic for
> essentially for years.
>
> This patch doesn't do any big redesign of the overall reset flow, but
> instead just applies changes where it's needed to obey gem_eio. We can
> redesign later on, but for now I prefer to make the testcase happy
> with some minimally invasive changes:
>
> - Ensure that set_domain_ioctl never returns -EIO. Tricky part there
> is that we might race with the reset handler when doing the lockless
> wait.
>
> - Make sure debugfs/i915_wedged is actually useful to reset a wedged
> gpu - atm it bails out with -EAGAIN for a terminally wedged gpu.
> That's because reset_in_progress actually includes that terminal
> state, which is super-confusing (and most callers got this wrong).
> Fix this by changing the semantics of reset_in_progress to do what
> it says on the tin (and fixup all other callers).
>
> Also make sure that reset_in_progress is cleared when we reach the
> terminal "wedged" state. Without this we kept returning -EAGAIN in
> some places forever.
>
> - Second problem with debugfs/i915_wedge was that we never got out of
> the terminal wedged state. Hence manually set the reset counter out
> of that state before starting the hung gpu recovery.
>
> For safety also make sure that we are consintent with resetting the
> gpu between i915_reset_and_wakeup and i915_handler_error by just
> passing the boolean "wedged" around instead of trying to recompute
> it wrongly. This isn't an issue for gem_eio with the debugfs fix,
> but just general robustness of the code.
>
> - Finally make sure that when gpu reset is disabled through the module
> paramter the kernel doesn't generate dmesg noise that would upset
> our CI/piglit. Simplest solution for that was to lift the i915.reset
> check up into i915_reset().
>
> With all these changes, plus the igt fixes I've posted, gem_eio is now
> happy on many snb.
>
> v2: Don't upset lockdep with my set_domain_ioctl changes.
>
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Testcase: igt/gem_eio/*
> Signed-off-by: Daniel Vetter <daniel.vetter at intel.com>
> ---
> drivers/gpu/drm/i915/i915_debugfs.c | 4 ++++
> drivers/gpu/drm/i915/i915_drv.c | 6 ++++++
> drivers/gpu/drm/i915/i915_drv.h | 2 +-
> drivers/gpu/drm/i915/i915_gem.c | 23 +++++++++++++++--------
> drivers/gpu/drm/i915/i915_irq.c | 11 ++++++-----
> drivers/gpu/drm/i915/intel_uncore.c | 3 ---
> 6 files changed, 32 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 411a9c68b4ee..c4006c09ef1b 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -4744,6 +4744,10 @@ i915_wedged_set(void *data, u64 val)
> if (i915_reset_in_progress(&dev_priv->gpu_error))
> return -EAGAIN;
>
> + /* Already wedged, force us out of this terminal state. */
> + if (i915_terminally_wedged(&dev_priv->gpu_error))
> + atomic_set(&dev_priv->gpu_error.reset_counter, 0);
What?
> +
> intel_runtime_pm_get(dev_priv);
>
> i915_handle_error(dev, val,
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index ec1e877c4a36..1f5386bb78f4 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -909,6 +909,12 @@ int i915_reset(struct drm_device *dev)
>
> simulated = dev_priv->gpu_error.stop_rings != 0;
>
> + if (!i915.reset) {
> + DRM_INFO("GPU reset disabled in module option, not resetting\n");
> + mutex_unlock(&dev->struct_mutex);
> + return -ENODEV;
> + }
> +
> ret = intel_gpu_reset(dev);
Please don't break intel_gpu_reset() like that.
I think this patch missed the point entirely.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
More information about the Intel-gfx
mailing list