[Intel-gfx] [PATCH] drm/i915: Wait for reset to complete before returning from debugfs/i915_wedged

Fri Mar 10 13:23:40 UTC 2017

On Fri, Mar 10, 2017 at 01:14:33PM +0000, Tvrtko Ursulin wrote:
> 
> On 10/03/2017 12:21, Chris Wilson wrote:
> >Provide some serialisation between user operations by waiting for the
> >reset initiated by setting i915_wedged to complete.
> >
> >Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >Cc: Mika Kuoppala <mika.kuoppala at intel.com>
> >---
> > drivers/gpu/drm/i915/i915_debugfs.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> >index 115433d46477..a1eccf2ef313 100644
> >--- a/drivers/gpu/drm/i915/i915_debugfs.c
> >+++ b/drivers/gpu/drm/i915/i915_debugfs.c
> >@@ -4138,6 +4138,10 @@ i915_wedged_set(void *data, u64 val)
> > 	i915_handle_error(dev_priv, val,
> > 			  "Manually setting wedged to %llu", val);
> >
> >+	wait_on_bit(&dev_priv->gpu_error.flags,
> >+		    I915_RESET_IN_PROGRESS,
> >+		    TASK_UNINTERRUPTIBLE);
> >+
> > 	return 0;
> > }
> 
> I've spotted that the kerneldoc for wait_on_bit says "One uses
> wait_on_bit() where one is waiting for the bit to clear, but has no
> intention of setting it."

That describes the above.

> I assume this is to avoid races, which it seems this new wait also
> doesn't avoid. Should it grab struct mutex across the wait and
> handle_error? Or, if that is not possible, what is the benefit of the
> patch: is it just something to help IGT? Could we instead have IGT
> wait on the reset-in-progress status itself by exporting that status?
> (If we don't already; I haven't looked.)

The primary purpose is to make sure the write doesn't return until the
reset it kicked (or joined) is complete. Since triggering a reset is the
intended side-effect of writing into i915_wedged, it made sense to me.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
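
The "kicked (or joined)" behaviour described in the reply can be sketched
in userspace. This is only an analogy under stated assumptions, not the
i915 code: it uses pthreads instead of the kernel's wait_on_bit(), and
every name in it (struct gpu_error, RESET_IN_PROGRESS, handle_error,
wait_for_reset, wedged_set, run_demo) is hypothetical. The writer either
starts a reset or finds one already running, and in both cases does not
return until the in-progress bit has been cleared.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical flag bit, standing in for I915_RESET_IN_PROGRESS. */
#define RESET_IN_PROGRESS (1UL << 0)

/* Hypothetical state; the real driver keeps its flags elsewhere. */
struct gpu_error {
	pthread_mutex_t lock;
	pthread_cond_t cleared;   /* signalled when the bit is cleared */
	unsigned long flags;
	int resets_done;
};

/* Pretend reset: do the work, clear the bit, wake every waiter. */
static void *reset_worker(void *arg)
{
	struct gpu_error *e = arg;

	pthread_mutex_lock(&e->lock);
	e->resets_done++;
	e->flags &= ~RESET_IN_PROGRESS;
	pthread_cond_broadcast(&e->cleared);
	pthread_mutex_unlock(&e->lock);
	return NULL;
}

/* Kick a reset unless one is already running. Returns 1 if kicked. */
static int handle_error(struct gpu_error *e, pthread_t *worker)
{
	int kicked = 0;

	pthread_mutex_lock(&e->lock);
	if (!(e->flags & RESET_IN_PROGRESS)) {
		e->flags |= RESET_IN_PROGRESS;
		kicked = 1;
	}
	pthread_mutex_unlock(&e->lock);

	if (kicked) {
		int rc = pthread_create(worker, NULL, reset_worker, e);
		assert(rc == 0);
		(void)rc;
	}
	return kicked;
}

/* Block until the bit is clear; this function never sets the bit. */
static void wait_for_reset(struct gpu_error *e)
{
	pthread_mutex_lock(&e->lock);
	while (e->flags & RESET_IN_PROGRESS)
		pthread_cond_wait(&e->cleared, &e->lock);
	pthread_mutex_unlock(&e->lock);
}

/* Analogue of the debugfs write: return only once the reset is done. */
static int wedged_set(struct gpu_error *e)
{
	pthread_t worker;
	int kicked = handle_error(e, &worker);

	wait_for_reset(e);
	if (kicked)
		pthread_join(worker, NULL);
	return 0;
}

/* Two back-to-back writes; each one returns after its own reset. */
int run_demo(void)
{
	struct gpu_error e = { .flags = 0, .resets_done = 0 };
	int done;

	pthread_mutex_init(&e.lock, NULL);
	pthread_cond_init(&e.cleared, NULL);
	wedged_set(&e);
	wedged_set(&e);
	done = e.resets_done;
	pthread_cond_destroy(&e.cleared);
	pthread_mutex_destroy(&e.lock);
	return done;
}
```

In this sketch the check in handle_error() and the wait in
wait_for_reset() are separate critical sections, so a second writer could
start a new reset between them. That mirrors the race raised in the
review: the wait serialises a writer against the reset it observed, not
against every other writer.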