[Intel-gfx] [PATCH 2/2] drm/i915: Set guilty-flag on fence after detecting a hang

Chris Wilson chris at chris-wilson.co.uk
Tue Jan 3 11:46:12 UTC 2017


On Tue, Jan 03, 2017 at 11:34:45AM +0000, Tvrtko Ursulin wrote:
> 
> On 03/01/2017 11:05, Chris Wilson wrote:
> >The struct dma_fence carries a status field exposed to userspace by
> >sync_file. This is inspected after the fence is signaled and can convey
> >whether or not the request completed successfully, or in our case if we
> >detected a hang during the request (signaled via -EIO in
> >SYNC_IOC_FILE_INFO).
> >
> >Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> >---
> > drivers/gpu/drm/i915/i915_gem.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >index 204c4a673bf3..bc99c0e292d8 100644
> >--- a/drivers/gpu/drm/i915/i915_gem.c
> >+++ b/drivers/gpu/drm/i915/i915_gem.c
> >@@ -2757,10 +2757,12 @@ static void i915_gem_reset_engine(struct intel_engine_cs *engine)
> > 		ring_hung = false;
> > 	}
> >
> >-	if (ring_hung)
> >+	if (ring_hung) {
> > 		i915_gem_context_mark_guilty(request->ctx);
> >-	else
> >+		request->fence.status = -EIO;
> >+	} else {
> > 		i915_gem_context_mark_innocent(request->ctx);
> >+	}
> >
> > 	if (!ring_hung)
> > 		return;
> >
> 
> Reading what happens later in this function, should we set the
> status of all the other requests we are about to clear?
> 
> However one thing I don't understand is how this scheme interacts
> with the current userspace. We will clear/no-nop some of the
> submitted requests since the state is corrupt. But how will
> userspace notice this before it submits more requets?

There is no mechanism currently for user space to be able to detect a
hung request. (It can use the uevent for async notification of the
hang/reset, but that will not tell you who caused the hang.) Userspace
can track the number of hangs it caused, but the delay makes any
roundtripping impractical (i.e. you have to synchronise before all
rendering if you must detect the event immediately). Note also that we
do not want to give out interprocess information (i.e. to allow one
client to spy on another), which makes things harder to get right.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


More information about the dri-devel mailing list