[Intel-gfx] [PATCH 07/25] drm/i915: Cancel context if it hangs after it is closed

Chris Wilson chris at chris-wilson.co.uk
Mon Nov 11 11:04:33 UTC 2019


Quoting Mika Kuoppala (2019-11-11 10:54:14)
> Chris Wilson <chris at chris-wilson.co.uk> writes:
> 
> > If we detect a hang in a closed context, just flush all of its requests
> > and cancel any remaining execution along the context. Note that after
> > closing the context, the last reference to the context may be dropped,
> > leaving it only valid under RCU.
> 
> Sound good. But is there a window for userspace to start
> to see -EIO if it resubmits to a closed context?

Userspace cannot submit to a closed context (-ENOENT) as that would be
tantamount to a use-after-free kernel bug.
 
> In other words, after userspace doing gem_ctx_destroy(ctx_handle),
> we would return -EINVAL due to ctx_handle being stale
> earlier than we check for banned status and return -EIO?

It's as simple as if the context is closed, it is removed from the
file->context_idr and userspace cannot access it. If userspace is racing
with itself, there's not much we can do other than protect our
references. If userspace succeeds in submitting to the context prior to
closing it in another thread, it has the context to continue (and if it
then hangs, it will be shot down immediately). If it loses that race, it
gets an -ENOENT. If it loses that race so badly that the context id is
replaced by a new context, it submits to that new context; which surely
will end in tears and GPU hangs, but not our fault and nothing we can do
to prevent that.
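
To make the three outcomes concrete, here is a rough userspace toy model of
the lookup, not the i915 code. Everything named toy_* is hypothetical: the
fixed-size table only stands in for file->context_idr, and the generation
counter exists purely so the demo can tell a reused id from the original
context. It runs single-threaded and simply replays the orderings a racing
userspace could hit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_CTX 4

/* Hypothetical stand-in for a GEM context; not the i915 structure. */
struct toy_ctx {
	int generation;	/* tells a reused id apart from the original */
	int submitted;	/* number of requests accepted */
};

/* Hypothetical stand-in for file->context_idr: id -> context, NULL if free. */
struct toy_file {
	struct toy_ctx *slots[TOY_MAX_CTX];
	struct toy_ctx storage[TOY_MAX_CTX];
	int next_generation;
};

/* Hand out the lowest free id so that id reuse is easy to demonstrate. */
static int toy_ctx_create(struct toy_file *f)
{
	for (int id = 0; id < TOY_MAX_CTX; id++) {
		if (!f->slots[id]) {
			f->storage[id].generation = ++f->next_generation;
			f->storage[id].submitted = 0;
			f->slots[id] = &f->storage[id];
			return id;
		}
	}
	return -ENOSPC;
}

/* Closing removes the id from the table; later lookups cannot find it. */
static int toy_ctx_destroy(struct toy_file *f, int id)
{
	if (id < 0 || id >= TOY_MAX_CTX || !f->slots[id])
		return -ENOENT;
	f->slots[id] = NULL;
	return 0;
}

/*
 * Submission begins with a lookup by id. Returns the generation of the
 * context that accepted the request, or -ENOENT if the id is not present.
 */
static int toy_submit(struct toy_file *f, int id)
{
	struct toy_ctx *ctx;

	if (id < 0 || id >= TOY_MAX_CTX)
		return -ENOENT;
	ctx = f->slots[id];
	if (!ctx)
		return -ENOENT;
	ctx->submitted++;
	return ctx->generation;
}

/* Replays the three orderings described above; returns 0 on success. */
int toy_demo(void)
{
	struct toy_file f = {0};
	int id = toy_ctx_create(&f);

	/* Wins the race: submit lands before the close. */
	assert(toy_submit(&f, id) == 1);

	/* Loses the race: the id is gone, so the lookup fails. */
	assert(toy_ctx_destroy(&f, id) == 0);
	assert(toy_submit(&f, id) == -ENOENT);

	/* Loses badly: the id was reused, so the stale handle hits a new context. */
	assert(toy_ctx_create(&f) == id);
	assert(toy_submit(&f, id) == 2);
	return 0;
}
```

The point of the toy is only that the lookup is the gate: once the id has
been removed, submission has nothing to hold on to, and a reused id is
indistinguishable from a valid one at this level.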
-Chris

