[PATCH 08/22] drm/i915/guc: Don't enable scheduling on a banned context, guc_id invalid, not registered
Matthew Brost
matthew.brost at intel.com
Tue Aug 17 16:44:19 UTC 2021
On Tue, Aug 17, 2021 at 11:47:53AM +0200, Daniel Vetter wrote:
> On Mon, Aug 16, 2021 at 06:51:25AM -0700, Matthew Brost wrote:
> > When unblocking a context, do not enable scheduling if the context is
> > banned, guc_id invalid, or not registered.
> >
> > Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
> > Signed-off-by: Matthew Brost <matthew.brost at intel.com>
> > Cc: <stable at vger.kernel.org>
> > ---
> > drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index c3b7bf7319dd..353899634fa8 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -1579,6 +1579,9 @@ static void guc_context_unblock(struct intel_context *ce)
> > spin_lock_irqsave(&ce->guc_state.lock, flags);
> >
> > if (unlikely(submission_disabled(guc) ||
> > + intel_context_is_banned(ce) ||
> > + context_guc_id_invalid(ce) ||
> > + !lrc_desc_registered(guc, ce->guc_id) ||
> > !intel_context_is_pinned(ce) ||
> > context_pending_disable(ce) ||
> > context_blocked(ce) > 1)) {
>
> I think this entire if condition here is screaming that our intel_context
> state machinery for guc is way too complex, and on the wrong side of
> incomprehensible.
>
> Also some of these check state outside of the context, and we don't seem
> to hold spinlocks for those, or anything else.
>
> I general I have no idea which of these are defensive programming and
> cannot ever happen, and which actually can happen. There's for sure way
> too many races going on given that this is all context-local stuff.
A lot of this is guarding against a full GT reset while trying to
cancel a request. Full GT resets make everything really hard and in
pratice should never really happen because the GuC does per engine /
context resets. Unfortunately IGTs do weird things like turn off per
engine / contexts resets and full GT reset the only way to recover so
the IGTs can will expose all the races around GT reset, especially when
we run IGTs a pre-prod HW that tends to hang for whatever reason.
Matt
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
More information about the dri-devel
mailing list