[PATCH] drm/xe/oa: Disallow OA from being enabled on active exec_queue's

Thu Nov 21 21:04:24 UTC 2024

On Wed, 20 Nov 2024 21:16:44 -0800, Matthew Brost wrote:
>
> On Wed, Nov 20, 2024 at 08:22:05PM -0800, Dixit, Ashutosh wrote:
> > On Tue, 19 Nov 2024 14:09:07 -0800, Matthew Brost wrote:
> > >
> >
> > Hi Matt,
> >
> > > On Tue, Nov 19, 2024 at 01:08:49PM -0800, Dixit, Ashutosh wrote:
> > > > On Tue, 19 Nov 2024 06:44:51 -0800, Matthew Brost wrote:
> > > > >
> > > > > On Mon, Nov 18, 2024 at 05:32:56PM -0800, Ashutosh Dixit wrote:
> > > > > > Enabling OA on an exec_queue toggles the OAC_CONTEXT_ENABLE bit in
> > > > > > CTXT_SR_CTL register. Toggling this bit changes the size and layout of the
> > > > > > underlying HW context image. Therefore, enabling OA on an already active
> > > > > > exec_queue (as currently implemented in xe) is an invalid operation and can
> > > > > > cause hangs. Therefore, disallow OA from being enabled on active
> > > > > > exec_queue's (here, by active we mean a context on which submissions have
> > > > > > previously happened).
> > > > > >
> > > > >
> > > > > This is something we will need to keep on eye on then because in various
> > > > > experimental code I've played around enabling exec queues upon creation.
> > > > > e.g., If we want to allocate a doorbell. I seem to recall Habana wanting
> > > > > to enable exec queues upon creation too.
> > > >
> > > > The real requirement here is that HW context image should not have been
> > > > loaded before OA is enabled on the exec queue. That is what happens today
> > > > in the ENABLED state, correct, when user space submissions start?
> > > >
> > >
> > > Yea, that is the current flow. If we change this to enable an exec queue
> > > upon allocation we probably could check for ring head & tail == 0.
> > >
> > > Also note LR queues with preemption fences do toggle the enabled state
> > > too. Does OA apply to those queues? If so, a registered check may be
> > > better than enabled or maybe just start with ring head & tail == 0.
> >
> > Yes I could do 'ring head & tail == 0' but not sure how it works, since
> > ring head and tail could have wrapped round back to 0 (giving the false
> > impression that exec_queue was not active when it was).
>
> Indeed wrapping is a corner...
>
> What about ring head & tail == 0 && ctx timestamp == 0?
>
> That seems correct and future proof.

If 'head & tail == 0' is ambiguous, 'ctx timestamp == 0' should be
sufficient by itself. But appears ctx timestamp can also wrap around? If
yes, theoretically it is possible that all three (head, tail and ctx
timestamp) have simultaneously wrapped around to 0.

So I think that's why you want to check all three, since the likelihood of
all three simultaneously wrapping round to 0 is extremely unlikely (but I'd
say not impossible).

But anyway, I can go with this for now (and maybe revisit about perhaps
introducing a new state after ENABLED, after the other exec queue changes
are done). But this looks good for now.

Thanks.
--
Ashutosh

> > I am assuming you mean the ring is "0 to N" rather than "gtt_addr to
> > gtt_addr + N", correct?
> >
> > Thanks.
> > --
> > Ashutosh
> >
> > > > If operations such as doorbell are only management requests to GuC (which
> > > > don't cause HW context image to be loaded) and if we can name a new state
> > > > when the exec queue is handed off to userspace for starting submissions, we
> > > > should be able to stay with this approach.
> > > >
> > > > > Just curious if it was ever explored having exec queue creation
> > > > > extension which enables OA? It seems like this is something we may need
> > > > > at some point if our exec queue creation semantics change of course
> > > > > being careful to not break existing flows.
> > > >
> > > > Yeah I did think of it but didn't want to change the uapi.
> > > >
> > >
> > > Makes sense.
> > >
> > > > Also, a different implementation is possible which avoids this resizing of
> > > > the context image altogether. It requires the kernel OA code submit its
> > > > submissions on the user exec queue (and use that exec queue's VM, currently
> > > > OA code uses a kernel exec queue). There are some reasons I don't want to
> > > > implement that just yet, but worst case, we can do that if absolutely
> > > > needed.
> > > >
> > > > Thanks.
> > > > --
> > > > Ashutosh
> > > >
> > > > > > Transition from 1 -> 0 for this bit was disallowed in
> > > > > > '0c8650b09a36 ("drm/xe/oa: Don't reset OAC_CONTEXT_ENABLE on OA stream
> > > > > > close")'. Here we disallow the 0 -> 1 transition on active contexts.
> > > > > >
> > > > > > v2: Don't export exec_queue_enabled, define new xe_exec_queue_op (M Brost)
> > > > > >     Directly check OAC_CONTEXT_ENABLE bit from context image (J Cavitt)
> > > > > >
> > > > > > Bspec: 60314
> > > > > > Fixes: 2f4a730fcd2d ("drm/xe/oa: Add OAR support")
> > > > > > Cc: stable at vger.kernel.org
> > > > > > Signed-off-by: Ashutosh Dixit <ashutosh.dixit at intel.com>
> > > > > > ---
> > > > > >  drivers/gpu/drm/xe/xe_exec_queue_types.h |  2 ++
> > > > > >  drivers/gpu/drm/xe/xe_guc_submit.c       |  1 +
> > > > > >  drivers/gpu/drm/xe/xe_oa.c               | 13 +++++++++++++
> > > > > >  3 files changed, 16 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > > > > index 1158b6062a6cd..b88d617c37b33 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > > > > +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > > > > @@ -184,6 +184,8 @@ struct xe_exec_queue_ops {
> > > > > >	void (*resume)(struct xe_exec_queue *q);
> > > > > >	/** @reset_status: check exec queue reset status */
> > > > > >	bool (*reset_status)(struct xe_exec_queue *q);
> > > > > > +	/** @enabled: check if exec queue is in enabled state */
> > > > > > +	bool (*enabled)(struct xe_exec_queue *q);
> > > > > >  };
> > > > > >
> > > > > >  #endif
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > > > > index f9ecee5364d82..b9b9cdb6f768b 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > > > > > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > > > > @@ -1660,6 +1660,7 @@ static const struct xe_exec_queue_ops guc_exec_queue_ops = {
> > > > > >	.suspend_wait = guc_exec_queue_suspend_wait,
> > > > > >	.resume = guc_exec_queue_resume,
> > > > > >	.reset_status = guc_exec_queue_reset_status,
> > > > > > +	.enabled = exec_queue_enabled,
> > > > > >  };
> > > > > >
> > > > > >  static void guc_exec_queue_stop(struct xe_guc *guc, struct xe_exec_queue *q)
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
> > > > > > index 8dd55798ab312..4a7440c40978c 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_oa.c
> > > > > > +++ b/drivers/gpu/drm/xe/xe_oa.c
> > > > > > @@ -2066,6 +2066,19 @@ int xe_oa_stream_open_ioctl(struct drm_device *dev, u64 data, struct drm_file *f
> > > > > >		if (XE_IOCTL_DBG(oa->xe, !param.exec_q))
> > > > > >			return -ENOENT;
> > > > > >
> > > > > > +		/*
> > > > > > +		 * Disallow OA from being enabled on active exec_queue's. Enabling OA sets the
> > > > > > +		 * OAC_CONTEXT_ENABLE bit in CTXT_SR_CTL register. Toggling the bit changes
> > > > > > +		 * the size and layout of the underlying HW context image and can cause hangs.
> > > > > > +		 */
> > > > > > +		if (XE_IOCTL_DBG(oa->xe,
> > > > > > +				 !(xe_lrc_read_ctx_reg(param.exec_q->lrc[0],
> > > > > > +				       CTX_CONTEXT_CONTROL) & CTX_CTRL_OAC_CONTEXT_ENABLE) &&
> > > > > > +				 param.exec_q->ops->enabled(param.exec_q))) {
> > > > > > +			ret = -EADDRINUSE;
> > > > > > +			goto err_exec_q;
> > > > > > +		}
> > > > > > +
> > > > > >		if (param.exec_q->width > 1)
> > > > > >			drm_dbg(&oa->xe->drm, "exec_q->width > 1, programming only exec_q->lrc[0]\n");
> > > > > >	}
> > > > > > --
> > > > > > 2.41.0
> > > > > >