[PATCH] drm/xe/oa: Disallow OA from being enabled on active exec_queue's
Dixit, Ashutosh
ashutosh.dixit at intel.com
Tue Nov 19 21:08:49 UTC 2024
On Tue, 19 Nov 2024 06:44:51 -0800, Matthew Brost wrote:
>
> On Mon, Nov 18, 2024 at 05:32:56PM -0800, Ashutosh Dixit wrote:
> > Enabling OA on an exec_queue toggles the OAC_CONTEXT_ENABLE bit in
> > CTXT_SR_CTL register. Toggling this bit changes the size and layout of the
> > underlying HW context image. Therefore, enabling OA on an already active
> > exec_queue (as currently implemented in xe) is an invalid operation and can
> > cause hangs. Therefore, disallow OA from being enabled on active
> > exec_queue's (here, by active we mean a context on which submissions have
> > previously happened).
> >
>
> This is something we will need to keep an eye on then, because in various
> experimental code I've played around with enabling exec queues upon
> creation, e.g., if we want to allocate a doorbell. I seem to recall Habana
> wanting to enable exec queues upon creation too.
The real requirement here is that the HW context image must not have been
loaded before OA is enabled on the exec queue. That load is what happens
today in the ENABLED state, when user space submissions start, correct?
If operations such as doorbell allocation are only management requests to
GuC (which don't cause the HW context image to be loaded), and if we can
define a new state for when the exec queue is handed off to userspace to
start submissions, we should be able to stay with this approach.
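To make the distinction concrete, here is a minimal standalone sketch of the
check under that assumption. It is a model only, not xe driver code: the
state names, struct model_queue, the helper names and the bit position of
OAC_CONTEXT_ENABLE are all hypothetical. Only the shape of the condition
(reject when the bit is not yet set and the context image may already have
been loaded) and the -EADDRINUSE return follow the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical queue states; the xe driver does not define these names. */
enum queue_state {
	QUEUE_CREATED,    /* allocated, nothing sent to GuC yet */
	QUEUE_REGISTERED, /* management requests only (e.g. doorbell), image not loaded */
	QUEUE_SUBMITTING, /* handed off to userspace, HW may have loaded the image */
};

/* Illustrative bit position only; the real value comes from the register headers. */
#define OAC_CONTEXT_ENABLE (1u << 8)

/* Hypothetical stand-in for an exec queue and its context control register. */
struct model_queue {
	enum queue_state state;
	uint32_t ctx_control;
};

/* Only the state in which userspace submissions have started counts as "loaded". */
static bool context_image_may_be_loaded(const struct model_queue *q)
{
	return q->state == QUEUE_SUBMITTING;
}

/*
 * Reject the 0 -> 1 transition of OAC_CONTEXT_ENABLE once the context image
 * may have been loaded. If the bit is already set, the layout does not
 * change and the request is allowed.
 */
static int oa_enable_check(const struct model_queue *q)
{
	bool oac_already_set = (q->ctx_control & OAC_CONTEXT_ENABLE) != 0;

	if (!oac_already_set && context_image_may_be_loaded(q))
		return -EADDRINUSE;

	return 0;
}
```

In this model a queue that has only received management requests still
passes the check, which is the behavior a separate "handed off to userspace"
state would preserve.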
> Just curious if it was ever explored having exec queue creation
> extension which enables OA? It seems like this is something we may need
> at some point if our exec queue creation semantics change of course
> being careful to not break existing flows.
Yeah, I did think of it but didn't want to change the uapi.
Also, a different implementation is possible which avoids this resizing of
the context image altogether. It requires the kernel OA code to make its
submissions on the user exec queue (and use that exec queue's VM; currently
the OA code uses a kernel exec queue). There are some reasons I don't want
to implement that just yet, but worst case, we can do that if absolutely
needed.
Thanks.
--
Ashutosh
> > Transition from 1 -> 0 for this bit was disallowed in
> > '0c8650b09a36 ("drm/xe/oa: Don't reset OAC_CONTEXT_ENABLE on OA stream
> > close")'. Here we disallow the 0 -> 1 transition on active contexts.
> >
> > v2: Don't export exec_queue_enabled, define new xe_exec_queue_op (M Brost)
> > Directly check OAC_CONTEXT_ENABLE bit from context image (J Cavitt)
> >
> > Bspec: 60314
> > Fixes: 2f4a730fcd2d ("drm/xe/oa: Add OAR support")
> > Cc: stable at vger.kernel.org
> > Signed-off-by: Ashutosh Dixit <ashutosh.dixit at intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_exec_queue_types.h | 2 ++
> > drivers/gpu/drm/xe/xe_guc_submit.c | 1 +
> > drivers/gpu/drm/xe/xe_oa.c | 13 +++++++++++++
> > 3 files changed, 16 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > index 1158b6062a6cd..b88d617c37b33 100644
> > --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > @@ -184,6 +184,8 @@ struct xe_exec_queue_ops {
> > void (*resume)(struct xe_exec_queue *q);
> > /** @reset_status: check exec queue reset status */
> > bool (*reset_status)(struct xe_exec_queue *q);
> > + /** @enabled: check if exec queue is in enabled state */
> > + bool (*enabled)(struct xe_exec_queue *q);
> > };
> >
> > #endif
> > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > index f9ecee5364d82..b9b9cdb6f768b 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > @@ -1660,6 +1660,7 @@ static const struct xe_exec_queue_ops guc_exec_queue_ops = {
> > .suspend_wait = guc_exec_queue_suspend_wait,
> > .resume = guc_exec_queue_resume,
> > .reset_status = guc_exec_queue_reset_status,
> > + .enabled = exec_queue_enabled,
> > };
> >
> > static void guc_exec_queue_stop(struct xe_guc *guc, struct xe_exec_queue *q)
> > diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
> > index 8dd55798ab312..4a7440c40978c 100644
> > --- a/drivers/gpu/drm/xe/xe_oa.c
> > +++ b/drivers/gpu/drm/xe/xe_oa.c
> > @@ -2066,6 +2066,19 @@ int xe_oa_stream_open_ioctl(struct drm_device *dev, u64 data, struct drm_file *f
> > if (XE_IOCTL_DBG(oa->xe, !param.exec_q))
> > return -ENOENT;
> >
> > + /*
> > + * Disallow OA from being enabled on active exec_queue's. Enabling OA sets the
> > + * OAC_CONTEXT_ENABLE bit in CTXT_SR_CTL register. Toggling the bit changes
> > + * the size and layout of the underlying HW context image and can cause hangs.
> > + */
> > + if (XE_IOCTL_DBG(oa->xe,
> > + !(xe_lrc_read_ctx_reg(param.exec_q->lrc[0],
> > + CTX_CONTEXT_CONTROL) & CTX_CTRL_OAC_CONTEXT_ENABLE) &&
> > + param.exec_q->ops->enabled(param.exec_q))) {
> > + ret = -EADDRINUSE;
> > + goto err_exec_q;
> > + }
> > +
> > if (param.exec_q->width > 1)
> > drm_dbg(&oa->xe->drm, "exec_q->width > 1, programming only exec_q->lrc[0]\n");
> > }
> > --
> > 2.41.0
> >