[PATCH] drm/xe: Drop EXEC_QUEUE_FLAG_BANNED
Francois Dugast
francois.dugast at intel.com
Thu Jun 6 09:59:43 UTC 2024
On Thu, Jun 06, 2024 at 02:47:13AM +0000, Matthew Brost wrote:
> On Wed, Jun 05, 2024 at 06:04:34PM -0400, Rodrigo Vivi wrote:
> > On Wed, Jun 05, 2024 at 09:01:06PM +0000, Cavitt, Jonathan wrote:
> > > -----Original Message-----
> > > From: Intel-xe <intel-xe-bounces at lists.freedesktop.org> On Behalf Of Matthew Brost
> > > Sent: Tuesday, June 4, 2024 11:47 AM
> > > To: intel-xe at lists.freedesktop.org
> > > Cc: Brost, Matthew <matthew.brost at intel.com>
> > > Subject: [PATCH] drm/xe: Drop EXEC_QUEUE_FLAG_BANNED
> >
> > please prefer drm/xe/uapi: when the changes impact uapi.
> >
>
> Got it. Will be more careful going forward.
>
> > > >
> > > > Clean up laying violation of setting q->flags EXEC_QUEUE_FLAG_BANNED bit
> > > > in GuC backend. Move banned to GuC owned bit and report banned status to
> > > > upper layers via reset_status vfunc. This is a slight change in behavior
> > > > as reset_status returns true if wedged or killed bits set too, but in
> > > > all of these cases submission to queue is no longer allowed.
> >
> > This is an uapi change that is in use by mesa and we cannot regress.
>
> Agree.
>
> > We need to ensure that no user space is really using that before we can
> > apply anything like this.
> >
>
> I don't think affects the uAPI. The killed bit is only set after
> removing exec queue from the FD or the FD closing. In either case, the
> exec queue is not accessible to the user by the time this bit is set.
>
> The wedged bit is slight could be a change behavior but I'd argue this
> is actually fixing a bug. If we set the wedged bit, we skip setting the
> banned bit. IMO this fixing bug in the wedged series - when we wedge an
> exec queue it should not longer be available for the user to used.
>
> Now that I'm typing, I realize beyond that all IOCTLs return -ECANCELED
> once the device is wedged so this change likely isn't even visible to the
> user.
>
> In any of these cases - killed, wedged, or banned the exec queue is no
> longer available for use by a user too.
>
> > Cc: José Roberto de Souza <jose.souza at intel.com>
> > Cc: Francois Dugast <francois.dugast at intel.com>
> >
>
> Again will be more diligent about Cc stakeholders on uAPI changes. Let's
> get everyones input here.
Cc: Mateusz Jablonski <mateusz.jablonski at intel.com>
>
> Matt
>
> > > >
> > > > Signed-off-by: Matthew Brost <matthew.brost at intel.com>
> > > > ---
> > > > drivers/gpu/drm/xe/xe_exec.c | 2 +-
> > > > drivers/gpu/drm/xe/xe_exec_queue.c | 2 +-
> > > > drivers/gpu/drm/xe/xe_exec_queue_types.h | 12 +++++-------
> > > > drivers/gpu/drm/xe/xe_guc_submit.c | 10 ++++++----
> > > > 4 files changed, 13 insertions(+), 13 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> > > > index 97eeb973e897..4cf6c6ab4866 100644
> > > > --- a/drivers/gpu/drm/xe/xe_exec.c
> > > > +++ b/drivers/gpu/drm/xe/xe_exec.c
> > > > @@ -141,7 +141,7 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> > > > q->width != args->num_batch_buffer))
> > > > return -EINVAL;
> > > >
> > > > - if (XE_IOCTL_DBG(xe, q->flags & EXEC_QUEUE_FLAG_BANNED)) {
> > > > + if (XE_IOCTL_DBG(xe, q->ops->reset_status(q))) {
> > > > err = -ECANCELED;
> > > > goto err_exec_queue;
> > > > }
> > > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > index 27215075c799..cf45df0328da 100644
> > > > --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > @@ -677,7 +677,7 @@ int xe_exec_queue_get_property_ioctl(struct drm_device *dev, void *data,
> > > >
> > > > switch (args->property) {
> > > > case DRM_XE_EXEC_QUEUE_GET_PROPERTY_BAN:
> > > > - args->value = !!(q->flags & EXEC_QUEUE_FLAG_BANNED);
> > > > + args->value = q->ops->reset_status(q);
> > >
> > > LGTM.
> > >
> > > Maybe migrating over to using q->ops->reset_status could be done later, and
> > > instead we could just check the EXEC_QUEUE_STATE_BANNED flag directly for
> > > now, saving the change to reset_status for a separate patch. That way, we'd
> > > have more room to justify this change in the commit message separately from
> > > the one made to the EXEC_QUEUE_FLAG_BANNED. But it's not strictly
> > > necessary, IMO.
> > >
> > > Reviewed-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
> > > -Jonathan Cavitt
> > >
> > > > ret = 0;
> > > > break;
> > > > default:
> > > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > > index 18d8b2a60928..f0c5f82ce7e3 100644
> > > > --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > > +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > > @@ -70,18 +70,16 @@ struct xe_exec_queue {
> > > > */
> > > > struct dma_fence *last_fence;
> > > >
> > > > -/* queue no longer allowed to submit */
> > > > -#define EXEC_QUEUE_FLAG_BANNED BIT(0)
> > > > /* queue used for kernel submission only */
> > > > -#define EXEC_QUEUE_FLAG_KERNEL BIT(1)
> > > > +#define EXEC_QUEUE_FLAG_KERNEL BIT(0)
> > > > /* kernel engine only destroyed at driver unload */
> > > > -#define EXEC_QUEUE_FLAG_PERMANENT BIT(2)
> > > > +#define EXEC_QUEUE_FLAG_PERMANENT BIT(1)
> > > > /* for VM jobs. Caller needs to hold rpm ref when creating queue with this flag */
> > > > -#define EXEC_QUEUE_FLAG_VM BIT(3)
> > > > +#define EXEC_QUEUE_FLAG_VM BIT(2)
> > > > /* child of VM queue for multi-tile VM jobs */
> > > > -#define EXEC_QUEUE_FLAG_BIND_ENGINE_CHILD BIT(4)
> > > > +#define EXEC_QUEUE_FLAG_BIND_ENGINE_CHILD BIT(3)
> > > > /* kernel exec_queue only, set priority to highest level */
> > > > -#define EXEC_QUEUE_FLAG_HIGH_PRIORITY BIT(5)
> > > > +#define EXEC_QUEUE_FLAG_HIGH_PRIORITY BIT(4)
> > > >
> > > > /**
> > > > * @flags: flags for this exec queue, should statically setup aside from ban
> > > > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > > index 47aab04cf34f..4464ba337d12 100644
> > > > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > > > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > > @@ -61,6 +61,7 @@ exec_queue_to_guc(struct xe_exec_queue *q)
> > > > #define EXEC_QUEUE_STATE_RESET (1 << 6)
> > > > #define EXEC_QUEUE_STATE_KILLED (1 << 7)
> > > > #define EXEC_QUEUE_STATE_WEDGED (1 << 8)
> > > > +#define EXEC_QUEUE_STATE_BANNED (1 << 9)
> > > >
> > > > static bool exec_queue_registered(struct xe_exec_queue *q)
> > > > {
> > > > @@ -134,12 +135,12 @@ static void set_exec_queue_destroyed(struct xe_exec_queue *q)
> > > >
> > > > static bool exec_queue_banned(struct xe_exec_queue *q)
> > > > {
> > > > - return (q->flags & EXEC_QUEUE_FLAG_BANNED);
> > > > + return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_BANNED;
> > > > }
> > > >
> > > > static void set_exec_queue_banned(struct xe_exec_queue *q)
> > > > {
> > > > - q->flags |= EXEC_QUEUE_FLAG_BANNED;
> > > > + atomic_or(EXEC_QUEUE_STATE_BANNED, &q->guc->state);
> > > > }
> > > >
> > > > static bool exec_queue_suspended(struct xe_exec_queue *q)
> > > > @@ -189,8 +190,9 @@ static void set_exec_queue_wedged(struct xe_exec_queue *q)
> > > >
> > > > static bool exec_queue_killed_or_banned_or_wedged(struct xe_exec_queue *q)
> > > > {
> > > > - return exec_queue_banned(q) || (atomic_read(&q->guc->state) &
> > > > - (EXEC_QUEUE_STATE_WEDGED | EXEC_QUEUE_STATE_KILLED));
> > > > + return (atomic_read(&q->guc->state) &
> > > > + (EXEC_QUEUE_STATE_WEDGED | EXEC_QUEUE_STATE_KILLED |
> > > > + EXEC_QUEUE_STATE_BANNED));
> > > > }
> > > >
> > > > #ifdef CONFIG_PROVE_LOCKING
> > > > --
> > > > 2.34.1
> > > >
> > > >
More information about the Intel-xe
mailing list