[PATCH] drm/xe: Drop EXEC_QUEUE_FLAG_BANNED

Matthew Brost matthew.brost at intel.com
Thu Jun 6 02:47:13 UTC 2024


On Wed, Jun 05, 2024 at 06:04:34PM -0400, Rodrigo Vivi wrote:
> On Wed, Jun 05, 2024 at 09:01:06PM +0000, Cavitt, Jonathan wrote:
> > -----Original Message-----
> > From: Intel-xe <intel-xe-bounces at lists.freedesktop.org> On Behalf Of Matthew Brost
> > Sent: Tuesday, June 4, 2024 11:47 AM
> > To: intel-xe at lists.freedesktop.org
> > Cc: Brost, Matthew <matthew.brost at intel.com>
> > Subject: [PATCH] drm/xe: Drop EXEC_QUEUE_FLAG_BANNED
> 
> please prefer drm/xe/uapi: when the changes impact uapi.
> 

Got it. Will be more careful going forward.

> > > 
> > > Clean up laying violation of setting q->flags EXEC_QUEUE_FLAG_BANNED bit
> > > in GuC backend. Move banned to GuC owned bit and report banned status to
> > > upper layers via reset_status vfunc. This is a slight change in behavior
> > > as reset_status returns true if wedged or killed bits set too, but in
> > > all of these cases submission to queue is no longer allowed.
> 
> This is an uapi change that is in use by mesa and we cannot regress.

Agree.

> We need to ensure that no user space is really using that before we can
> apply anything like this.
>

I don't think affects the uAPI. The killed bit is only set after
removing exec queue from the FD or the FD closing. In either case, the
exec queue is not accessible to the user by the time this bit is set.

The wedged bit is slight could be a change behavior but I'd argue this
is actually fixing a bug. If we set the wedged bit, we skip setting the
banned bit. IMO this fixing bug in the wedged series - when we wedge an
exec queue it should not longer be available for the user to used. 

Now that I'm typing, I realize beyond that all IOCTLs return -ECANCELED
once the device is wedged so this change likely isn't even visible to the
user.

In any of these cases - killed, wedged, or banned the exec queue is no
longer available for use by a user too.

> Cc: José Roberto de Souza <jose.souza at intel.com>
> Cc: Francois Dugast <francois.dugast at intel.com>
> 

Again will be more diligent about Cc stakeholders on uAPI changes. Let's
get everyones input here.
 
Matt

> > > 
> > > Signed-off-by: Matthew Brost <matthew.brost at intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_exec.c             |  2 +-
> > >  drivers/gpu/drm/xe/xe_exec_queue.c       |  2 +-
> > >  drivers/gpu/drm/xe/xe_exec_queue_types.h | 12 +++++-------
> > >  drivers/gpu/drm/xe/xe_guc_submit.c       | 10 ++++++----
> > >  4 files changed, 13 insertions(+), 13 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> > > index 97eeb973e897..4cf6c6ab4866 100644
> > > --- a/drivers/gpu/drm/xe/xe_exec.c
> > > +++ b/drivers/gpu/drm/xe/xe_exec.c
> > > @@ -141,7 +141,7 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> > >  			 q->width != args->num_batch_buffer))
> > >  		return -EINVAL;
> > >  
> > > -	if (XE_IOCTL_DBG(xe, q->flags & EXEC_QUEUE_FLAG_BANNED)) {
> > > +	if (XE_IOCTL_DBG(xe, q->ops->reset_status(q))) {
> > >  		err = -ECANCELED;
> > >  		goto err_exec_queue;
> > >  	}
> > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > index 27215075c799..cf45df0328da 100644
> > > --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> > > +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > @@ -677,7 +677,7 @@ int xe_exec_queue_get_property_ioctl(struct drm_device *dev, void *data,
> > >  
> > >  	switch (args->property) {
> > >  	case DRM_XE_EXEC_QUEUE_GET_PROPERTY_BAN:
> > > -		args->value = !!(q->flags & EXEC_QUEUE_FLAG_BANNED);
> > > +		args->value = q->ops->reset_status(q);
> > 
> > LGTM.
> > 
> > Maybe migrating over to using q->ops->reset_status could be done later, and
> > instead we could just check the EXEC_QUEUE_STATE_BANNED flag directly for
> > now, saving the change to reset_status for a separate patch.  That way, we'd
> > have more room to justify this change in the commit message separately from
> > the one made to the EXEC_QUEUE_FLAG_BANNED.  But it's not strictly
> > necessary, IMO.
> > 
> > Reviewed-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
> > -Jonathan Cavitt
> > 
> > >  		ret = 0;
> > >  		break;
> > >  	default:
> > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > index 18d8b2a60928..f0c5f82ce7e3 100644
> > > --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > @@ -70,18 +70,16 @@ struct xe_exec_queue {
> > >  	 */
> > >  	struct dma_fence *last_fence;
> > >  
> > > -/* queue no longer allowed to submit */
> > > -#define EXEC_QUEUE_FLAG_BANNED			BIT(0)
> > >  /* queue used for kernel submission only */
> > > -#define EXEC_QUEUE_FLAG_KERNEL			BIT(1)
> > > +#define EXEC_QUEUE_FLAG_KERNEL			BIT(0)
> > >  /* kernel engine only destroyed at driver unload */
> > > -#define EXEC_QUEUE_FLAG_PERMANENT		BIT(2)
> > > +#define EXEC_QUEUE_FLAG_PERMANENT		BIT(1)
> > >  /* for VM jobs. Caller needs to hold rpm ref when creating queue with this flag */
> > > -#define EXEC_QUEUE_FLAG_VM			BIT(3)
> > > +#define EXEC_QUEUE_FLAG_VM			BIT(2)
> > >  /* child of VM queue for multi-tile VM jobs */
> > > -#define EXEC_QUEUE_FLAG_BIND_ENGINE_CHILD	BIT(4)
> > > +#define EXEC_QUEUE_FLAG_BIND_ENGINE_CHILD	BIT(3)
> > >  /* kernel exec_queue only, set priority to highest level */
> > > -#define EXEC_QUEUE_FLAG_HIGH_PRIORITY		BIT(5)
> > > +#define EXEC_QUEUE_FLAG_HIGH_PRIORITY		BIT(4)
> > >  
> > >  	/**
> > >  	 * @flags: flags for this exec queue, should statically setup aside from ban
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > index 47aab04cf34f..4464ba337d12 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > @@ -61,6 +61,7 @@ exec_queue_to_guc(struct xe_exec_queue *q)
> > >  #define EXEC_QUEUE_STATE_RESET			(1 << 6)
> > >  #define EXEC_QUEUE_STATE_KILLED			(1 << 7)
> > >  #define EXEC_QUEUE_STATE_WEDGED			(1 << 8)
> > > +#define EXEC_QUEUE_STATE_BANNED			(1 << 9)
> > >  
> > >  static bool exec_queue_registered(struct xe_exec_queue *q)
> > >  {
> > > @@ -134,12 +135,12 @@ static void set_exec_queue_destroyed(struct xe_exec_queue *q)
> > >  
> > >  static bool exec_queue_banned(struct xe_exec_queue *q)
> > >  {
> > > -	return (q->flags & EXEC_QUEUE_FLAG_BANNED);
> > > +	return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_BANNED;
> > >  }
> > >  
> > >  static void set_exec_queue_banned(struct xe_exec_queue *q)
> > >  {
> > > -	q->flags |= EXEC_QUEUE_FLAG_BANNED;
> > > +	atomic_or(EXEC_QUEUE_STATE_BANNED, &q->guc->state);
> > >  }
> > >  
> > >  static bool exec_queue_suspended(struct xe_exec_queue *q)
> > > @@ -189,8 +190,9 @@ static void set_exec_queue_wedged(struct xe_exec_queue *q)
> > >  
> > >  static bool exec_queue_killed_or_banned_or_wedged(struct xe_exec_queue *q)
> > >  {
> > > -	return exec_queue_banned(q) || (atomic_read(&q->guc->state) &
> > > -		(EXEC_QUEUE_STATE_WEDGED | EXEC_QUEUE_STATE_KILLED));
> > > +	return (atomic_read(&q->guc->state) &
> > > +		(EXEC_QUEUE_STATE_WEDGED | EXEC_QUEUE_STATE_KILLED |
> > > +		 EXEC_QUEUE_STATE_BANNED));
> > >  }
> > >  
> > >  #ifdef CONFIG_PROVE_LOCKING
> > > -- 
> > > 2.34.1
> > > 
> > > 


More information about the Intel-xe mailing list