[PATCH v4 2/3] drm/xe/xe_guc_submit: Allow lr exec queues to be banned

Matthew Brost matthew.brost at intel.com
Fri Apr 12 23:45:32 UTC 2024


On Fri, Apr 12, 2024 at 02:58:21PM -0600, Cavitt, Jonathan wrote:
> -----Original Message-----
> From: Summers, Stuart <stuart.summers at intel.com> 
> Sent: Friday, April 12, 2024 1:49 PM
> To: intel-xe at lists.freedesktop.org; Cavitt, Jonathan <jonathan.cavitt at intel.com>
> Cc: Brost, Matthew <matthew.brost at intel.com>; Harrison, John C <john.c.harrison at intel.com>; Gupta, saurabhg <saurabhg.gupta at intel.com>; De Marchi, Lucas <lucas.demarchi at intel.com>
> Subject: Re: [PATCH v4 2/3] drm/xe/xe_guc_submit: Allow lr exec queues to be banned
> > 
> > On Fri, 2024-04-05 at 10:55 -0700, Jonathan Cavitt wrote:
> > > LR queues currently don't get banned during a GT/GuC reset because
> > > they lack a job. Although there is no job whose reset status can be
> > > checked, it's still possible to tell when an LR queue should be banned
> > > by looking at the LRC: if the LRC ring head and tail don't match, the
> > > exec queue should be banned and cleaned up.
> > > 
> > > This also requires replacing xe_sched_tdr_queue_imm with
> > > xe_guc_exec_queue_trigger_cleanup, as the former is specific to
> > > non-LR exec queues.
> > > 
> > > Suggested-by: Matthew Brost <matthew.brost at intel.com>
> > > Signed-off-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
> > > Reviewed-by: Matthew Brost <matthew.brost at intel.com>
> > > ---
> > > 
> > > v2:
> > > - Fix Subject line
> > > - Modify change slightly to remove need for "ban" boolean
> > > 
> > > v3: Revert change involving "ban" boolean to version 1
> > > 
> > > v4: Add missing semicolon and remove whitespace
> > > 
> > >  drivers/gpu/drm/xe/xe_guc_submit.c | 12 ++++++++++--
> > >  1 file changed, 10 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > index 1a6abb10a960e..e72f2a6cad60a 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > @@ -1424,15 +1424,23 @@ static void guc_exec_queue_stop(struct xe_guc *guc, struct xe_exec_queue *q)
> > >          */
> > >         if (!(q->flags & (EXEC_QUEUE_FLAG_KERNEL | EXEC_QUEUE_FLAG_VM))) {
> > >                 struct xe_sched_job *job = xe_sched_first_pending_job(sched);
> > > +               bool ban = false;
> > >  
> > >                 if (job) {
> > >                         if ((xe_sched_job_started(job) &&
> > >                             !xe_sched_job_completed(job)) ||
> > >                             xe_sched_invalidate_job(job, 2)) {
> > >                                 trace_xe_sched_job_ban(job);
> > > -                               set_exec_queue_banned(q);
> > > -                               xe_sched_tdr_queue_imm(&q->guc->sched);
> > > +                               ban = true;
> > >                         }
> > > +               } else if (xe_exec_queue_is_lr(q) &&
> > > +                          (xe_lrc_ring_head(q->lrc) != q->lrc->ring.tail)) {
> > 
> > Why do you read the head out of the LRC but the tail from our internal
> > variable? Isn't there a small chance here that something could be ready
> > to send but not yet submitted to the GuC, in which case these would not
> > be equal even though no ban is needed? I guess the flip side is that
> > maybe this doesn't have any real functional impact...
> 

LR jobs don't have input dependencies, i.e. they run immediately. I guess
there is a very small window between the exec IOCTL moving the tail and
the tail actually hitting the hardware.

> 
> IIRC it's because no xe_lrc_ring_head equivalent exists for reading the
> LRC ring tail. It's likely that I just missed the function that reads
> the tail, so if you know what it is, I'll replace the internal variable
> with the proper function call in the next revision.
> 
> Alternatively, should I use the internal variable for both halves of
> the comparison?

No, if anything check both values in memory.

That being said, I think this is fine as is due to the nature of LR
jobs, but I guess it would not hurt to read the tail from memory.
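The window discussed above can be illustrated with a small userspace model. This is a sketch under stated assumptions, not driver code: the context-image slots, struct model_lrc, and all model_* and pending_* helpers are hypothetical; sw_tail stands in for q->lrc->ring.tail; and, as described above, the software tail is assumed to move before the new tail is visible in the context image. Real ring offsets wrap and are masked, which the model ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical slots in a simulated context image (not the real xe layout). */
enum { CTX_SLOT_RING_HEAD, CTX_SLOT_RING_TAIL, CTX_SLOTS };

struct model_lrc {
	uint32_t ctx_image[CTX_SLOTS]; /* values the hardware/GuC works from */
	uint32_t sw_tail;              /* driver-side copy, like q->lrc->ring.tail */
};

/* Read helpers for the simulated context image (hypothetical). */
static uint32_t model_ring_head(const struct model_lrc *lrc)
{
	return lrc->ctx_image[CTX_SLOT_RING_HEAD];
}

static uint32_t model_ring_tail(const struct model_lrc *lrc)
{
	return lrc->ctx_image[CTX_SLOT_RING_TAIL];
}

/* v4 comparison: in-memory head against the software tail. */
bool pending_vs_sw_tail(const struct model_lrc *lrc)
{
	return model_ring_head(lrc) != lrc->sw_tail;
}

/* Review suggestion: head and tail both read from memory. */
bool pending_vs_mem_tail(const struct model_lrc *lrc)
{
	return model_ring_head(lrc) != model_ring_tail(lrc);
}

/* Walks an idle ring through the assumed submission window. */
void model_window_scenario(void)
{
	/* Idle: hardware has consumed everything up to offset 64. */
	struct model_lrc lrc = { .ctx_image = { 64, 64 }, .sw_tail = 64 };
	assert(!pending_vs_sw_tail(&lrc) && !pending_vs_mem_tail(&lrc));

	/* Software tail moved, but the new tail is not in the context image yet. */
	lrc.sw_tail = 128;
	assert(pending_vs_sw_tail(&lrc));   /* would ban */
	assert(!pending_vs_mem_tail(&lrc)); /* would not ban */

	/* Tail published, hardware not caught up: both report pending work. */
	lrc.ctx_image[CTX_SLOT_RING_TAIL] = 128;
	assert(pending_vs_sw_tail(&lrc) && pending_vs_mem_tail(&lrc));
}
```

In this model, comparing against the software tail reports pending work during the window, while comparing two in-memory values does not; once the tail is published, both comparisons agree.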

If Stuart insists, we can change this.

Matt

> -Jonathan Cavitt
> 
> 
> > 
> > Thanks,
> > Stuart
> > 
> > > +                       ban = true;
> > > +               }
> > > +
> > > +               if (ban) {
> > > +                       set_exec_queue_banned(q);
> > > +                       xe_guc_exec_queue_trigger_cleanup(q);
> > >                 }
> > >         }
> > >  }
> > 
> > 
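For reference, the ban decision in guc_exec_queue_stop after this patch can be modeled outside the kernel. This is a simplified sketch, not the driver code: every type and field is a hypothetical stand-in. For example, `invalidated` replaces the call to xe_sched_invalidate_job(job, 2), whose side effects are ignored, and `banned` / `cleanup_triggered` stand in for set_exec_queue_banned() and xe_guc_exec_queue_trigger_cleanup().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the first pending scheduler job. */
struct model_job {
	bool started;
	bool completed;
	bool invalidated; /* replaces xe_sched_invalidate_job(job, 2) */
};

/* Hypothetical stand-in for the exec queue state the patch consults. */
struct model_queue {
	bool kernel_or_vm;  /* EXEC_QUEUE_FLAG_KERNEL or EXEC_QUEUE_FLAG_VM set */
	bool long_running;  /* xe_exec_queue_is_lr(q) */
	const struct model_job *first_pending; /* NULL when no job is pending */
	uint32_t ring_head; /* head as read from the LRC */
	uint32_t ring_tail; /* tail used in the comparison */
	bool banned;            /* set_exec_queue_banned() */
	bool cleanup_triggered; /* xe_guc_exec_queue_trigger_cleanup() */
};

void model_queue_stop(struct model_queue *q)
{
	bool ban = false;

	assert(q != NULL);

	/* Kernel and VM queues are never banned here. */
	if (q->kernel_or_vm)
		return;

	if (q->first_pending) {
		const struct model_job *job = q->first_pending;

		/* Job path: started but unfinished, or invalidated past the threshold. */
		if ((job->started && !job->completed) || job->invalidated)
			ban = true;
	} else if (q->long_running && q->ring_head != q->ring_tail) {
		/* LR path: no job, but the ring still holds unconsumed commands. */
		ban = true;
	}

	/* Single point that marks the queue banned and kicks cleanup. */
	if (ban) {
		q->banned = true;
		q->cleanup_triggered = true;
	}
}
```

The design point the model shows is that both the job path and the new LR path only set `ban`, so one place marks the queue banned and triggers cleanup, which is why the LR-safe cleanup call replaces the TDR-specific one.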

