[PATCH v4 2/3] drm/xe/xe_guc_submit: Allow lr exec queues to be banned

Cavitt, Jonathan jonathan.cavitt at intel.com
Fri Apr 12 20:58:21 UTC 2024


-----Original Message-----
From: Summers, Stuart <stuart.summers at intel.com> 
Sent: Friday, April 12, 2024 1:49 PM
To: intel-xe at lists.freedesktop.org; Cavitt, Jonathan <jonathan.cavitt at intel.com>
Cc: Brost, Matthew <matthew.brost at intel.com>; Harrison, John C <john.c.harrison at intel.com>; Gupta, saurabhg <saurabhg.gupta at intel.com>; De Marchi, Lucas <lucas.demarchi at intel.com>
Subject: Re: [PATCH v4 2/3] drm/xe/xe_guc_submit: Allow lr exec queues to be banned
> 
> On Fri, 2024-04-05 at 10:55 -0700, Jonathan Cavitt wrote:
> > LR queues currently don't get banned during a GT/GuC reset because
> > they
> > lack a job.  Though they don't have a job to detect the reset status
> > of,
> > it's still possible to tell when they should be banned by looking at
> > the
> > LRC: if the LRC head and tail don't match, then the exec queue should
> > be
> > banned and cleaned up.
> > 
> > This also requires swapping the usage of xe_sched_tdr_queue_imm with
> > xe_guc_exec_queue_trigger_cleanup, as the former is specific to non-
> > lr
> > exec queues.
> > 
> > Suggested-by: Matthew Brost <matthew.brost at intel.com>
> > Signed-off-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
> > Reviewed-by: Matthew Brost <matthew.brost at intel.com>
> > ---
> > 
> > v2:
> > - Fix Subject line
> > - Modify change slightly to remove need for "ban" boolean
> > 
> > v3: Revert change involving "ban" boolean to version 1
> > 
> > v4: Add missing semicolon and remove whitespace
> > 
> >  drivers/gpu/drm/xe/xe_guc_submit.c | 12 ++++++++++--
> >  1 file changed, 10 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c
> > b/drivers/gpu/drm/xe/xe_guc_submit.c
> > index 1a6abb10a960e..e72f2a6cad60a 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > @@ -1424,15 +1424,23 @@ static void guc_exec_queue_stop(struct xe_guc
> > *guc, struct xe_exec_queue *q)
> >          */
> >         if (!(q->flags & (EXEC_QUEUE_FLAG_KERNEL |
> > EXEC_QUEUE_FLAG_VM))) {
> >                 struct xe_sched_job *job =
> > xe_sched_first_pending_job(sched);
> > +               bool ban = false;
> >  
> >                 if (job) {
> >                         if ((xe_sched_job_started(job) &&
> >                             !xe_sched_job_completed(job)) ||
> >                             xe_sched_invalidate_job(job, 2)) {
> >                                 trace_xe_sched_job_ban(job);
> > -                               set_exec_queue_banned(q);
> > -                               xe_sched_tdr_queue_imm(&q->guc-
> > >sched);
> > +                               ban = true;
> >                         }
> > +               } else if (xe_exec_queue_is_lr(q) &&
> > +                          (xe_lrc_ring_head(q->lrc) != q->lrc-
> > >ring.tail)) {
> 
> Why do you read the head out of the lrc but the tail from our internal
> variable? Isn't there a small chance here that you could get something
> ready to send but not quite submitted to GuC and in which case these
> would not be equal but not necessarily need the ban? I guess the flip
> side is maybe that doesn't actually have any real functional impact...


IIRC it's because no equivalent xe_lrc_ring_head function exists to
read the lrc ring tail.  It's likely that I just missed the function to read
the tail, so if you know what the function is, I'll replace the internal
call with the proper functional call for the next revision.

Alternatively, should I use the internal variable for both halves of
the comparison?
-Jonathan Cavitt


> 
> Thanks,
> Stuart
> 
> > +                       ban = true;
> > +               }
> > +
> > +               if (ban) {
> > +                       set_exec_queue_banned(q);
> > +                       xe_guc_exec_queue_trigger_cleanup(q);
> >                 }
> >         }
> >  }
> 
> 


More information about the Intel-xe mailing list