[PATCH v4 2/3] drm/xe/xe_guc_submit: Allow lr exec queues to be banned
Matthew Brost
matthew.brost at intel.com
Fri Apr 12 23:45:32 UTC 2024
On Fri, Apr 12, 2024 at 02:58:21PM -0600, Cavitt, Jonathan wrote:
> -----Original Message-----
> From: Summers, Stuart <stuart.summers at intel.com>
> Sent: Friday, April 12, 2024 1:49 PM
> To: intel-xe at lists.freedesktop.org; Cavitt, Jonathan <jonathan.cavitt at intel.com>
> Cc: Brost, Matthew <matthew.brost at intel.com>; Harrison, John C <john.c.harrison at intel.com>; Gupta, saurabhg <saurabhg.gupta at intel.com>; De Marchi, Lucas <lucas.demarchi at intel.com>
> Subject: Re: [PATCH v4 2/3] drm/xe/xe_guc_submit: Allow lr exec queues to be banned
> >
> > On Fri, 2024-04-05 at 10:55 -0700, Jonathan Cavitt wrote:
> > > LR queues currently don't get banned during a GT/GuC reset because
> > > they
> > > lack a job. Though they don't have a job to detect the reset status
> > > of,
> > > it's still possible to tell when they should be banned by looking at
> > > the
> > > LRC: if the LRC head and tail don't match, then the exec queue should
> > > be
> > > banned and cleaned up.
> > >
> > > This also requires swapping the usage of xe_sched_tdr_queue_imm with
> > > xe_guc_exec_queue_trigger_cleanup, as the former is specific to non-
> > > lr
> > > exec queues.
> > >
> > > Suggested-by: Matthew Brost <matthew.brost at intel.com>
> > > Signed-off-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
> > > Reviewed-by: Matthew Brost <matthew.brost at intel.com>
> > > ---
> > >
> > > v2:
> > > - Fix Subject line
> > > - Modify change slightly to remove need for "ban" boolean
> > >
> > > v3: Revert change involving "ban" boolean to version 1
> > >
> > > v4: Add missing semicolon and remove whitespace
> > >
> > > drivers/gpu/drm/xe/xe_guc_submit.c | 12 ++++++++++--
> > > 1 file changed, 10 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c
> > > b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > index 1a6abb10a960e..e72f2a6cad60a 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > @@ -1424,15 +1424,23 @@ static void guc_exec_queue_stop(struct xe_guc
> > > *guc, struct xe_exec_queue *q)
> > > */
> > > if (!(q->flags & (EXEC_QUEUE_FLAG_KERNEL |
> > > EXEC_QUEUE_FLAG_VM))) {
> > > struct xe_sched_job *job =
> > > xe_sched_first_pending_job(sched);
> > > + bool ban = false;
> > >
> > > if (job) {
> > > if ((xe_sched_job_started(job) &&
> > > !xe_sched_job_completed(job)) ||
> > > xe_sched_invalidate_job(job, 2)) {
> > > trace_xe_sched_job_ban(job);
> > > - set_exec_queue_banned(q);
> > > - xe_sched_tdr_queue_imm(&q->guc-
> > > >sched);
> > > + ban = true;
> > > }
> > > + } else if (xe_exec_queue_is_lr(q) &&
> > > + (xe_lrc_ring_head(q->lrc) != q->lrc-
> > > >ring.tail)) {
> >
> > Why do you read the head out of the lrc but the tail from our internal
> > variable? Isn't there a small chance here that you could get something
> > ready to send but not quite submitted to GuC and in which case these
> > would not be equal but not necessarily need the ban? I guess the flip
> > side is maybe that doesn't actually have any real functional impact...
>
LR jobs don't have in-deps, i.e. they run immediately. I guess there is
a very small window between the exec IOCTL moving the tail and the
submission actually hitting the hardware.
>
> IIRC it's because no equivalent xe_lrc_ring_head function exists to
> read the lrc ring tail. It's likely that I just missed the function to read
> the tail, so if you know what the function is, I'll replace the internal
> call with the proper functional call for the next revision.
>
> Alternatively, should I use the internal variable for both halves of
> the comparison?
No, if anything, check both values in memory.
That being said, I think this is fine as is due to the nature of LR
jobs, but I guess it would not hurt to read the tail from memory.
If Stuart insists, we can change this.
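
To make the two options concrete, here is a rough standalone sketch of the
comparison being discussed. Everything in it is a mock for illustration:
struct mock_lrc, the MOCK_CTX_* slots and mock_lrc_ring_tail() are
hypothetical names, not existing xe code, and the real context image layout
is not modeled. It only shows how a cached software tail and a tail read
back from memory can disagree inside the window described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical slots in a mock context image; NOT the real LRC layout. */
enum { MOCK_CTX_RING_HEAD = 0, MOCK_CTX_RING_TAIL = 1, MOCK_CTX_REGS = 2 };

struct mock_lrc {
	uint32_t ctx[MOCK_CTX_REGS];	/* stands in for the context image in memory */
	uint32_t sw_tail;		/* stands in for the driver-cached ring tail */
};

/* Mock of reading the ring head out of the context image. */
static inline uint32_t mock_lrc_ring_head(const struct mock_lrc *lrc)
{
	return lrc->ctx[MOCK_CTX_RING_HEAD];
}

/* Hypothetical counterpart that reads the tail out of the context image. */
static inline uint32_t mock_lrc_ring_tail(const struct mock_lrc *lrc)
{
	return lrc->ctx[MOCK_CTX_RING_TAIL];
}

/* Check as posted in the patch: head from memory vs. cached tail. */
static inline bool mock_lr_busy_cached_tail(const struct mock_lrc *lrc)
{
	return mock_lrc_ring_head(lrc) != lrc->sw_tail;
}

/* Alternative raised in review: read both head and tail from memory. */
static inline bool mock_lr_busy_mem_tail(const struct mock_lrc *lrc)
{
	return mock_lrc_ring_head(lrc) != mock_lrc_ring_tail(lrc);
}
```

With the cached tail already advanced but the in-memory tail not yet
updated, the first check reports the queue as busy while the second does
not. Once the new tail is visible in memory, both checks agree.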
Matt
> -Jonathan Cavitt
>
>
> >
> > Thanks,
> > Stuart
> >
> > > + ban = true;
> > > + }
> > > +
> > > + if (ban) {
> > > + set_exec_queue_banned(q);
> > > + xe_guc_exec_queue_trigger_cleanup(q);
> > > }
> > > }
> > > }
> >
> >