[PATCH v4 2/3] drm/xe/xe_guc_submit: Allow lr exec queues to be banned

Summers, Stuart stuart.summers at intel.com
Fri Apr 12 20:48:45 UTC 2024


On Fri, 2024-04-05 at 10:55 -0700, Jonathan Cavitt wrote:
> LR queues currently don't get banned during a GT/GuC reset because
> they
> lack a job.  Though they don't have a job to detect the reset status
> of,
> it's still possible to tell when they should be banned by looking at
> the
> LRC: if the LRC head and tail don't match, then the exec queue should
> be
> banned and cleaned up.
> 
> This also requires swapping the usage of xe_sched_tdr_queue_imm with
> xe_guc_exec_queue_trigger_cleanup, as the former is specific to non-
> lr
> exec queues.
> 
> Suggested-by: Matthew Brost <matthew.brost at intel.com>
> Signed-off-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
> Reviewed-by: Matthew Brost <matthew.brost at intel.com>
> ---
> 
> v2:
> - Fix Subject line
> - Modify change slightly to remove need for "ban" boolean
> 
> v3: Revert change involving "ban" boolean to version 1
> 
> v4: Add missing semicolon and remove whitespace
> 
>  drivers/gpu/drm/xe/xe_guc_submit.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c
> b/drivers/gpu/drm/xe/xe_guc_submit.c
> index 1a6abb10a960e..e72f2a6cad60a 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -1424,15 +1424,23 @@ static void guc_exec_queue_stop(struct xe_guc
> *guc, struct xe_exec_queue *q)
>          */
>         if (!(q->flags & (EXEC_QUEUE_FLAG_KERNEL |
> EXEC_QUEUE_FLAG_VM))) {
>                 struct xe_sched_job *job =
> xe_sched_first_pending_job(sched);
> +               bool ban = false;
>  
>                 if (job) {
>                         if ((xe_sched_job_started(job) &&
>                             !xe_sched_job_completed(job)) ||
>                             xe_sched_invalidate_job(job, 2)) {
>                                 trace_xe_sched_job_ban(job);
> -                               set_exec_queue_banned(q);
> -                               xe_sched_tdr_queue_imm(&q->guc-
> >sched);
> +                               ban = true;
>                         }
> +               } else if (xe_exec_queue_is_lr(q) &&
> +                          (xe_lrc_ring_head(q->lrc) != q->lrc-
> >ring.tail)) {

Why do you read the head out of the lrc but the tail from our internal
variable? Isn't there a small chance here that you could get something
ready to send but not quite submitted to GuC and in which case these
would not be equal but not necessarily need the ban? I guess the flip
side is maybe that doesn't actually have any real functional impact...

Thanks,
Stuart

> +                       ban = true;
> +               }
> +
> +               if (ban) {
> +                       set_exec_queue_banned(q);
> +                       xe_guc_exec_queue_trigger_cleanup(q);
>                 }
>         }
>  }



More information about the Intel-xe mailing list