[PATCH] drm/xe: Release guc ids before cancelling work

Upadhyay, Tejas tejas.upadhyay at intel.com
Fri Mar 7 06:03:53 UTC 2025



> -----Original Message-----
> From: Brost, Matthew <matthew.brost at intel.com>
> Sent: Friday, March 7, 2025 11:32 AM
> To: Upadhyay, Tejas <tejas.upadhyay at intel.com>
> Cc: intel-xe at lists.freedesktop.org
> Subject: Re: [PATCH] drm/xe: Release guc ids before cancelling work
> 
> On Thu, Mar 06, 2025 at 06:42:11PM +0530, Tejas Upadhyay wrote:
> > A GT reset can occur in parallel while the async fini call is
> > cancelling work, and the reset can requeue these workers.
> > To avoid that, release the guc ids first and then cancel the work so
> > the workers don't get requeued.
> >
> 
> Suggested-by: Matthew Brost <matthew.brost at intel.com>
> 
> > Fixes: 8ae8a2e8dd21 ("drm/xe: Long running job update")
> > Fixes: 18fbd567e75f ("drm/xe: cancel pending job timer before freeing scheduler")
> > Signed-off-by: Tejas Upadhyay <tejas.upadhyay at intel.com>
> 
> Were you able to verify this fixes [1]?

It does not reproduce with or without the fix. I ran thousands of iterations of the whole test suite on BMG as well as on LNL, manually through scripting. Maybe CI can test it better.

Tejas
> 
> Regardless pretty sure this change is correct and required:
> Reviewed-by: Matthew Brost <matthew.brost at intel.com>
> 
> [1] https://intel-gfx-ci.01.org/tree/intel-xe/xe-2561-3de9463a46e57221df6a0b5f2f5b7f33207d76f7/re-lnl-4/igt@xe_exec_reset@cm-gt-reset.html
> 
> > ---
> >  drivers/gpu/drm/xe/xe_guc_submit.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > index b95934055f72..31bc2022bfc2 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > @@ -1254,11 +1254,11 @@ static void __guc_exec_queue_fini_async(struct work_struct *w)
> >  	xe_pm_runtime_get(guc_to_xe(guc));
> >  	trace_xe_exec_queue_destroy(q);
> >
> > +	release_guc_id(guc, q);
> >  	if (xe_exec_queue_is_lr(q))
> >  		cancel_work_sync(&ge->lr_tdr);
> >  	/* Confirm no work left behind accessing device structures */
> >  	cancel_delayed_work_sync(&ge->sched.base.work_tdr);
> > -	release_guc_id(guc, q);
> >  	xe_sched_entity_fini(&ge->entity);
> >  	xe_sched_fini(&ge->sched);
> >
> > --
> > 2.34.1
> >
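
For readers following the ordering argument in the commit message, here is a
minimal single-threaded sketch of why the order matters. It is not driver
code: struct toy_queue and every toy_* function are hypothetical stand-ins,
and the sketch assumes a reset can only requeue work for a queue that still
holds its id, which is how the commit message reads. The concurrent reset is
modelled as a call interleaved between the two teardown steps.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an exec queue; not the xe driver's type. */
struct toy_queue {
	bool registered;   /* still reachable by id (assumed to gate requeue) */
	bool work_pending; /* stands in for a queued TDR worker */
};

/* Models the reset path: it only requeues work for queues it can still find. */
static void toy_reset_requeue(struct toy_queue *q)
{
	if (q->registered)
		q->work_pending = true;
}

/* Stands in for releasing the id: the queue is no longer reachable. */
static void toy_release_id(struct toy_queue *q)
{
	q->registered = false;
}

/* Stands in for cancelling the worker synchronously. */
static void toy_cancel_work(struct toy_queue *q)
{
	q->work_pending = false;
}

/* Old order: cancel, then release. A reset in between requeues the work. */
static bool toy_fini_cancel_first(struct toy_queue *q)
{
	toy_cancel_work(q);
	toy_reset_requeue(q); /* reset interleaved here */
	toy_release_id(q);
	return q->work_pending; /* true means work was left behind */
}

/* New order: release, then cancel. The reset can no longer reach the queue. */
static bool toy_fini_release_first(struct toy_queue *q)
{
	toy_release_id(q);
	toy_reset_requeue(q); /* reset interleaved here */
	toy_cancel_work(q);
	return q->work_pending;
}
```

Under these assumptions, the cancel-first order leaves work pending when the
reset lands in the gap, while the release-first order does not. In the
release-first order a reset that lands before the release is also harmless,
because the later cancel removes whatever it queued.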

