[PATCH V3] drm/xe: cancel pending job timer before freeing scheduler

Matthew Brost matthew.brost at intel.com
Tue Feb 25 05:33:17 UTC 2025


On Tue, Feb 25, 2025 at 10:27:54AM +0530, Tejas Upadhyay wrote:
> Async call to __guc_exec_queue_fini_async frees scheduler
> at the same time when some scheduler submission would have
> timed out and restarted. To handle such small window race
> case, all pending jobs timer should be cancelled before
> freeing scheduler.
> 

Suggested rewording of the commit message:

'The async call to __guc_exec_queue_fini_async frees the scheduler while
a submission may time out and restart. To prevent this race condition,
the pending job timer should be canceled before freeing the scheduler.'

> V3(MattB):
>  - Adjust position of cancel pending job
>  - Remove gitlab issue# from commit message
> V2(MattB):
>  - Cancel pending jobs before scheduler finish
> 

Does this need a Fixes: tag?

> Signed-off-by: Tejas Upadhyay <tejas.upadhyay at intel.com>

With the above addressed:
Reviewed-by: Matthew Brost <matthew.brost at intel.com>

> ---
>  drivers/gpu/drm/xe/xe_guc_submit.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index 913c74d6e2ae..b6a2dd742ebd 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -1248,6 +1248,8 @@ static void __guc_exec_queue_fini_async(struct work_struct *w)
>  
>  	if (xe_exec_queue_is_lr(q))
>  		cancel_work_sync(&ge->lr_tdr);
> +	/* Confirm no work left behind accessing device structures */
> +	cancel_delayed_work_sync(&ge->sched.base.work_tdr);
>  	release_guc_id(guc, q);
>  	xe_sched_entity_fini(&ge->entity);
>  	xe_sched_fini(&ge->sched);
> -- 
> 2.34.1
> 
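For readers who want to see the ordering in isolation, here is a minimal
userspace sketch of the pattern the patch enforces: stop the timeout
handler and wait for it to finish before freeing the object it touches.
This is only an analogy under stated assumptions. A pthread stands in for
the kernel's delayed timeout work, and struct toy_sched and every toy_*
helper are hypothetical names made up for illustration; none of them are
part of xe or the DRM scheduler.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for a scheduler that owns a timeout timer. */
struct toy_sched {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	pthread_t timer;	/* plays the role of the delayed timeout work */
	int timeout_sec;
	bool cancel;
	int timeouts_handled;
};

/* Timer body: sleeps until the deadline, then touches the scheduler. */
static void *toy_timer_fn(void *arg)
{
	struct toy_sched *s = arg;
	struct timespec deadline;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += s->timeout_sec;

	pthread_mutex_lock(&s->lock);
	while (!s->cancel) {
		/* A non-zero return (ETIMEDOUT) means the deadline passed. */
		if (pthread_cond_timedwait(&s->cond, &s->lock, &deadline) != 0)
			break;
	}
	if (!s->cancel)
		s->timeouts_handled++;	/* would be a use-after-free if s were gone */
	pthread_mutex_unlock(&s->lock);
	return NULL;
}

struct toy_sched *toy_sched_create(int timeout_sec)
{
	struct toy_sched *s = calloc(1, sizeof(*s));

	if (!s)
		return NULL;
	s->timeout_sec = timeout_sec;
	pthread_mutex_init(&s->lock, NULL);
	pthread_cond_init(&s->cond, NULL);
	if (pthread_create(&s->timer, NULL, toy_timer_fn, s) != 0) {
		pthread_cond_destroy(&s->cond);
		pthread_mutex_destroy(&s->lock);
		free(s);
		return NULL;
	}
	return s;
}

/* Loose analogue of a synchronous cancel: request stop, then wait. */
static void toy_cancel_timer_sync(struct toy_sched *s)
{
	int rc;

	pthread_mutex_lock(&s->lock);
	s->cancel = true;
	pthread_cond_signal(&s->cond);
	pthread_mutex_unlock(&s->lock);

	rc = pthread_join(s->timer, NULL);
	assert(rc == 0);
	(void)rc;
}

/* Teardown: cancel and wait for the timer first, only then free. */
int toy_sched_destroy(struct toy_sched *s)
{
	int handled;

	toy_cancel_timer_sync(s);
	handled = s->timeouts_handled;
	pthread_cond_destroy(&s->cond);
	pthread_mutex_destroy(&s->lock);
	free(s);
	return handled;
}
```

Calling toy_sched_create() followed by toy_sched_destroy() is safe because
the join guarantees the timer thread can no longer dereference the
scheduler by the time free() runs. Freeing first and cancelling afterwards
would reopen the window described in the commit message.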

