[PATCH] drm/xe: cancel pending job timer before freeing scheduler

Matthew Brost matthew.brost at intel.com
Mon Feb 24 14:51:23 UTC 2025


On Mon, Feb 24, 2025 at 05:52:37PM +0530, Tejas Upadhyay wrote:
> The async call to __guc_exec_queue_fini_async frees the scheduler
> at a time when a scheduler submission may have timed out and
> been restarted. To close this small race window, all pending job
> timers should be cancelled before the scheduler is freed.
> 
> It will help resolve the issue below, which is not easily reproducible:
> https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4223

I'd drop this part of the commit message. It creates confusion, since
this patch does not fix the above issue.

> 
> V2(MattB):
>  - Cancel pending jobs before scheduler finish
> 
> Signed-off-by: Tejas Upadhyay <tejas.upadhyay at intel.com>
> ---
>  drivers/gpu/drm/xe/xe_guc_submit.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index f2ce3086838c..8b7165d3820b 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -1258,6 +1258,8 @@ static void __guc_exec_queue_fini_async(struct work_struct *w)
>  		cancel_work_sync(&ge->lr_tdr);
>  	release_guc_id(guc, q);
>  	xe_sched_entity_fini(&ge->entity);
> +	/* Confirm no work left behind accessing device structures */
> +	cancel_delayed_work_sync(&ge->sched.base.work_tdr);

Nit: I'd move this next to the 'cancel_work_sync(&ge->lr_tdr)' statement.

Matt

>  	xe_sched_fini(&ge->sched);
>  
>  	kfree(ge);
> -- 
> 2.34.1
> 
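
The race in the commit message can be modeled in userspace. The sketch below is a hypothetical analogue, not xe code: a pthread stands in for the scheduler's delayed timeout work (`ge->sched.base.work_tdr`), and `model_cancel_tdr_sync()` plays the role of `cancel_delayed_work_sync()`, which cancels a pending delayed work item and waits for an already-running one to finish. All `model_*` names are invented for illustration. What it shows is the ordering: once the cancel-and-wait returns, no handler can touch the scheduler, so freeing it afterwards is safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical model of a scheduler with a delayed timeout (TDR) handler. */
struct model_sched {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool cancelled;  /* set by model_cancel_tdr_sync() */
	bool tdr_ran;    /* set if the timeout handler fired */
	long timeout_ms; /* delay before the handler fires */
	pthread_t tdr_thread;
};

/* Timeout handler: waits until the deadline, then touches the scheduler
 * unless it was cancelled first. */
static void *model_tdr_worker(void *arg)
{
	struct model_sched *s = arg;
	struct timespec deadline;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += s->timeout_ms / 1000;
	deadline.tv_nsec += (s->timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&s->lock);
	while (!s->cancelled) {
		if (pthread_cond_timedwait(&s->cond, &s->lock, &deadline) == ETIMEDOUT)
			break;
	}
	if (!s->cancelled)
		s->tdr_ran = true; /* stands in for accessing device structures */
	pthread_mutex_unlock(&s->lock);
	return NULL;
}

/* Allocate the scheduler and arm its timeout, like a job submission would. */
static struct model_sched *model_sched_init(long timeout_ms)
{
	struct model_sched *s = calloc(1, sizeof(*s));

	if (!s)
		return NULL;
	pthread_mutex_init(&s->lock, NULL);
	pthread_cond_init(&s->cond, NULL);
	s->timeout_ms = timeout_ms;
	if (pthread_create(&s->tdr_thread, NULL, model_tdr_worker, s) != 0) {
		pthread_cond_destroy(&s->cond);
		pthread_mutex_destroy(&s->lock);
		free(s);
		return NULL;
	}
	return s;
}

/* Analogue of cancel_delayed_work_sync(): stop the handler if it has not
 * fired, and wait for it to finish if it is already running. */
static void model_cancel_tdr_sync(struct model_sched *s)
{
	pthread_mutex_lock(&s->lock);
	s->cancelled = true;
	pthread_cond_signal(&s->cond);
	pthread_mutex_unlock(&s->lock);
	pthread_join(s->tdr_thread, NULL);
}

/* Teardown: cancel pending timers first, free the scheduler last.
 * Returns whether the timeout handler had fired. */
static bool model_sched_fini(struct model_sched *s)
{
	bool ran;

	model_cancel_tdr_sync(s); /* no handler can touch s after this */
	ran = s->tdr_ran;
	pthread_cond_destroy(&s->cond);
	pthread_mutex_destroy(&s->lock);
	free(s);
	return ran;
}
```

Calling `free(s)` before `model_cancel_tdr_sync(s)` would let the handler write `s->tdr_ran` into freed memory, which is the use-after-free window the patch closes. Keeping all the cancel calls together at the start of teardown, as the review suggests for `lr_tdr` and `work_tdr`, makes that invariant easy to check.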


More information about the Intel-xe mailing list