[PATCH] drm/xe: Release guc ids before cancelling work
Matthew Brost
matthew.brost at intel.com
Fri Mar 7 06:02:29 UTC 2025
On Thu, Mar 06, 2025 at 06:42:11PM +0530, Tejas Upadhyay wrote:
> A GT reset can occur in parallel while the async fini call is
> cancelling work, and the reset can requeue these workers. To avoid
> that, first release the guc ids and then cancel the work so the
> workers don't get requeued.
>
> Suggested-by: Matthew Brost <matthew.brost at intel.com>
> Fixes: 8ae8a2e8dd21 ("drm/xe: Long running job update")
> Fixes: 18fbd567e75f ("drm/xe: cancel pending job timer before freeing scheduler")
> Signed-off-by: Tejas Upadhyay <tejas.upadhyay at intel.com>
Were you able to verify this fixes [1]?
Regardless, I'm pretty sure this change is correct and required:
Reviewed-by: Matthew Brost <matthew.brost at intel.com>
[1] https://intel-gfx-ci.01.org/tree/intel-xe/xe-2561-3de9463a46e57221df6a0b5f2f5b7f33207d76f7/re-lnl-4/igt@xe_exec_reset@cm-gt-reset.html
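
To illustrate why the ordering matters, here is a minimal userspace sketch. It is not driver code: the toy_* names, the fixed-size lookup table, and the single-threaded "reset lands here" call are all assumptions made for illustration. It assumes, as the commit message implies, that a reset can only requeue work for queues it can still find by id.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_IDS 4

/* Hypothetical stand-in for an exec queue; not the real struct xe_exec_queue. */
struct toy_queue {
	int id;
	bool work_pending;	/* models a queued worker */
};

/* Hypothetical stand-in for the id -> queue lookup that a reset walks. */
static struct toy_queue *lookup[MAX_IDS];

static void toy_register(struct toy_queue *q) { lookup[q->id] = q; }
static void toy_release_id(struct toy_queue *q) { lookup[q->id] = NULL; }
static void toy_cancel_work(struct toy_queue *q) { q->work_pending = false; }

/* Models a reset: requeue work for every queue still visible in the lookup. */
static void toy_reset(void)
{
	for (size_t i = 0; i < MAX_IDS; i++)
		if (lookup[i])
			lookup[i]->work_pending = true;
}

/* Old order: cancel, then release. A reset in between requeues the work. */
static bool teardown_cancel_first(struct toy_queue *q)
{
	toy_register(q);
	q->work_pending = true;
	toy_cancel_work(q);
	toy_reset();		/* reset lands between cancel and release */
	toy_release_id(q);
	return q->work_pending;	/* true: work was left behind */
}

/* New order: release, then cancel. The reset can no longer find the queue. */
static bool teardown_release_first(struct toy_queue *q)
{
	toy_register(q);
	q->work_pending = true;
	toy_release_id(q);
	toy_reset();		/* reset lands between release and cancel */
	toy_cancel_work(q);
	return q->work_pending;	/* false: nothing left to run */
}
```

In this model, a reset that lands before the release can still requeue the work, but the cancel that follows the release cleans it up, so the release-first order leaves nothing pending wherever the reset lands.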
> ---
> drivers/gpu/drm/xe/xe_guc_submit.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index b95934055f72..31bc2022bfc2 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -1254,11 +1254,11 @@ static void __guc_exec_queue_fini_async(struct work_struct *w)
>  	xe_pm_runtime_get(guc_to_xe(guc));
>  	trace_xe_exec_queue_destroy(q);
>
> +	release_guc_id(guc, q);
>  	if (xe_exec_queue_is_lr(q))
>  		cancel_work_sync(&ge->lr_tdr);
>  	/* Confirm no work left behind accessing device structures */
>  	cancel_delayed_work_sync(&ge->sched.base.work_tdr);
> -	release_guc_id(guc, q);
>  	xe_sched_entity_fini(&ge->entity);
>  	xe_sched_fini(&ge->sched);
>
> --
> 2.34.1
>