[PATCH v3 7/8] drm/xe: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset
Philipp Stanner
phasta at mailbox.org
Mon Jun 23 14:28:29 UTC 2025
On Wed, 2025-06-18 at 11:47 -0300, Maíra Canal wrote:
> Xe can skip the reset if TDR has fired before the free job worker and can
> also re-arm the timeout timer in some scenarios. Instead of manipulating
> scheduler's internals, inform the scheduler that the job did not actually
> timeout and no reset was performed through the new status code
> DRM_GPU_SCHED_STAT_NO_HANG.
>
> Note that, in the first case, there is no need to restart submission if it
> hasn't been stopped.
>
> Signed-off-by: Maíra Canal <mcanal at igalia.com>
Did you have the opportunity to test that one?
If not, a Reviewed-by from one of the Intel folks would be desirable,
since the changes are non-trivial.
P.
> ---
> drivers/gpu/drm/xe/xe_guc_submit.c | 12 +++---------
> 1 file changed, 3 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index 9c7e445b9ea7ce7e3610eadca023e6d810e683e9..f6289eeffd852e40b33d0e455d9bcc21a4fb1467 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -1078,12 +1078,8 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> * list so job can be freed and kick scheduler ensuring free job is not
> * lost.
> */
> - if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &job->fence->flags)) {
> - xe_sched_add_pending_job(sched, job);
> - xe_sched_submission_start(sched);
> -
> - return DRM_GPU_SCHED_STAT_RESET;
> - }
> + if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &job->fence->flags))
> + return DRM_GPU_SCHED_STAT_NO_HANG;
>
> /* Kill the run_job entry point */
> xe_sched_submission_stop(sched);
> @@ -1261,10 +1257,8 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> * but there is not currently an easy way to do in DRM scheduler. With
> * some thought, do this in a follow up.
> */
> - xe_sched_add_pending_job(sched, job);
> xe_sched_submission_start(sched);
> -
> - return DRM_GPU_SCHED_STAT_RESET;
> + return DRM_GPU_SCHED_STAT_NO_HANG;
> }
>
> static void __guc_exec_queue_fini_async(struct work_struct *w)
>
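For anyone following the thread, here is a minimal userspace sketch of the
control flow the commit message describes. It is not the Xe code: the mock_*
types, the making_progress flag and the reset counter are hypothetical
stand-ins, and only the two status names mirror the patch
(DRM_GPU_SCHED_STAT_RESET and DRM_GPU_SCHED_STAT_NO_HANG). It only
illustrates that the early-exit path returns without touching submission,
while the re-arm path restarts the submission it stopped.

```c
#include <stdbool.h>

/* Hypothetical mirror of the two status codes named in the patch. */
enum mock_sched_stat {
	MOCK_SCHED_STAT_RESET,   /* stands in for DRM_GPU_SCHED_STAT_RESET */
	MOCK_SCHED_STAT_NO_HANG, /* stands in for DRM_GPU_SCHED_STAT_NO_HANG */
};

/* Hypothetical job state; not the layout of any kernel structure. */
struct mock_job {
	bool fence_signaled;  /* models DMA_FENCE_FLAG_SIGNALED_BIT being set */
	bool making_progress; /* assumption: some "not really hung" condition */
};

/* Hypothetical scheduler state used only to observe side effects. */
struct mock_sched {
	bool stopped; /* true while submission is stopped */
	int resets;   /* number of mock recoveries performed */
};

static enum mock_sched_stat
mock_timedout_job(struct mock_sched *sched, const struct mock_job *job)
{
	/*
	 * Case 1: the job already completed before the timeout handler ran.
	 * Submission was never stopped, so there is nothing to restart.
	 */
	if (job->fence_signaled)
		return MOCK_SCHED_STAT_NO_HANG;

	/* From here on the handler stops submission before inspecting state. */
	sched->stopped = true;

	/*
	 * Case 2: the job is not actually hung. Restart submission and tell
	 * the scheduler that no reset happened.
	 */
	if (job->making_progress) {
		sched->stopped = false;
		return MOCK_SCHED_STAT_NO_HANG;
	}

	/* Case 3: a real hang. The recovery shown here is purely illustrative. */
	sched->resets++;
	sched->stopped = false;
	return MOCK_SCHED_STAT_RESET;
}
```

In this sketch the caller decides what to do with the job based on the
returned status alone, which is the point of the patch: the driver reports
"no hang" instead of manipulating the scheduler's pending list itself.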