[PATCH v5 7/8] drm/xe: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset

Matthew Brost matthew.brost at intel.com
Tue Jul 8 18:35:37 UTC 2025


On Tue, Jul 08, 2025 at 10:25:47AM -0300, Maíra Canal wrote:
> Xe can skip the reset if TDR has fired before the free job worker and can
> also re-arm the timeout timer in some scenarios. Instead of manipulating
> the scheduler's internals, inform the scheduler that the job did not actually
> time out and no reset was performed through the new status code
> DRM_GPU_SCHED_STAT_NO_HANG.
> 
> Note that, in the first case, there is no need to restart submission if it
> hasn't been stopped.
> 
> Signed-off-by: Maíra Canal <mcanal at igalia.com>

Reviewed-by: Matthew Brost <matthew.brost at intel.com>

> ---
>  drivers/gpu/drm/xe/xe_guc_submit.c | 12 +++---------
>  1 file changed, 3 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index 1430b58d096b03a78292e523e3ee7c5dddd7efdd..cafb47711e9b3fab3b4b4197965835197caabe9b 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -1093,12 +1093,8 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
>  	 * list so job can be freed and kick scheduler ensuring free job is not
>  	 * lost.
>  	 */
> -	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &job->fence->flags)) {
> -		xe_sched_add_pending_job(sched, job);
> -		xe_sched_submission_start(sched);
> -
> -		return DRM_GPU_SCHED_STAT_RESET;
> -	}
> +	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &job->fence->flags))
> +		return DRM_GPU_SCHED_STAT_NO_HANG;
>  
>  	/* Kill the run_job entry point */
>  	xe_sched_submission_stop(sched);
> @@ -1277,10 +1273,8 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
>  	 * but there is not currently an easy way to do in DRM scheduler. With
>  	 * some thought, do this in a follow up.
>  	 */
> -	xe_sched_add_pending_job(sched, job);
>  	xe_sched_submission_start(sched);
> -
> -	return DRM_GPU_SCHED_STAT_RESET;
> +	return DRM_GPU_SCHED_STAT_NO_HANG;
>  }
>  
>  static void __guc_exec_queue_fini_async(struct work_struct *w)
> 
> -- 
> 2.50.0
> 
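
For readers following along, here is a toy model of the control flow the
patch ends up with. It is only a sketch: every type and function name below
(mock_sched, mock_job, mock_timedout_job, the SCHED_STAT_* enum) is a
hypothetical stand-in, not the Xe or DRM scheduler API, and the real handler
does much more between the two exits. It assumes, as the commit message
implies, that on a NO_HANG return the scheduler keeps tracking the job
itself, so the driver no longer re-adds it to the pending list.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the scheduler status codes named in the patch. */
enum sched_stat {
	SCHED_STAT_RESET,
	SCHED_STAT_NO_HANG,
};

/* Toy job: the two flags model the two "no hang" scenarios in the patch. */
struct mock_job {
	bool fence_signaled;	/* models DMA_FENCE_FLAG_SIGNALED_BIT being set */
	bool can_rearm;		/* models the "re-arm the timeout timer" path */
};

/* Toy scheduler: tracks whether submission is stopped and counts resets. */
struct mock_sched {
	bool stopped;
	int resets;
};

enum sched_stat mock_timedout_job(struct mock_sched *sched,
				  struct mock_job *job)
{
	/*
	 * Case 1: the timeout handler ran before the free job worker, but
	 * the job has already signaled. Submission was never stopped, so
	 * there is nothing to restart.
	 */
	if (job->fence_signaled)
		return SCHED_STAT_NO_HANG;

	/* From here on the run_job entry point is considered killed. */
	sched->stopped = true;

	/*
	 * Case 2: the driver decides the job deserves more time. Submission
	 * was stopped above, so it has to be restarted before returning.
	 */
	if (job->can_rearm) {
		sched->stopped = false;
		return SCHED_STAT_NO_HANG;
	}

	/* Genuine hang: perform the (mock) reset and report it. */
	sched->resets++;
	sched->stopped = false;
	return SCHED_STAT_RESET;
}
```

The asymmetry between the two NO_HANG exits is the point of the note in the
commit message: only the second one has to restart submission, because only
that path stopped it.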

