[PATCH V2] drm/xe: skip error capture when exec queue is killed

Souza, Jose jose.souza at intel.com
Thu May 2 13:56:40 UTC 2024


On Tue, 2024-04-30 at 18:42 +0530, Tejas Upadhyay wrote:
> When user closes exec queue soon after job submission,
> we are generating error coredump. Instead check if
> exec queue is killed during job timeout then skip
> error coredump capture.

Where is this happening?

The Iris/OpenGL driver was not waiting for the exec queue to go idle before destroying it, and that was causing '*ERROR* GT0: TLB invalidation' errors in the Xe KMD.
That was fixed here: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27500/diffs?commit_id=665d30b5448f606d7a79afe0596c3a2264ab3e15
The ANV/Vulkan driver should already do that.

The patch looks good, but the UMD or IGT needs to be fixed too.
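
To illustrate the point above, here is a rough toy model in plain C. It is only a sketch: none of these names (toy_queue, toy_destroy, toy_wait_idle, toy_run) exist in Xe, Mesa or IGT, and the "timeout" is simulated rather than driven by a real scheduler. It assumes that jobs still pending when the queue is destroyed end up in the timeout path, which is what the patch description suggests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an exec queue; not the real Xe structures. */
struct toy_queue {
	int pending_jobs;	/* jobs submitted but not yet completed */
	bool killed;		/* set when userspace destroys the queue */
	int timeouts;		/* jobs that went through the timeout path */
	int coredumps;		/* error captures generated by the toy "KMD" */
};

/* Timeout handler with the patch applied: no capture on a killed queue. */
static void toy_timedout_job(struct toy_queue *q)
{
	assert(q->pending_jobs > 0);
	q->timeouts++;
	if (!q->killed)
		q->coredumps++;
	q->pending_jobs--;
}

/* UMD-side wait for idle; in this model every pending job simply completes. */
static void toy_wait_idle(struct toy_queue *q)
{
	q->pending_jobs = 0;
}

/* Destroy marks the queue killed; leftover jobs fall into the timeout path. */
static void toy_destroy(struct toy_queue *q)
{
	q->killed = true;
	while (q->pending_jobs > 0)
		toy_timedout_job(q);
}

/* Submit three jobs, optionally wait for idle, then destroy the queue. */
static struct toy_queue toy_run(bool wait_first)
{
	struct toy_queue q = { .pending_jobs = 3 };

	if (wait_first)
		toy_wait_idle(&q);
	toy_destroy(&q);
	return q;
}
```

In this model toy_run(false) produces no coredump because of the killed check, but the jobs still go through the timeout path. toy_run(true) never reaches the timeout path at all, which is why the userspace side should be fixed as well.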


> 
> V2:
>   - Just skip error capture - MattB
> 
> Signed-off-by: Tejas Upadhyay <tejas.upadhyay at intel.com>
> ---
>  drivers/gpu/drm/xe/xe_guc_submit.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index d274a139010b..2c0aa3443cd9 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -980,8 +980,10 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
>  	xe_gt_WARN(q->gt, q->flags & EXEC_QUEUE_FLAG_VM && !exec_queue_killed(q),
>  		   "VM job timed out on non-killed execqueue\n");
>  
> -	simple_error_capture(q);
> -	xe_devcoredump(job);
> +	if (!exec_queue_killed(q)) {
> +		simple_error_capture(q);
> +		xe_devcoredump(job);
> +	}
>  
>  	trace_xe_sched_job_timedout(job);
>  


