[PATCH V2] drm/xe: skip error capture when exec queue is killed
Souza, Jose
jose.souza at intel.com
Thu May 2 13:56:40 UTC 2024
On Tue, 2024-04-30 at 18:42 +0530, Tejas Upadhyay wrote:
> When user closes exec queue soon after job submission,
> we are generating error coredump. Instead check if
> exec queue is killed during job timeout then skip
> error coredump capture.
Where is this happening?
The Iris/OpenGL driver was not waiting for the exec queue to go idle, which caused '*ERROR* GT0: TLB invalidation' errors in the Xe KMD.
That was fixed here: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27500/diffs?commit_id=665d30b5448f606d7a79afe0596c3a2264ab3e15
The ANV/Vulkan driver should already do that.
The patch looks good, but the UMD or IGT needs to be fixed too.
>
> V2:
> - Just skip error capture - MattB
>
> Signed-off-by: Tejas Upadhyay <tejas.upadhyay at intel.com>
> ---
> drivers/gpu/drm/xe/xe_guc_submit.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index d274a139010b..2c0aa3443cd9 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -980,8 +980,10 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
>  	xe_gt_WARN(q->gt, q->flags & EXEC_QUEUE_FLAG_VM && !exec_queue_killed(q),
>  		   "VM job timed out on non-killed execqueue\n");
>
> -	simple_error_capture(q);
> -	xe_devcoredump(job);
> +	if (!exec_queue_killed(q)) {
> +		simple_error_capture(q);
> +		xe_devcoredump(job);
> +	}
>
>  	trace_xe_sched_job_timedout(job);
>
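For reference, the behaviour the hunk above is aiming for can be modelled outside the kernel. This is only a rough userspace sketch, not the xe code: every mock_* name, the struct layout and the capture counter are made up for illustration, and the real handler does much more than this.

```c
/*
 * Userspace model of the patched timeout path. NOT kernel code.
 * All mock_* names are hypothetical stand-ins, not xe driver symbols.
 */
#include <assert.h>
#include <stdbool.h>

struct mock_exec_queue {
	bool killed;   /* assumed to be set once userspace has closed the queue */
	int captures;  /* counts how many error captures were taken */
};

/* Stand-in for the driver's "is this queue killed?" check. */
static bool mock_exec_queue_killed(const struct mock_exec_queue *q)
{
	return q->killed;
}

/* Stand-in for the error capture / devcoredump pair. */
static void mock_error_capture(struct mock_exec_queue *q)
{
	q->captures++;
}

/*
 * Model of the timeout handler after the patch: capture only when the
 * queue has not been killed. Returns true if a capture was taken.
 */
static bool mock_timedout_job(struct mock_exec_queue *q)
{
	if (mock_exec_queue_killed(q))
		return false;

	mock_error_capture(q);
	return true;
}

/* Small scenario covering both cases. */
static void mock_demo(void)
{
	struct mock_exec_queue live = { .killed = false, .captures = 0 };
	struct mock_exec_queue dead = { .killed = true, .captures = 0 };

	/* A timeout on a live queue still produces a capture. */
	assert(mock_timedout_job(&live));
	assert(live.captures == 1);

	/* A timeout on a killed queue is skipped silently. */
	assert(!mock_timedout_job(&dead));
	assert(dead.captures == 0);
}
```

Calling mock_demo() from a main() exercises both paths. Note that the model only covers the capture side; it does nothing about a UMD that destroys the queue before it is idle.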