[PATCH] drm/xe: Unlink client during vm close
Matthew Brost
matthew.brost at intel.com
Thu Jul 18 15:58:04 UTC 2024
On Thu, Jul 18, 2024 at 06:47:52PM +0530, Tejas Upadhyay wrote:
> We have async call which does not know if client
> unlinked from vm by the time it is accessed. Set
> client unlink early during xe_vm_close() so that
> async API do not touch closed client info.
>
> Also, debugs related to job timeout is not useful
> when its "no process" or client already unlinked.
>
If kernel exec queues time out jobs, the 'Timedout job' message will no
longer be displayed, which is not ideal.
> Fixes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2273
Where exactly is this access coming from?
BUG: kernel NULL pointer dereference, address: 0000000000000058
Also, btw, the correct tag for a gitlab link is 'Closes'. 'Fixes' names the
offending kernel patch so the fix can be pulled into stable kernels.
> Signed-off-by: Tejas Upadhyay <tejas.upadhyay at intel.com>
> ---
> drivers/gpu/drm/xe/xe_guc_submit.c | 7 ++++---
> drivers/gpu/drm/xe/xe_vm.c | 1 +
> 2 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index 860405527115..1de141cb84c6 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -1166,10 +1166,11 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> process_name = task->comm;
> pid = task->pid;
> }
> + xe_gt_notice(guc_to_gt(guc), "Timedout job: seqno=%u, lrc_seqno=%u, guc_id=%d, flags=0x%lx in %s [%d]",
> + xe_sched_job_seqno(job), xe_sched_job_lrc_seqno(job),
> + q->guc->id, q->flags, process_name, pid);
> }
> - xe_gt_notice(guc_to_gt(guc), "Timedout job: seqno=%u, lrc_seqno=%u, guc_id=%d, flags=0x%lx in %s [%d]",
> - xe_sched_job_seqno(job), xe_sched_job_lrc_seqno(job),
> - q->guc->id, q->flags, process_name, pid);
> +
> if (task)
> put_task_struct(task);
>
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index cf3aea5d8cdc..660b20e0e207 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -1537,6 +1537,7 @@ static void xe_vm_close(struct xe_vm *vm)
> {
> down_write(&vm->lock);
> vm->size = 0;
> + vm->xef = NULL;
This doesn't appear to be thread safe.
Matt
> up_write(&vm->lock);
> }
>
> --
> 2.25.1
>