[PATCH] drm/xe: Unlink client during vm close

Matthew Brost matthew.brost at intel.com
Thu Jul 18 15:58:04 UTC 2024


On Thu, Jul 18, 2024 at 06:47:52PM +0530, Tejas Upadhyay wrote:
> We have async call which does not know if client
> unlinked from vm by the time it is accessed. Set
> client unlink early during xe_vm_close() so that
> async API do not touch closed client info.
> 
> Also, debugs related to job timeout is not useful
> when its "no process" or client already unlinked.
> 

If a kernel exec queue times out a job, the 'Timedout job' message will
no longer be displayed, which is not ideal.

> Fixes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2273

Where exactly is this access coming from?
BUG: kernel NULL pointer dereference, address: 0000000000000058

Also, btw, the correct tag for a gitlab link is 'Closes'. 'Fixes' refers
to the offending kernel patch, so that the fix can be pulled into stable
kernels.

> Signed-off-by: Tejas Upadhyay <tejas.upadhyay at intel.com>
> ---
>  drivers/gpu/drm/xe/xe_guc_submit.c | 7 ++++---
>  drivers/gpu/drm/xe/xe_vm.c         | 1 +
>  2 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index 860405527115..1de141cb84c6 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -1166,10 +1166,11 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
>  			process_name = task->comm;
>  			pid = task->pid;
>  		}
> +		xe_gt_notice(guc_to_gt(guc), "Timedout job: seqno=%u, lrc_seqno=%u, guc_id=%d, flags=0x%lx in %s [%d]",
> +			     xe_sched_job_seqno(job), xe_sched_job_lrc_seqno(job),
> +			     q->guc->id, q->flags, process_name, pid);
>  	}
> -	xe_gt_notice(guc_to_gt(guc), "Timedout job: seqno=%u, lrc_seqno=%u, guc_id=%d, flags=0x%lx in %s [%d]",
> -		     xe_sched_job_seqno(job), xe_sched_job_lrc_seqno(job),
> -		     q->guc->id, q->flags, process_name, pid);
> +
>  	if (task)
>  		put_task_struct(task);
>  
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index cf3aea5d8cdc..660b20e0e207 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -1537,6 +1537,7 @@ static void xe_vm_close(struct xe_vm *vm)
>  {
>  	down_write(&vm->lock);
>  	vm->size = 0;
> +	vm->xef = NULL;

This doesn't appear to be thread safe.

Matt

>  	up_write(&vm->lock);
>  }
>  
> -- 
> 2.25.1
> 

