[PATCH] drm/xe: Fix fault on fd close when wedged

Matthew Brost matthew.brost at intel.com
Thu Dec 12 03:51:57 UTC 2024


On Wed, Dec 11, 2024 at 02:53:32PM -0800, Lucas De Marchi wrote:
> If device is wedged, the final run ticks update for the client should be
> skipped as it's already unmapped. Fix this pagefault when forcing a

Where does the exec queue get unmapped when a device is wedged?

Matt

> wedged state with igt:
> 
> <6> [IGT] xe_wedged: exiting, ret=98
> <1> BUG: unable to handle page fault for address: ffffc901bc5e508c
> <1> #PF: supervisor read access in kernel mode
> <1> #PF: error_code(0x0000) - not-present page
> ...
> <4>   xe_lrc_update_timestamp+0x1c/0xd0 [xe]
> <4>   xe_exec_queue_update_run_ticks+0x50/0xb0 [xe]
> <4>   xe_exec_queue_fini+0x16/0xb0 [xe]
> <4>   __guc_exec_queue_fini_async+0xc4/0x190 [xe]
> <4>   guc_exec_queue_fini_async+0xa0/0xe0 [xe]
> <4>   guc_exec_queue_fini+0x23/0x40 [xe]
> <4>   xe_exec_queue_destroy+0xb3/0xf0 [xe]
> <4>   xe_file_close+0xd4/0x1a0 [xe]
> <4>   drm_file_free+0x210/0x280 [drm]
> <4>   drm_close_helper.isra.0+0x6d/0x80 [drm]
> <4>   drm_release_noglobal+0x20/0x90 [drm]
> 
> Fixes: 83db047d9425 ("drm/xe: Stop accumulating LRC timestamp on job_free")
> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3421
> Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
> ---
>  drivers/gpu/drm/xe/xe_exec_queue.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> index aab9e561153dc..9ad7a6b24cc3a 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> @@ -265,7 +265,9 @@ void xe_exec_queue_fini(struct xe_exec_queue *q)
>  	 * Before releasing our ref to lrc and xef, accumulate our run ticks
>  	 * and wakeup any waiters.
>  	 */
> -	xe_exec_queue_update_run_ticks(q);
> +	if (!xe_device_wedged(gt_to_xe(q->gt)))
> +		xe_exec_queue_update_run_ticks(q);
> +
>  	if (q->xef && atomic_dec_and_test(&q->xef->exec_queue.pending_removal))
>  		wake_up_var(&q->xef->exec_queue.pending_removal);
>  
> -- 
> 2.47.0
> 

