[PATCH] drm/xe: Fix fault on fd close when wedged
Matthew Brost
matthew.brost at intel.com
Thu Dec 12 03:51:57 UTC 2024
On Wed, Dec 11, 2024 at 02:53:32PM -0800, Lucas De Marchi wrote:
> If device is wedged, the final run ticks update for the client should be
> skipped as it's already unmapped. Fix this pagefault when forcing a

Where does the exec queue get unmapped when a device is wedged?

Matt

> wedged state with igt:
>
> <6> [IGT] xe_wedged: exiting, ret=98
> <1> BUG: unable to handle page fault for address: ffffc901bc5e508c
> <1> #PF: supervisor read access in kernel mode
> <1> #PF: error_code(0x0000) - not-present page
> ...
> <4> xe_lrc_update_timestamp+0x1c/0xd0 [xe]
> <4> xe_exec_queue_update_run_ticks+0x50/0xb0 [xe]
> <4> xe_exec_queue_fini+0x16/0xb0 [xe]
> <4> __guc_exec_queue_fini_async+0xc4/0x190 [xe]
> <4> guc_exec_queue_fini_async+0xa0/0xe0 [xe]
> <4> guc_exec_queue_fini+0x23/0x40 [xe]
> <4> xe_exec_queue_destroy+0xb3/0xf0 [xe]
> <4> xe_file_close+0xd4/0x1a0 [xe]
> <4> drm_file_free+0x210/0x280 [drm]
> <4> drm_close_helper.isra.0+0x6d/0x80 [drm]
> <4> drm_release_noglobal+0x20/0x90 [drm]
>
> Fixes: 83db047d9425 ("drm/xe: Stop accumulating LRC timestamp on job_free")
> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3421
> Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
> ---
> drivers/gpu/drm/xe/xe_exec_queue.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> index aab9e561153dc..9ad7a6b24cc3a 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> @@ -265,7 +265,9 @@ void xe_exec_queue_fini(struct xe_exec_queue *q)
>  	 * Before releasing our ref to lrc and xef, accumulate our run ticks
>  	 * and wakeup any waiters.
>  	 */
> -	xe_exec_queue_update_run_ticks(q);
> +	if (!xe_device_wedged(gt_to_xe(q->gt)))
> +		xe_exec_queue_update_run_ticks(q);
> +
>  	if (q->xef && atomic_dec_and_test(&q->xef->exec_queue.pending_removal))
>  		wake_up_var(&q->xef->exec_queue.pending_removal);
>
> --
> 2.47.0
>
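
For readers following the thread outside the kernel tree, the guard in the quoted hunk can be modelled in plain userspace C. This is only a rough sketch: every mock_* name is hypothetical and none of it is the real xe code. It also takes the commit message's premise at face value (that the timestamp mapping is gone once the device is wedged), which is exactly what the question above asks about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver structures; not the real xe types. */
struct mock_device {
	bool wedged;
};

struct mock_queue {
	struct mock_device *dev;
	const uint32_t *lrc_ts;	/* models the mapped LRC timestamp; NULL once unmapped */
	uint32_t last_ts;
	uint64_t run_ticks;
};

/* Reads the timestamp and accumulates the delta; would fault if lrc_ts is unmapped. */
static void mock_update_run_ticks(struct mock_queue *q)
{
	uint32_t now = *q->lrc_ts;

	q->run_ticks += (uint32_t)(now - q->last_ts);
	q->last_ts = now;
}

/* Mirrors the guard in the patch: skip the final update when the device is wedged. */
static void mock_queue_fini(struct mock_queue *q)
{
	assert(q && q->dev);

	if (!q->dev->wedged)
		mock_update_run_ticks(q);
}
```

In this model, calling mock_queue_fini() on a queue whose device is wedged never dereferences lrc_ts, so run_ticks keeps its last accumulated value. Whether skipping the update is the right fix, as opposed to keeping the mapping alive until fini, is a separate design question that the sketch does not settle.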