[PATCH] drm/xe/devcoredump: Defer devcoredump initialization during probe
Rodrigo Vivi
rodrigo.vivi at intel.com
Fri Jul 25 14:58:18 UTC 2025
On Fri, Jul 25, 2025 at 12:25:43PM +0530, Balasubramani Vivekanandan wrote:
> Initializing devcoredump before the GT looks harmless, but it leads to
> a problem during driver unbind. Because of this order, the GT/Engine
> release functions are called before the xe devcoredump release function
> (xe_driver_devcoredump_fini), leading to the following kernel crash
> because the devcoredump functions might still use GT/Engine
> data structures after those are freed.
I agree with moving this initialization later in probe, but I'd like
the commit message to explain the problem better.
Looking at xe_driver_devcoredump_fini(), it is just a dev_coredump_put(),
so it does not read anything from the GT and does not trigger
__xe_devcoredump_read().
So please explain more clearly the flow that causes the error below.
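For reference, here is the ordering argument from the commit message as I understand it, written as a minimal user-space sketch. It assumes release actions run in reverse registration order, the way devm/drmm-managed actions do. The names release_stack, add_action, next_release and gt_released_first are hypothetical and are not xe or devres API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a managed-release stack; not the devres API. */
#define MAX_ACTIONS 8

struct release_stack {
	const char *names[MAX_ACTIONS];
	size_t count;
};

/* Register a release action at probe time. */
void add_action(struct release_stack *s, const char *name)
{
	assert(s->count < MAX_ACTIONS);
	s->names[s->count++] = name;
}

/* Pop the next action to run at unbind (last registered first), or NULL. */
const char *next_release(struct release_stack *s)
{
	return s->count ? s->names[--s->count] : NULL;
}

/*
 * Register two actions in probe order and report whether "gt" is the
 * first one released at unbind (1) or not (0).
 */
int gt_released_first(const char *first, const char *second)
{
	struct release_stack s = { .count = 0 };

	add_action(&s, first);
	add_action(&s, second);
	return strcmp(next_release(&s), "gt") == 0;
}
```

With the current probe order, gt_released_first("devcoredump", "gt") reports that the GT is released first; with the patched order, gt_released_first("gt", "devcoredump") reports the opposite. This only shows why the GT goes away before the devcoredump release runs. It does not show what still runs xe_devcoredump_deferred_snap_work after that point, which is the part the commit message should spell out.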
>
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> Workqueue: events_unbound xe_devcoredump_deferred_snap_work [xe]
> RIP: 0010:xe_engine_snapshot_print+0x47/0x420 [xe]
> Call Trace:
> <TASK>
> ? drm_printf+0x64/0x90
> __xe_devcoredump_read+0x23f/0x2d0 [xe]
> ? __pfx___drm_printfn_coredump+0x10/0x10
> ? __pfx___drm_puts_coredump+0x10/0x10
> xe_devcoredump_deferred_snap_work+0x17a/0x190 [xe]
> process_one_work+0x22e/0x6f0
> worker_thread+0x1e8/0x3d0
> ? __pfx_worker_thread+0x10/0x10
> kthread+0x11f/0x250
> ? __pfx_kthread+0x10/0x10
> ret_from_fork+0x47/0x70
> ? __pfx_kthread+0x10/0x10
> ret_from_fork_asm+0x1a/0x30
>
> Fixes: 4209d635a823 ("drm/xe: Remove devcoredump during driver release")
> Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan at intel.com>
> ---
> drivers/gpu/drm/xe/xe_device.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index d04a0ae018e6..ae48cd3c7bf0 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -821,10 +821,6 @@ int xe_device_probe(struct xe_device *xe)
> return err;
> }
>
> - err = xe_devcoredump_init(xe);
> - if (err)
> - return err;
> -
> /*
> * From here on, if a step fails, make sure a Driver-FLR is triggereed
> */
> @@ -889,6 +885,10 @@ int xe_device_probe(struct xe_device *xe)
> XE_WA(xe->tiles->media_gt, 15015404425_disable))
> XE_DEVICE_WA_DISABLE(xe, 15015404425);
>
> + err = xe_devcoredump_init(xe);
> + if (err)
> + return err;
> +
> xe_nvm_init(xe);
>
> err = xe_heci_gsc_init(xe);
> --
> 2.34.1
>