[PATCH 1/2] drm/xe_migrate: Switch from drm to dev managed actions
Upadhyay, Tejas
tejas.upadhyay at intel.com
Fri Feb 28 07:01:22 UTC 2025
> -----Original Message-----
> From: Bhatia, Aradhya <aradhya.bhatia at intel.com>
> Sent: Friday, February 28, 2025 12:22 PM
> To: Roper, Matthew D <matthew.d.roper at intel.com>
> Cc: Intel XE List <intel-xe at lists.freedesktop.org>; De Marchi, Lucas
> <lucas.demarchi at intel.com>; Hellstrom, Thomas
> <thomas.hellstrom at intel.com>; Upadhyay, Tejas
> <tejas.upadhyay at intel.com>; Ghimiray, Himal Prasad
> <himal.prasad.ghimiray at intel.com>; Bhatia, Aradhya
> <aradhya.bhatia at intel.com>
> Subject: [PATCH 1/2] drm/xe_migrate: Switch from drm to dev managed
> actions
>
> Change the scope of the migrate subsystem to be dev managed instead of
> drm managed.
>
> The parent pci struct &device, that the xe struct &drm_device is a part of, gets
> removed when a hot unplug is triggered, which causes the underlying iommu
> group to get destroyed as well.
>
> The migrate subsystem, which handles the lifetime of the page-table tree
> (pt) BO, doesn't get a chance to put the BO back during the hot unplug, as
> not all the references to the DRM device have been put back at that point.
> When all the references to the DRM device are indeed put back later, the
> migrate subsystem tries to put back the pt BO. Since the underlying iommu
> group has already been destroyed, a kernel NULL pointer dereference takes
> place while attempting to put back the pt BO.
It might be good to include a short excerpt of the main crash dump here as well, to clarify the issue.
>
> Signed-off-by: Aradhya Bhatia <aradhya.bhatia at intel.com>
> ---
> drivers/gpu/drm/xe/xe_migrate.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> index 278bc96cf593..4e23adfa208a 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/xe_migrate.c
> @@ -97,7 +97,7 @@ struct xe_exec_queue *xe_tile_migrate_exec_queue(struct xe_tile *tile)
> return tile->migrate->q;
> }
>
> -static void xe_migrate_fini(struct drm_device *dev, void *arg)
> +static void xe_migrate_fini(void *arg)
> {
> struct xe_migrate *m = arg;
>
> @@ -401,7 +401,7 @@ struct xe_migrate *xe_migrate_init(struct xe_tile *tile)
> struct xe_vm *vm;
> int err;
>
> - m = drmm_kzalloc(&xe->drm, sizeof(*m), GFP_KERNEL);
> + m = devm_kzalloc(xe->drm.dev, sizeof(*m), GFP_KERNEL);
> if (!m)
> return ERR_PTR(-ENOMEM);
>
> @@ -455,7 +455,7 @@ struct xe_migrate *xe_migrate_init(struct xe_tile *tile)
> might_lock(&m->job_mutex);
> fs_reclaim_release(GFP_KERNEL);
>
> - err = drmm_add_action_or_reset(&xe->drm, xe_migrate_fini, m);
> + err = devm_add_action_or_reset(xe->drm.dev, xe_migrate_fini, m);
> if (err)
> return ERR_PTR(err);
Apart from the above nit, LGTM.
Reviewed-by: Tejas Upadhyay <tejas.upadhyay at intel.com>
>
> --
> 2.45.2
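
For anyone following the thread, the lifetime difference described in the commit message can be sketched as a small userspace C model. This is only a toy under stated assumptions, not driver code: every type and function name below (model_device, model_migrate, add_action, run_actions, unplug_releases_safely, and so on) is hypothetical, both action lists share one callback signature for simplicity (the real drmm callback also takes a struct drm_device *), and the model assumes that dev-managed actions run at device removal before the iommu group goes away, while drm-managed actions run only when the last DRM reference is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ACTIONS 4

typedef void (*action_fn)(void *arg);

/* Hypothetical stand-in for a list of managed release actions. */
struct action_list {
	action_fn fn[MAX_ACTIONS];
	void *arg[MAX_ACTIONS];
	size_t count;
};

/* Hypothetical device model; this is not the kernel's struct device. */
struct model_device {
	bool iommu_group_alive;
	int drm_refcount;
	struct action_list dev_actions; /* run at device removal (devm-like) */
	struct action_list drm_actions; /* run at the final DRM put (drmm-like) */
};

struct model_migrate {
	struct model_device *dev;
	bool pt_bo_released; /* release ran while the iommu group existed */
	bool pt_bo_faulted;  /* release ran after the iommu group was gone */
};

static void add_action(struct action_list *l, action_fn fn, void *arg)
{
	assert(l->count < MAX_ACTIONS);
	l->fn[l->count] = fn;
	l->arg[l->count] = arg;
	l->count++;
}

/* Run and drop all actions, newest first. */
static void run_actions(struct action_list *l)
{
	while (l->count > 0) {
		l->count--;
		l->fn[l->count](l->arg[l->count]);
	}
}

/* Models the fini callback: releasing the pt BO needs the iommu group. */
static void model_migrate_fini(void *arg)
{
	struct model_migrate *m = arg;

	if (m->dev->iommu_group_alive)
		m->pt_bo_released = true;
	else
		m->pt_bo_faulted = true; /* stands in for the NULL dereference */
}

static void model_drm_put(struct model_device *dev)
{
	if (--dev->drm_refcount == 0)
		run_actions(&dev->drm_actions);
}

static void model_hot_unplug(struct model_device *dev)
{
	run_actions(&dev->dev_actions);  /* dev-managed cleanup runs first */
	dev->iommu_group_alive = false;  /* then the iommu group is destroyed */
	model_drm_put(dev);              /* the driver drops its own reference */
}

/* Returns true if the pt BO was released before the iommu group vanished. */
static bool unplug_releases_safely(bool use_dev_managed)
{
	struct model_device dev = { .iommu_group_alive = true, .drm_refcount = 2 };
	struct model_migrate m = { .dev = &dev };

	add_action(use_dev_managed ? &dev.dev_actions : &dev.drm_actions,
		   model_migrate_fini, &m);

	model_hot_unplug(&dev); /* one DRM reference is still held elsewhere */
	model_drm_put(&dev);    /* the last holder lets go some time later */

	return m.pt_bo_released && !m.pt_bo_faulted;
}
```

In this model, unplug_releases_safely(false) corresponds to the old drm-managed registration, where the fini callback only runs after the iommu group is gone, and unplug_releases_safely(true) corresponds to the dev-managed registration proposed in the patch.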