[PATCH 1/2] drm/xe_migrate: Switch from drm to dev managed actions

Upadhyay, Tejas tejas.upadhyay at intel.com
Fri Feb 28 07:01:22 UTC 2025



> -----Original Message-----
> From: Bhatia, Aradhya <aradhya.bhatia at intel.com>
> Sent: Friday, February 28, 2025 12:22 PM
> To: Roper, Matthew D <matthew.d.roper at intel.com>
> Cc: Intel XE List <intel-xe at lists.freedesktop.org>; De Marchi, Lucas
> <lucas.demarchi at intel.com>; Hellstrom, Thomas
> <thomas.hellstrom at intel.com>; Upadhyay, Tejas
> <tejas.upadhyay at intel.com>; Ghimiray, Himal Prasad
> <himal.prasad.ghimiray at intel.com>; Bhatia, Aradhya
> <aradhya.bhatia at intel.com>
> Subject: [PATCH 1/2] drm/xe_migrate: Switch from drm to dev managed
> actions
> 
> Change the scope of the migrate subsystem to be dev managed instead of
> drm managed.
> 
> The parent pci struct &device, that the xe struct &drm_device is a part of, gets
> removed when a hot unplug is triggered, which causes the underlying iommu
> group to get destroyed as well.
> 
> The migrate subsystem, which handles the lifetime of the page-table tree
> (pt) BO, doesn't get a chance to put the BO back during the hot unplug, as
> not all the references to DRM have been put back yet.
> When all the references to DRM are indeed put back later, the migrate
> subsystem tries to put back the pt BO. Since the underlying iommu group has
> already been destroyed, a kernel NULL ptr dereference takes place while
> attempting to put back the pt BO.

It might be good to include a short excerpt of the main crash dump here as well, to clarify the issue.
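
To make the ordering problem in the commit message concrete, here is a rough
userspace model of it. This is only a sketch: every model_* name is
hypothetical, none of these are the real kernel structures, and it assumes the
driver plus one open client each hold a reference on the DRM device. It relies
only on the documented behaviour that devm actions run when the device is
unbound, while drmm actions run when the last drm_device reference is dropped.

```c
/* Userspace model only: none of these types are the real kernel structures. */
#include <assert.h>
#include <stdlib.h>

typedef void (*action_fn)(void *arg);

struct action {
	action_fn fn;
	void *arg;
	struct action *next;
};

/* Push an action; actions later run in reverse order of registration. */
static int add_action(struct action **list, action_fn fn, void *arg)
{
	struct action *a = malloc(sizeof(*a));

	if (!a)
		return -1;
	a->fn = fn;
	a->arg = arg;
	a->next = *list;
	*list = a;
	return 0;
}

static void run_actions(struct action **list)
{
	while (*list) {
		struct action *a = *list;

		*list = a->next;
		a->fn(a->arg);
		free(a);
	}
}

struct model_dev {		/* stands in for the parent PCI struct device */
	int iommu_alive;
	struct action *devm;	/* run at unbind, while the IOMMU group exists */
};

struct model_drm {		/* stands in for struct drm_device */
	struct model_dev *dev;
	int refcount;
	struct action *drmm;	/* run only when the last reference is put */
};

struct model_migrate {
	struct model_dev *dev;
	int fini_saw_iommu;	/* what the fini callback observed */
};

static void model_migrate_fini(void *arg)
{
	struct model_migrate *m = arg;

	m->fini_saw_iommu = m->dev->iommu_alive;
}

static void model_drm_put(struct model_drm *drm)
{
	if (--drm->refcount == 0)
		run_actions(&drm->drmm);
}

static void model_hot_unplug(struct model_dev *dev, struct model_drm *drm)
{
	run_actions(&dev->devm);	/* dev-managed cleanup */
	model_drm_put(drm);		/* driver drops its own reference */
	dev->iommu_alive = 0;		/* IOMMU group is destroyed */
}

/*
 * Returns 1 if the fini callback ran while the IOMMU was still alive,
 * 0 if it ran afterwards, and -1 if the action could not be allocated.
 */
int run_scenario(int use_devm)
{
	struct model_dev dev = { .iommu_alive = 1 };
	struct model_drm drm = { .dev = &dev, .refcount = 2 }; /* driver + client */
	struct model_migrate m = { .dev = &dev, .fini_saw_iommu = -1 };
	struct action **list = use_devm ? &dev.devm : &drm.drmm;

	if (add_action(list, model_migrate_fini, &m))
		return -1;

	model_hot_unplug(&dev, &drm);
	model_drm_put(&drm);	/* the client finally closes its handle */
	assert(drm.refcount == 0);
	return m.fini_saw_iommu;
}
```

In this model, run_scenario(1) (the dev-managed case) has the fini callback
run before the IOMMU flag is cleared, whereas run_scenario(0) (the drm-managed
case) has it run only after the last reference is put, by which point the flag
is already cleared. That mirrors the window the patch is trying to close.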

> 
> Signed-off-by: Aradhya Bhatia <aradhya.bhatia at intel.com>
> ---
>  drivers/gpu/drm/xe/xe_migrate.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> index 278bc96cf593..4e23adfa208a 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/xe_migrate.c
> @@ -97,7 +97,7 @@ struct xe_exec_queue *xe_tile_migrate_exec_queue(struct xe_tile *tile)
>  	return tile->migrate->q;
>  }
> 
> -static void xe_migrate_fini(struct drm_device *dev, void *arg)
> +static void xe_migrate_fini(void *arg)
>  {
>  	struct xe_migrate *m = arg;
> 
> @@ -401,7 +401,7 @@ struct xe_migrate *xe_migrate_init(struct xe_tile *tile)
>  	struct xe_vm *vm;
>  	int err;
> 
> -	m = drmm_kzalloc(&xe->drm, sizeof(*m), GFP_KERNEL);
> +	m = devm_kzalloc(xe->drm.dev, sizeof(*m), GFP_KERNEL);
>  	if (!m)
>  		return ERR_PTR(-ENOMEM);
> 
> @@ -455,7 +455,7 @@ struct xe_migrate *xe_migrate_init(struct xe_tile *tile)
>  	might_lock(&m->job_mutex);
>  	fs_reclaim_release(GFP_KERNEL);
> 
> -	err = drmm_add_action_or_reset(&xe->drm, xe_migrate_fini, m);
> +	err = devm_add_action_or_reset(xe->drm.dev, xe_migrate_fini, m);
>  	if (err)
>  		return ERR_PTR(err);

Apart from the above nit, LGTM.
Reviewed-by: Tejas Upadhyay <tejas.upadhyay at intel.com>

> 
> --
> 2.45.2


