[PATCH] drm/ttm: Drain workqueue before sys manager release

Christian König christian.koenig at amd.com
Mon Oct 16 11:03:22 UTC 2023


On 13.10.23 at 16:34, Karolina Stolarek wrote:
> In rare cases, a delayed destruction of a BO with a system resource
> could stay in the workqueue until drain_workqueue() is called
> in ttm_device_fini(). An attempt to free a resource from an already
> released manager results in NULL pointer dereference. Move the step
> of draining and destroying the workqueue so it happens before
> the ttm_sys_manager cleanup.

Good catch! But it's not only the draining of the workqueue that is in 
the wrong place here.

Removing the device from the device list should come before destroying 
the system domain as well.

So I think you should probably rather move the chunk:

         man = ttm_manager_type(bdev, TTM_PL_SYSTEM);
         ttm_resource_manager_set_used(man, false);
         ttm_set_driver_manager(bdev, TTM_PL_SYSTEM, NULL);

after the destroy_workqueue() instead of the other way around.
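
To make the suggested ordering explicit, the resulting ttm_device_fini() 
would look roughly like this (just a sketch of the reordering, keeping 
the rest of the function unchanged):

         void ttm_device_fini(struct ttm_device *bdev)
         {
                 struct ttm_resource_manager *man;
                 unsigned i;

                 mutex_lock(&ttm_global_mutex);
                 list_del(&bdev->device_list);
                 mutex_unlock(&ttm_global_mutex);

                 /* Flush any pending delayed BO destruction while the
                  * system manager is still alive, then tear down the
                  * workqueue. */
                 drain_workqueue(bdev->wq);
                 destroy_workqueue(bdev->wq);

                 /* Only now release the system domain. */
                 man = ttm_manager_type(bdev, TTM_PL_SYSTEM);
                 ttm_resource_manager_set_used(man, false);
                 ttm_set_driver_manager(bdev, TTM_PL_SYSTEM, NULL);

                 /* ... remainder unchanged ... */
         }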

Apart from that looks good to me.

Thanks,
Christian.

>
> Fixes: 9bff18d13473 ("drm/ttm: use per BO cleanup workers")
> Signed-off-by: Karolina Stolarek <karolina.stolarek at intel.com>
> ---
> Some background: I stumbled upon this issue when testing
> ttm_bo_pipeline_gutting() with a BO with an active dma_resv fence. In ~2% of
> the runs, the delayed destruction of the ghost wouldn't happen until the
> drain_workqueue() step. man->func->free(man, *res) got called via
> ttm_bo_cleanup_memtype_use(), but the manager and its functions were nowhere
> to be seen, resulting in a NULL pointer deref.
>
>   drivers/gpu/drm/ttm/ttm_device.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
> index 7726a72befc5..753126581620 100644
> --- a/drivers/gpu/drm/ttm/ttm_device.c
> +++ b/drivers/gpu/drm/ttm/ttm_device.c
> @@ -232,6 +232,9 @@ void ttm_device_fini(struct ttm_device *bdev)
>   	struct ttm_resource_manager *man;
>   	unsigned i;
>   
> +	drain_workqueue(bdev->wq);
> +	destroy_workqueue(bdev->wq);
> +
>   	man = ttm_manager_type(bdev, TTM_PL_SYSTEM);
>   	ttm_resource_manager_set_used(man, false);
>   	ttm_set_driver_manager(bdev, TTM_PL_SYSTEM, NULL);
> @@ -240,9 +243,6 @@ void ttm_device_fini(struct ttm_device *bdev)
>   	list_del(&bdev->device_list);
>   	mutex_unlock(&ttm_global_mutex);
>   
> -	drain_workqueue(bdev->wq);
> -	destroy_workqueue(bdev->wq);
> -
>   	spin_lock(&bdev->lru_lock);
>   	for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i)
>   		if (list_empty(&man->lru[0]))
