[PATCH] drm/ttm: Don't delete the system manager before the delayed delete

Christian König christian.koenig at amd.com
Mon Sep 20 06:30:46 UTC 2021


On 17.09.21 at 19:53, Zack Rusin wrote:
> On some hardware, in particular in virtualized environments, the
> system memory can be shared with the "hardware". In those cases
> the BOs allocated through the ttm system manager might be
> busy during ttm_bo_put which results in them being scheduled
> for a delayed deletion.
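To make the delayed-deletion path in the quoted paragraph concrete, here is a minimal userspace sketch in C. It only loosely mirrors the described behavior (a BO that is still busy when its last reference is dropped is queued instead of freed); every name (`sketch_bo`, `sketch_device`, `sketch_bo_put`, `sketch_delayed_delete`) is hypothetical, and a boolean `busy` flag stands in for an unsignaled fence. It is not the TTM implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the real TTM structures. */
struct sketch_bo {
	bool busy;              /* models an unsignaled fence on the BO */
	struct sketch_bo *next; /* link on the delayed-delete list */
};

struct sketch_device {
	struct sketch_bo *ddestroy; /* BOs waiting for delayed deletion */
	int freed;                  /* number of BOs actually released */
};

/* Last reference dropped: free an idle BO now, defer a busy one. */
static void sketch_bo_put(struct sketch_device *dev, struct sketch_bo *bo)
{
	assert(bo != NULL);
	if (bo->busy) {
		bo->next = dev->ddestroy;
		dev->ddestroy = bo;
		return;
	}
	free(bo);
	dev->freed++;
}

/* Later pass: release every deferred BO that has become idle,
 * leaving still-busy ones on the list. */
static void sketch_delayed_delete(struct sketch_device *dev)
{
	struct sketch_bo **link = &dev->ddestroy;

	while (*link) {
		struct sketch_bo *bo = *link;

		if (bo->busy) {
			link = &bo->next;
			continue;
		}
		*link = bo->next; /* unlink before freeing */
		free(bo);
		dev->freed++;
	}
}
```

Dropping one idle and one busy BO frees only the idle one immediately; the busy one stays on `ddestroy` until a later `sketch_delayed_delete` pass finds it idle.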

While the patch itself is probably fine, the reasoning here is a clear NAK.

Buffers in the system domain are by definition not GPU-accessible, even
in a shared environment, and so they *must* be idle.

Otherwise you break quite a number of assumptions in the code.
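The invariant stated above can be written down as a small check. The sketch below is a hypothetical userspace model, not TTM code: `sketch_placement`, `sketch_move_to_system`, `sketch_wait_idle` and the boolean `busy` flag (standing in for an unsignaled fence) are all invented for illustration. It shows that if every move into the system domain first waits for the BO to become idle, a busy BO is never found in that domain.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical placements, named loosely after TTM's memory types. */
enum sketch_placement { SKETCH_PL_SYSTEM, SKETCH_PL_TT, SKETCH_PL_VRAM };

struct sketch_bo {
	enum sketch_placement mem; /* current placement of the backing store */
	bool busy;                 /* models an unsignaled fence on the BO */
};

/* System-domain buffers are not GPU-accessible, so they must be idle. */
static bool sketch_invariant_holds(const struct sketch_bo *bo)
{
	return bo->mem != SKETCH_PL_SYSTEM || !bo->busy;
}

/* Stand-in for waiting on the GPU: here it simply marks the BO idle. */
static void sketch_wait_idle(struct sketch_bo *bo)
{
	bo->busy = false;
}

/* Moving into the system domain waits for idle first, which keeps the
 * invariant above true. */
static void sketch_move_to_system(struct sketch_bo *bo)
{
	sketch_wait_idle(bo);
	bo->mem = SKETCH_PL_SYSTEM;
	assert(sketch_invariant_holds(bo));
}
```

In this model a BO in the system domain is therefore always idle when released and would not need to be deferred.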

Regards,
Christian.

>
> The problem is that the ttm system manager is disabled
> before the final delayed deletion is run in ttm_device_fini.
> This results in crashes while freeing the BO resources,
> because they try to remove themselves from a
> ttm_resource_manager that no longer exists (e.g. in IGT's
> core_hotunplug on vmwgfx).
>
> In general, reloading any driver that could share system mem
> resources with "hardware" could hit this, because nothing
> prevents the system mem resources from being scheduled
> for delayed deletion (apart from them probably not being busy
> anywhere except in virtualized environments).
>
> Signed-off-by: Zack Rusin <zackr at vmware.com>
> Cc: Christian Koenig <christian.koenig at amd.com>
> Cc: Huang Rui <ray.huang at amd.com>
> Cc: David Airlie <airlied at linux.ie>
> Cc: Daniel Vetter <daniel at ffwll.ch>
> Cc: dri-devel at lists.freedesktop.org
> ---
>   drivers/gpu/drm/ttm/ttm_device.c | 8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
> index 9eb8f54b66fc..4ef19cafc755 100644
> --- a/drivers/gpu/drm/ttm/ttm_device.c
> +++ b/drivers/gpu/drm/ttm/ttm_device.c
> @@ -225,10 +225,6 @@ void ttm_device_fini(struct ttm_device *bdev)
>   	struct ttm_resource_manager *man;
>   	unsigned i;
>   
> -	man = ttm_manager_type(bdev, TTM_PL_SYSTEM);
> -	ttm_resource_manager_set_used(man, false);
> -	ttm_set_driver_manager(bdev, TTM_PL_SYSTEM, NULL);
> -
>   	mutex_lock(&ttm_global_mutex);
>   	list_del(&bdev->device_list);
>   	mutex_unlock(&ttm_global_mutex);
> @@ -238,6 +234,10 @@ void ttm_device_fini(struct ttm_device *bdev)
>   	if (ttm_bo_delayed_delete(bdev, true))
>   		pr_debug("Delayed destroy list was clean\n");
>   
> +	man = ttm_manager_type(bdev, TTM_PL_SYSTEM);
> +	ttm_resource_manager_set_used(man, false);
> +	ttm_set_driver_manager(bdev, TTM_PL_SYSTEM, NULL);
> +
>   	spin_lock(&bdev->lru_lock);
>   	for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i)
>   		if (list_empty(&man->lru[0]))
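The reordering in the hunk above can be modeled in userspace to show why the old order fails. In this hypothetical C sketch (none of these names are TTM API), draining the delayed-delete list has to reach the system manager to release each resource. The old order clears the manager pointer first, so every deferred free finds no manager, which corresponds to the crash described in the commit message; the new order drains the list first. Only the ordering of these two steps is modeled, not the rest of `ttm_device_fini`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real TTM structures or API. */
struct sketch_manager {
	int live_resources; /* resources still accounted to this manager */
};

struct sketch_device {
	struct sketch_manager *sys_man; /* NULL once the manager is torn down */
	int pending;                    /* BOs on the delayed-delete list */
	int orphaned;                   /* frees that found no manager */
};

/* Freeing a resource must reach its manager to update accounting. */
static void sketch_resource_free(struct sketch_device *dev)
{
	if (!dev->sys_man) {
		dev->orphaned++; /* in the kernel this is the crash */
		return;
	}
	dev->sys_man->live_resources--;
}

static void sketch_delayed_delete(struct sketch_device *dev)
{
	while (dev->pending > 0) {
		sketch_resource_free(dev);
		dev->pending--;
	}
}

static void sketch_disable_system_manager(struct sketch_device *dev)
{
	dev->sys_man = NULL;
}

/* Order before the patch: manager disabled, then delayed delete runs. */
static void sketch_fini_old(struct sketch_device *dev)
{
	sketch_disable_system_manager(dev);
	sketch_delayed_delete(dev);
}

/* Order after the patch: drain delayed deletions, then disable. */
static void sketch_fini_new(struct sketch_device *dev)
{
	sketch_delayed_delete(dev);
	assert(dev->pending == 0); /* nothing left that needs the manager */
	sketch_disable_system_manager(dev);
}
```

With two deferred BOs, `sketch_fini_old` leaves both frees orphaned and the manager's count untouched, while `sketch_fini_new` releases both through the manager before disabling it.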


