[PATCH] drm/ttm: Don't delete the system manager before the delayed delete
Christian König
christian.koenig at amd.com
Mon Sep 20 06:30:46 UTC 2021
On 17.09.21 at 19:53, Zack Rusin wrote:
> On some hardware, in particular in virtualized environments, the
> system memory can be shared with the "hardware". In those cases
> the BOs allocated through the ttm system manager might be
> busy during ttm_bo_put which results in them being scheduled
> for a delayed deletion.
While the patch itself is probably fine, the reasoning here is a clear NAK.
Buffers in the system domain are not GPU accessible by definition, even
in a shared environment, and so *must* be idle.
Otherwise you break quite a number of assumptions in the code.
Regards,
Christian.
>
> The problem is that the ttm system manager is disabled
> before the final delayed deletion is run in ttm_device_fini.
> This results in crashes during freeing of the BO resources
> because they're trying to remove themselves from a no longer
> existent ttm_resource_manager (e.g. in IGT's core_hotunplug
> on vmwgfx)
>
> In general reloading any driver that could share system mem
> resources with "hardware" could hit it because nothing
> prevents the system mem resources from being scheduled
> for delayed deletion (apart from them not being busy probably
> anywhere apart from virtualized environments).
>
> Signed-off-by: Zack Rusin <zackr at vmware.com>
> Cc: Christian Koenig <christian.koenig at amd.com>
> Cc: Huang Rui <ray.huang at amd.com>
> Cc: David Airlie <airlied at linux.ie>
> Cc: Daniel Vetter <daniel at ffwll.ch>
> Cc: dri-devel at lists.freedesktop.org
> ---
> drivers/gpu/drm/ttm/ttm_device.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
> index 9eb8f54b66fc..4ef19cafc755 100644
> --- a/drivers/gpu/drm/ttm/ttm_device.c
> +++ b/drivers/gpu/drm/ttm/ttm_device.c
> @@ -225,10 +225,6 @@ void ttm_device_fini(struct ttm_device *bdev)
> struct ttm_resource_manager *man;
> unsigned i;
>
> - man = ttm_manager_type(bdev, TTM_PL_SYSTEM);
> - ttm_resource_manager_set_used(man, false);
> - ttm_set_driver_manager(bdev, TTM_PL_SYSTEM, NULL);
> -
> mutex_lock(&ttm_global_mutex);
> list_del(&bdev->device_list);
> mutex_unlock(&ttm_global_mutex);
> @@ -238,6 +234,10 @@ void ttm_device_fini(struct ttm_device *bdev)
> if (ttm_bo_delayed_delete(bdev, true))
> pr_debug("Delayed destroy list was clean\n");
>
> + man = ttm_manager_type(bdev, TTM_PL_SYSTEM);
> + ttm_resource_manager_set_used(man, false);
> + ttm_set_driver_manager(bdev, TTM_PL_SYSTEM, NULL);
> +
> spin_lock(&bdev->lru_lock);
> for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i)
> if (list_empty(&man->lru[0]))
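
For illustration only, the ordering problem described in the quoted commit message can be sketched in plain user-space C. This is a toy model, not kernel code: every `toy_*` name below is hypothetical and none of it is the real TTM API. It assumes, as the commit message does, that some BOs are still on the delayed-delete list at teardown; whether system-domain BOs may legitimately end up there is exactly what the reply above disputes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real TTM structures. */
struct toy_manager {
    bool used;       /* cleared when the manager is torn down */
    int  resources;  /* resources still registered with the manager */
};

struct toy_device {
    struct toy_manager *sys_man; /* NULL once the manager is unregistered */
    int delayed;                 /* BOs waiting on the delayed-delete list */
};

/* Release one delayed BO; fails if its manager is already gone. */
static bool toy_release_bo(struct toy_device *dev)
{
    if (dev->sys_man == NULL || !dev->sys_man->used)
        return false; /* the real code would touch a stale manager here */
    dev->sys_man->resources--;
    dev->delayed--;
    return true;
}

/* Drain the delayed-delete list; true only if every BO was released. */
static bool toy_delayed_delete(struct toy_device *dev)
{
    while (dev->delayed > 0)
        if (!toy_release_bo(dev))
            return false;
    return true;
}

/* Mark the manager unused and unregister it from the device. */
static void toy_disable_manager(struct toy_device *dev)
{
    assert(dev->sys_man != NULL);
    dev->sys_man->used = false;
    dev->sys_man = NULL;
}

/* Ordering before the patch: disable the manager, then drain. */
static bool toy_fini_old(struct toy_device *dev)
{
    toy_disable_manager(dev);
    return toy_delayed_delete(dev);
}

/* Ordering after the patch: drain first, then disable the manager. */
static bool toy_fini_new(struct toy_device *dev)
{
    bool ok = toy_delayed_delete(dev);
    toy_disable_manager(dev);
    return ok;
}
```

With two BOs pending, `toy_fini_old()` reports failure and leaves both on the list, while `toy_fini_new()` drains the list before the manager goes away. The toy returns `false` where the real code would dereference a manager that no longer exists.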