[PATCH] drm/ttm: Drain workqueue before sys manager release
Karolina Stolarek
karolina.stolarek at intel.com
Mon Oct 16 11:31:51 UTC 2023
On 16.10.2023 13:03, Christian König wrote:
> Am 13.10.23 um 16:34 schrieb Karolina Stolarek:
>> In rare cases, a delayed destruction of a BO with a system resource
>> could stay in the workqueue until drain_workqueue() is called in
>> ttm_device_fini(). An attempt to free a resource from an already
>> released manager results in NULL pointer dereference. Move the step
>> of draining and destroying the workqueue so it happens before the
>> ttm_sys_manager cleanup.
>
> Good catch! But it's not only the drainage of the workqueue which is
> at the wrong place here.
>
> Removing the device from the device list should come before
> destroying the system domain as well.
I wonder when this would cause problems. But of course, it makes sense
to fix both with one patch.
> So I think you should probably rather move the chunk:
>
> man = ttm_manager_type(bdev, TTM_PL_SYSTEM);
> ttm_resource_manager_set_used(man, false);
> ttm_set_driver_manager(bdev, TTM_PL_SYSTEM, NULL);
>
> after the destroy_workqueue() instead of the other way around.>
> Apart from that looks good to me.
Thanks for taking a look and the suggestion, I'll send a v2 once I
finish testing the change. Also, I'll drop the Fixes tag, as it seems
the order issue has been there since the very beginning.
All the best,
Karolina
>
> Thanks, Christian.
>
>>
>> Fixes: 9bff18d13473 ("drm/ttm: use per BO cleanup workers")
>> Signed-off-by: Karolina Stolarek <karolina.stolarek at intel.com> ---
>> Some background: I stumbled upon this issue when testing
>> ttm_bo_pipeline_gutting() with BO with an active dma_resv fence.
>> In ~2% of the runs, the delayed destruction of the ghost wouldn't
>> happen until the drain_queue() step. man->func->free(man, *res)
>> got called via ttm_bo_cleanup_memtype_use(), the manager and its
>> functions were nowhere to be seen, resulting in a nulptr deref.
>>
>> drivers/gpu/drm/ttm/ttm_device.c | 6 +++--- 1 file changed, 3
>> insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_device.c
>> b/drivers/gpu/drm/ttm/ttm_device.c index
>> 7726a72befc5..753126581620 100644 ---
>> a/drivers/gpu/drm/ttm/ttm_device.c +++
>> b/drivers/gpu/drm/ttm/ttm_device.c @@ -232,6 +232,9 @@ void
>> ttm_device_fini(struct ttm_device *bdev) struct
>> ttm_resource_manager *man; unsigned i; + drain_workqueue(bdev->wq);
>> + destroy_workqueue(bdev->wq); + man = ttm_manager_type(bdev,
>> TTM_PL_SYSTEM); ttm_resource_manager_set_used(man, false);
>> ttm_set_driver_manager(bdev, TTM_PL_SYSTEM, NULL); @@ -240,9
>> +243,6 @@ void ttm_device_fini(struct ttm_device *bdev)
>> list_del(&bdev->device_list); mutex_unlock(&ttm_global_mutex); -
>> drain_workqueue(bdev->wq); - destroy_workqueue(bdev->wq); -
>> spin_lock(&bdev->lru_lock); for (i = 0; i < TTM_MAX_BO_PRIORITY;
>> ++i) if (list_empty(&man->lru[0]))
>
More information about the dri-devel
mailing list