[PATCH] drm/amdgpu: Reorder ttm_resource_manager_evict_all() before disabling ttm resource manager
Christian König
christian.koenig at amd.com
Mon Mar 28 09:00:40 UTC 2022
On 28.03.22 at 10:47, Leslie Shi wrote:
> ttm_resource_manager_evict_all() evicts objects out of the resource manager
> until the LRU is empty. ttm_resource_manager_set_used() triggers a WARN_ON
> if the LRU is non-empty. This patch swaps these two function calls to avoid
> the following call trace during amdgpu driver unload:
Well, absolutely NAK.
This is an intentional warning that _fini was called while there are
still allocations inside the domain.
The evict-all is just a last resort to avoid a hard crash at that point.
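To illustrate why the order matters, here is a minimal userspace sketch. It is not the TTM code: struct model_manager and all model_* functions are hypothetical stand-ins that only mimic the behaviour described in this thread (a check that warns on a non-empty LRU, and an evict-all that drains it).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a resource manager; not the kernel struct. */
struct model_manager {
	int lru_count;	/* objects still sitting on the LRU */
	bool used;	/* mirrors the "used" flag */
	int warnings;	/* how many times the leak check fired */
};

/* Models the check: warn if anything is still allocated in the domain. */
static void model_set_used(struct model_manager *man, bool used)
{
	assert(man);
	if (man->lru_count != 0) {
		man->warnings++;
		fprintf(stderr, "WARNING: %d object(s) still allocated\n",
			man->lru_count);
	}
	man->used = used;
}

/* Models evict-all: forcibly drains the LRU, returns 0 on success. */
static int model_evict_all(struct model_manager *man)
{
	assert(man);
	man->lru_count = 0;
	return 0;
}

/* Current order: check first, then evict as a last resort. */
static int model_fini_current(struct model_manager *man)
{
	model_set_used(man, false);
	model_evict_all(man);
	return man->warnings;
}

/* Proposed order: evict first, so the check can never fire. */
static int model_fini_proposed(struct model_manager *man)
{
	model_evict_all(man);
	model_set_used(man, false);
	return man->warnings;
}
```

In this model, calling model_fini_current() on a manager with leftover objects records one warning, while model_fini_proposed() on the same state records none. The leak is still there in both cases; the proposed order only stops it from being reported.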
Regards,
Christian.
>
> WARNING: CPU: 6 PID: 22873 at
> include/drm/ttm/ttm_resource.h:229 amdgpu_vram_mgr_fini+0x145/0x160 [amdgpu]
>
> Call Trace:
> amdgpu_ttm_fini+0x2c2/0x370 [amdgpu]
> amdgpu_bo_fini+0x25/0x90 [amdgpu]
> gmc_v10_0_sw_fini+0x2b/0x40 [amdgpu]
> amdgpu_device_fini_sw+0xd2/0x370 [amdgpu]
> amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
> drm_dev_release+0x28/0x40 [drm]
> devm_drm_dev_init_release+0x30/0x50 [drm]
> devm_action_release+0x15/0x20
> release_nodes+0x19a/0x1e0
> devres_release_all+0x3f/0x50
> device_release_driver_internal+0x11e/0x1e0
> driver_detach+0x4c/0x90
> bus_remove_driver+0x5c/0xd0
> driver_unregister+0x31/0x50
> pci_unregister_driver+0x40/0x90
> amdgpu_exit+0x15/0x12a [amdgpu]
>
> Signed-off-by: Leslie Shi <Yuliang.Shi at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 4 ++--
> drivers/gpu/drm/amd/amdgpu/amdgpu_preempt_mgr.c | 4 ++--
> drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 4 ++--
> 3 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> index c5263908caec..e472a0d639fa 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> @@ -308,12 +308,12 @@ void amdgpu_gtt_mgr_fini(struct amdgpu_device *adev)
>  	struct ttm_resource_manager *man = &mgr->manager;
>  	int ret;
>
> -	ttm_resource_manager_set_used(man, false);
> -
>  	ret = ttm_resource_manager_evict_all(&adev->mman.bdev, man);
>  	if (ret)
>  		return;
>
> +	ttm_resource_manager_set_used(man, false);
> +
>  	spin_lock(&mgr->lock);
>  	drm_mm_takedown(&mgr->mm);
>  	spin_unlock(&mgr->lock);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_preempt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_preempt_mgr.c
> index 786afe4f58f9..798be117c3bb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_preempt_mgr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_preempt_mgr.c
> @@ -182,12 +182,12 @@ void amdgpu_preempt_mgr_fini(struct amdgpu_device *adev)
>  	struct ttm_resource_manager *man = &mgr->manager;
>  	int ret;
>
> -	ttm_resource_manager_set_used(man, false);
> -
>  	ret = ttm_resource_manager_evict_all(&adev->mman.bdev, man);
>  	if (ret)
>  		return;
>
> +	ttm_resource_manager_set_used(man, false);
> +
>  	device_remove_file(adev->dev, &dev_attr_mem_info_preempt_used);
>
>  	ttm_resource_manager_cleanup(man);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> index 6c99ef700cc8..f94f2b271544 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> @@ -718,12 +718,12 @@ void amdgpu_vram_mgr_fini(struct amdgpu_device *adev)
>  	int ret;
>  	struct amdgpu_vram_reservation *rsv, *temp;
>
> -	ttm_resource_manager_set_used(man, false);
> -
>  	ret = ttm_resource_manager_evict_all(&adev->mman.bdev, man);
>  	if (ret)
>  		return;
>
> +	ttm_resource_manager_set_used(man, false);
> +
>  	spin_lock(&mgr->lock);
>  	list_for_each_entry_safe(rsv, temp, &mgr->reservations_pending, node)
>  		kfree(rsv);