[PATCH] drm/amdgpu: Reorder ttm_resource_manager_evict_all() before disabling ttm resource manager

Christian König christian.koenig at amd.com
Mon Mar 28 09:00:40 UTC 2022


On 28.03.22 at 10:47, Leslie Shi wrote:
> ttm_resource_manager_evict_all() evicts objects from the resource manager
> until the LRU is empty. ttm_resource_manager_set_used() triggers a WARN_ON
> if the LRU is not empty. This patch swaps these two function calls to avoid
> the following call trace during amdgpu driver unload:

Well, absolutely NAK.

This warning is intentional: it reports that _fini was called while there
are still allocations inside the domain.

The evict-all is only a last resort to avoid a hard crash at that point.
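
To illustrate why the order matters, here is a minimal user-space sketch.
It is not the kernel code: struct mock_manager, mock_set_used(),
mock_evict_all() and the two fini_* helpers are hypothetical stand-ins that
only mimic the behaviour described in this thread (set_used warns on a
non-empty LRU, evict_all empties it).

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct ttm_resource_manager (assumption). */
struct mock_manager {
	bool used;      /* whether the domain accepts allocations */
	int lru_count;  /* buffers still sitting on the LRU */
	int warnings;   /* how often the WARN_ON-style check fired */
};

/* Mimics the check in set_used: warn if the LRU is not empty. */
static void mock_set_used(struct mock_manager *man, bool used)
{
	if (man->lru_count != 0) {
		man->warnings++;
		fprintf(stderr, "WARNING: %d allocation(s) still in domain\n",
			man->lru_count);
	}
	man->used = used;
}

/* Mimics the last-resort eviction: empty the LRU so teardown can go on. */
static int mock_evict_all(struct mock_manager *man)
{
	man->lru_count = 0;
	return 0;
}

/* Existing order: leftover allocations are reported, then evicted. */
static int fini_current_order(struct mock_manager *man)
{
	mock_set_used(man, false);
	mock_evict_all(man);
	return man->warnings;
}

/* Order from the patch: eviction runs first, so the check never fires. */
static int fini_proposed_order(struct mock_manager *man)
{
	mock_evict_all(man);
	mock_set_used(man, false);
	return man->warnings;
}
```

With three buffers left on the LRU, fini_current_order() records one
warning while fini_proposed_order() records none, even though the same
buffers were leaked in both cases. The reordering only hides the report;
it does not fix whatever left allocations in the domain.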

Regards,
Christian.

>
> WARNING: CPU: 6 PID: 22873 at
> include/drm/ttm/ttm_resource.h:229 amdgpu_vram_mgr_fini+0x145/0x160 [amdgpu]
>
> Call Trace:
>      amdgpu_ttm_fini+0x2c2/0x370 [amdgpu]
>      amdgpu_bo_fini+0x25/0x90 [amdgpu]
>      gmc_v10_0_sw_fini+0x2b/0x40 [amdgpu]
>      amdgpu_device_fini_sw+0xd2/0x370 [amdgpu]
>      amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
>      drm_dev_release+0x28/0x40 [drm]
>      devm_drm_dev_init_release+0x30/0x50 [drm]
>      devm_action_release+0x15/0x20
>      release_nodes+0x19a/0x1e0
>      devres_release_all+0x3f/0x50
>      device_release_driver_internal+0x11e/0x1e0
>      driver_detach+0x4c/0x90
>      bus_remove_driver+0x5c/0xd0
>      driver_unregister+0x31/0x50
>      pci_unregister_driver+0x40/0x90
>      amdgpu_exit+0x15/0x12a [amdgpu]
>
> Signed-off-by: Leslie Shi <Yuliang.Shi at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c     | 4 ++--
>   drivers/gpu/drm/amd/amdgpu/amdgpu_preempt_mgr.c | 4 ++--
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c    | 4 ++--
>   3 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> index c5263908caec..e472a0d639fa 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> @@ -308,12 +308,12 @@ void amdgpu_gtt_mgr_fini(struct amdgpu_device *adev)
>   	struct ttm_resource_manager *man = &mgr->manager;
>   	int ret;
>   
> -	ttm_resource_manager_set_used(man, false);
> -
>   	ret = ttm_resource_manager_evict_all(&adev->mman.bdev, man);
>   	if (ret)
>   		return;
>   
> +	ttm_resource_manager_set_used(man, false);
> +
>   	spin_lock(&mgr->lock);
>   	drm_mm_takedown(&mgr->mm);
>   	spin_unlock(&mgr->lock);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_preempt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_preempt_mgr.c
> index 786afe4f58f9..798be117c3bb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_preempt_mgr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_preempt_mgr.c
> @@ -182,12 +182,12 @@ void amdgpu_preempt_mgr_fini(struct amdgpu_device *adev)
>   	struct ttm_resource_manager *man = &mgr->manager;
>   	int ret;
>   
> -	ttm_resource_manager_set_used(man, false);
> -
>   	ret = ttm_resource_manager_evict_all(&adev->mman.bdev, man);
>   	if (ret)
>   		return;
>   
> +	ttm_resource_manager_set_used(man, false);
> +
>   	device_remove_file(adev->dev, &dev_attr_mem_info_preempt_used);
>   
>   	ttm_resource_manager_cleanup(man);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> index 6c99ef700cc8..f94f2b271544 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> @@ -718,12 +718,12 @@ void amdgpu_vram_mgr_fini(struct amdgpu_device *adev)
>   	int ret;
>   	struct amdgpu_vram_reservation *rsv, *temp;
>   
> -	ttm_resource_manager_set_used(man, false);
> -
>   	ret = ttm_resource_manager_evict_all(&adev->mman.bdev, man);
>   	if (ret)
>   		return;
>   
> +	ttm_resource_manager_set_used(man, false);
> +
>   	spin_lock(&mgr->lock);
>   	list_for_each_entry_safe(rsv, temp, &mgr->reservations_pending, node)
>   		kfree(rsv);


