[PATCH] drm/amd/amdgpu: skip locking delayed work if not initialized.

Wang, YuBiao YuBiao.Wang at amd.com
Fri Aug 6 06:01:16 UTC 2021


[AMD Official Use Only]

Hi Christian,

This part was added by a commit whose message stated:
"When unloading driver after killing some applications, it will hit sdma flush tlb job timeout which is called by ttm_bo_delay_delete. So to avoid the job submit after fence driver fini, call ttm_bo_lock_delayed_workqueue before fence driver fini. And also put drm_sched_fini before waiting fence."

Fence driver fini runs before the amdgpu IP fini process, and the delayed workqueue has to be locked before fence driver fini, so I don't think I should move the call into ttm_fini.
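
To make the ordering concern concrete, here is a minimal userspace sketch. It is not kernel code. Every `mock_*` name is a hypothetical stand-in invented for illustration, and the structs only loosely mirror `adev->mman.initialized` and `adev->mman.bdev` from the patch. It assumes that a delayed delete running after fence driver fini is what produces the timeout described in the commit message.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel structures, not the real definitions. */
struct mock_bdev {
	bool wq_locked;			/* true once delayed delete work is blocked */
};

struct mock_mman {
	bool initialized;		/* mirrors adev->mman.initialized in the patch */
	struct mock_bdev bdev;
};

struct mock_adev {
	struct mock_mman mman;
	bool fence_driver_finished;
	int late_jobs;			/* delayed deletes that ran after fence fini */
};

/* Stand-in for ttm_bo_lock_delayed_workqueue(): block further delayed deletes. */
static void mock_lock_delayed_workqueue(struct mock_bdev *bdev)
{
	bdev->wq_locked = true;
}

/* Stand-in for the delayed delete worker, which would submit a TLB flush job. */
static void mock_delayed_delete(struct mock_adev *adev)
{
	/* Nothing can be queued if TTM was never set up or the queue is locked. */
	if (!adev->mman.initialized || adev->mman.bdev.wq_locked)
		return;

	/* A job submitted after fence driver fini is the failure case. */
	if (adev->fence_driver_finished)
		adev->late_jobs++;
}

/* Teardown in the order the patch keeps: guarded lock, then fence fini. */
static void mock_fini_hw(struct mock_adev *adev)
{
	if (adev->mman.initialized)
		mock_lock_delayed_workqueue(&adev->mman.bdev);

	adev->fence_driver_finished = true;

	/* Simulate the worker firing late, after fence driver fini. */
	mock_delayed_delete(adev);

	printf("late jobs after fence fini: %d\n", adev->late_jobs);
}
```

In this model, calling `mock_fini_hw()` on a device whose `mman.initialized` is false skips the lock entirely and still completes, which is the early-init failure case the patch guards against. On an initialized device the lock is taken first, so the late worker run submits nothing.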

Best Regards,
Yubiao Wang



-----Original Message-----
From: Christian König <ckoenig.leichtzumerken at gmail.com> 
Sent: Thursday, August 5, 2021 8:36 PM
To: Wang, YuBiao <YuBiao.Wang at amd.com>; amd-gfx at lists.freedesktop.org
Cc: Grodzovsky, Andrey <Andrey.Grodzovsky at amd.com>; Quan, Evan <Evan.Quan at amd.com>; Chen, Horace <Horace.Chen at amd.com>; Tuikov, Luben <Luben.Tuikov at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>; Deucher, Alexander <Alexander.Deucher at amd.com>; Xiao, Jack <Jack.Xiao at amd.com>; Zhang, Hawking <Hawking.Zhang at amd.com>; Liu, Monk <Monk.Liu at amd.com>; Xu, Feifei <Feifei.Xu at amd.com>; Wang, Kevin(Yang) <Kevin1.Wang at amd.com>
Subject: Re: [PATCH] drm/amd/amdgpu: skip locking delayed work if not initialized.

Am 05.08.21 um 04:37 schrieb YuBiao Wang:
> When init fails in the early init stage, amdgpu_object has not been
> initialized, and neither have the ttm delayed queue functions.
>
> Signed-off-by: YuBiao Wang <YuBiao.Wang at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 9e53ff851496..4c33985542ed 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3825,7 +3825,8 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
>   {
>   	dev_info(adev->dev, "amdgpu: finishing device.\n");
>   	flush_delayed_work(&adev->delayed_init_work);
> -	ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
> +	if (adev->mman.initialized)
> +		ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);

I'm really wondering why we have that here in the first place.

This just disables the delayed delete queue, which is part of the SW stack and not related to hardware in any way.

I think it would be much cleaner to move this into amdgpu_ttm_fini().

Christian.

>   	adev->shutdown = true;
>   
>   	/* make sure IB test finished before entering exclusive mode

