[PATCH v2] drm/amdgpu: Make sure ttm delayed work finished

Paul Menzel pmenzel at molgen.mpg.de
Wed Apr 13 07:34:10 UTC 2022


Dear xinhui,


Am 13.04.22 um 08:46 schrieb xinhui pan:
> ttm_device_delayed_workqueue would reschedule itself if there is pending
> BO to be destroyed. So just one flush + cancel_sync is not enough. We
> still see lru_list not empty warnging.

warning

(`scripts/checkpatch.pl --codespell` should have found this.)

> 
> Fix it by waiting all BO to be destroyed.

Please explain the waiting algorithm. Why at least one millisecond?

Do you have a reproducer for this issue?

> Acked-by: Guchun Chen <guchun.chen at amd.com>
> Signed-off-by: xinhui pan <xinhui.pan at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++++--
>   1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 6f47726f1765..56dcf8c3a3cd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3957,11 +3957,17 @@ static void amdgpu_device_unmap_mmio(struct amdgpu_device *adev)
>    */
>   void amdgpu_device_fini_hw(struct amdgpu_device *adev)
>   {
> +	int pending = 1;

Could this be bool?

> +
>   	dev_info(adev->dev, "amdgpu: finishing device.\n");
>   	flush_delayed_work(&adev->delayed_init_work);
> -	if (adev->mman.initialized) {
> +	while (adev->mman.initialized && pending) {
>   		flush_delayed_work(&adev->mman.bdev.wq);
> -		ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
> +		pending = ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
> +		if (pending) {
> +			ttm_bo_unlock_delayed_workqueue(&adev->mman.bdev, true);

Does this call affect `adev->mman.initialized`? If not a do-while loop 
might be better suited, so pending is only checked once.

if (adev->mman.initialized) {
	do {
		flush_delayed_work(&adev->mman.bdev.wq);
		ttm_bo_unlock_delayed_workqueue(&adev->mman.bdev, true);
		msleep(((HZ / 100) < 1) ? 1 : HZ / 100);
	} while (ttm_bo_lock_delayed_workqueue(&adev->mman.bdev));
}

The logic is slightly different, as 
`ttm_bo_unlock_delayed_workqueue(&adev->mman.bdev, true);` and the sleep 
are run at least once, so the suggestion might be unsuitable.

> +			msleep(((HZ / 100) < 1) ? 1 : HZ / 100);

Maybe add a comment for that formula? (No idea, if common knowledge, but 
why is a delay needed, and the loop cannot be run as fast as possible?)

> +		}
>   	}
>   	adev->shutdown = true;
>   


Kind regards,

Paul


More information about the amd-gfx mailing list