[PATCH v2] drm/amdgpu: Make sure ttm delayed work finished
Paul Menzel
pmenzel at molgen.mpg.de
Wed Apr 13 07:34:10 UTC 2022
Dear xinhui,
Am 13.04.22 um 08:46 schrieb xinhui pan:
> ttm_device_delayed_workqueue would reschedule itself if there is pending
> BO to be destroyed. So just one flush + cancel_sync is not enough. We
> still see lru_list not empty warnging.
warning
(`scripts/checkpatch.pl --codespell` should have found this.)
>
> Fix it by waiting all BO to be destroyed.
Please explain the waiting algorithm. Why at least one millisecond?
Do you have a reproducer for this issue?
> Acked-by: Guchun Chen <guchun.chen at amd.com>
> Signed-off-by: xinhui pan <xinhui.pan at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 6f47726f1765..56dcf8c3a3cd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3957,11 +3957,17 @@ static void amdgpu_device_unmap_mmio(struct amdgpu_device *adev)
> */
> void amdgpu_device_fini_hw(struct amdgpu_device *adev)
> {
> + int pending = 1;
Could this be bool?
> +
> dev_info(adev->dev, "amdgpu: finishing device.\n");
> flush_delayed_work(&adev->delayed_init_work);
> - if (adev->mman.initialized) {
> + while (adev->mman.initialized && pending) {
> flush_delayed_work(&adev->mman.bdev.wq);
> - ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
> + pending = ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
> + if (pending) {
> + ttm_bo_unlock_delayed_workqueue(&adev->mman.bdev, true);
Does this call affect `adev->mman.initialized`? If not a do-while loop
might be better suited, so pending is only checked once.
if (adev->mman.initialized) {
do {
flush_delayed_work(&adev->mman.bdev.wq);
ttm_bo_unlock_delayed_workqueue(&adev->mman.bdev, true);
msleep(((HZ / 100) < 1) ? 1 : HZ / 100);
} while (ttm_bo_lock_delayed_workqueue(&adev->mman.bdev));
}
The logic is slightly different, as
`ttm_bo_unlock_delayed_workqueue(&adev->mman.bdev, true);` and the sleep
are run at least once, so the suggestion might be unsuitable.
> + msleep(((HZ / 100) < 1) ? 1 : HZ / 100);
Maybe add a comment for that formula? (No idea, if common knowledge, but
why is a delay needed, and the loop cannot be run as fast as possible?)
> + }
> }
> adev->shutdown = true;
>
Kind regards,
Paul
More information about the amd-gfx
mailing list