Re: Re: [PATCH] drm/amdgpu: Make sure ttm delayed work finished

Christian König christian.koenig at amd.com
Wed Apr 13 08:14:56 UTC 2022


That warning is a bit more than a little annoying.

Before we stop the delayed delete worker we *must* make absolutely sure 
that nothing is still running on the hardware. Otherwise we could 
easily run into use-after-free issues.

There should be an amdgpu_fence_wait_empty() call somewhere before the 
flush_delayed_work() call. If that isn't there, we have a problem 
elsewhere.
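
To illustrate the ordering, here is a minimal userspace sketch in C. It is
not kernel code: model_bo, model_dev, model_fence_wait_empty(),
model_delayed_delete() and model_fini() are hypothetical stand-ins, and the
sketch assumes that waiting for the fences leaves every queued BO idle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_bo {
	bool fence_signaled;	/* has the hardware finished with this BO? */
	bool destroyed;		/* has the delayed delete pass freed it? */
};

struct model_dev {
	struct model_bo *ddestroy;	/* BOs queued for delayed destruction */
	size_t count;
};

/* Stand-in for amdgpu_fence_wait_empty(): afterwards no BO is busy. */
static void model_fence_wait_empty(struct model_dev *dev)
{
	for (size_t i = 0; i < dev->count; i++)
		dev->ddestroy[i].fence_signaled = true;
}

/*
 * One pass of the delayed delete worker: destroys idle BOs and returns
 * how many are still busy. A non-zero result is the case in which the
 * real worker would reschedule itself.
 */
static size_t model_delayed_delete(struct model_dev *dev)
{
	size_t busy = 0;

	for (size_t i = 0; i < dev->count; i++) {
		struct model_bo *bo = &dev->ddestroy[i];

		if (bo->destroyed)
			continue;
		if (bo->fence_signaled)
			bo->destroyed = true;
		else
			busy++;
	}
	return busy;
}

/* Teardown order: idle the hardware first, then flush exactly once. */
static size_t model_fini(struct model_dev *dev)
{
	size_t busy;

	model_fence_wait_empty(dev);
	busy = model_delayed_delete(dev);
	/* Mirrors the "lru_list not empty" check: nothing may be left. */
	assert(busy == 0);
	return busy;
}
```

With the fences waited on first, a single flush is enough in this model and
no retry loop is needed. If the flush still leaves busy BOs behind, the
fence wait is what is missing.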

Thanks for investigating this,
Christian.

On 13.04.22 at 09:47, Pan, Xinhui wrote:
> [AMD Official Use Only]
>
> The log from the tester shows that the DRM framebuffer BO is still busy.
>
> I suspect there is simply not enough time for its fence to be signaled,
> since adding a delay also works in my test.
> But the warning is a little annoying.
>
> ________________________________________
> From: Koenig, Christian <Christian.Koenig at amd.com>
> Sent: April 13, 2022 15:30
> To: Pan, Xinhui; amd-gfx at lists.freedesktop.org
> Cc: Deucher, Alexander
> Subject: Re: [PATCH] drm/amdgpu: Make sure ttm delayed work finished
>
> We don't need that.
>
> TTM only reschedules when the BOs are still busy.
>
> And if the BOs are still busy when you unload the driver, we have much bigger problems than this TTM worker :)
>
> Regards,
> Christian
>
> ________________________________
> From: Pan, Xinhui <Xinhui.Pan at amd.com>
> Sent: Wednesday, April 13, 2022 05:08
> To: amd-gfx at lists.freedesktop.org <amd-gfx at lists.freedesktop.org>
> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>; Pan, Xinhui <Xinhui.Pan at amd.com>
> Subject: [PATCH] drm/amdgpu: Make sure ttm delayed work finished
>
> ttm_device_delayed_workqueue reschedules itself if there are pending
> BOs to be destroyed, so a single flush + cancel_sync is not enough. We
> still see the "lru_list not empty" warning.
>
> Fix it by waiting for all BOs to be destroyed.
>
> Signed-off-by: xinhui pan <xinhui.pan at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++++--
>   1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 6f47726f1765..e249923eb9a7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3957,11 +3957,17 @@ static void amdgpu_device_unmap_mmio(struct amdgpu_device *adev)
>    */
>   void amdgpu_device_fini_hw(struct amdgpu_device *adev)
>   {
> +       int pending = 1;
> +
>           dev_info(adev->dev, "amdgpu: finishing device.\n");
>           flush_delayed_work(&adev->delayed_init_work);
> -       if (adev->mman.initialized) {
> +       while (adev->mman.initialized && pending) {
>                   flush_delayed_work(&adev->mman.bdev.wq);
> -               ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
> +               pending = ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
> +               if (pending) {
> +                       ttm_bo_unlock_delayed_workqueue(&adev->mman.bdev, true);
> +                       msleep(((HZ / 100) < 1) ? 1 : HZ / 100);
> +               }
>           }
>           adev->shutdown = true;
>
> --
> 2.25.1
>


