[PATCH] drm/amdgpu: Fix some unload driver issues

Deng, Emily Emily.Deng at amd.com
Fri Mar 5 08:52:51 UTC 2021


[AMD Official Use Only - Internal Distribution Only]

>-----Original Message-----
>From: Christian König <ckoenig.leichtzumerken at gmail.com>
>Sent: Friday, March 5, 2021 4:52 PM
>To: Deng, Emily <Emily.Deng at amd.com>; amd-gfx at lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: Fix some unload driver issues
>
>
>
>Am 05.03.21 um 09:43 schrieb Deng, Emily:
>> [AMD Official Use Only - Internal Distribution Only]
>>
>>> -----Original Message-----
>>> From: Christian König <ckoenig.leichtzumerken at gmail.com>
>>> Sent: Friday, March 5, 2021 3:55 PM
>>> To: Deng, Emily <Emily.Deng at amd.com>; amd-gfx at lists.freedesktop.org
>>> Subject: Re: [PATCH] drm/amdgpu: Fix some unload driver issues
>>>
>>> Am 05.03.21 um 02:20 schrieb Emily Deng:
>>>> When unloading driver after killing some applications, it will hit
>>>> sdma flush tlb job timeout which is called by ttm_bo_delay_delete.
>>>> So to avoid the job submit after fence driver fini, call
>>>> ttm_bo_lock_delayed_workqueue before fence driver fini. And also put
>>> drm_sched_fini before waiting fence.
>>>
>>> Good catch, Reviewed-by: Christian König <christian.koenig at amd.com>
>>> for this part.
>>>
>>>> Set adev->gart.ptr to null to fix null pointer when calling
>>>> amdgpu_gart_unbind in amdgpu_bo_fini which is after gart_fini.
>>> For that one I'm wondering if we shouldn't rather rework or double
>>> check the tear down order.
>>>
>>> On the other hand the hardware should be idle by now (this comes
>>> after the fence_driver_fini, doesn't it?) and it looks cleaner on it's own.
>> Yes, you are right, without consider memory leak, with above patch, the bo
>are all cleaned, then won't have issue. But if have memory leak, maybe it will
>have issue in ttm_bo_force_list_clean-> ttm_mem_evict_first?
>
>Yeah, that is a good argument and part of what I mean with it looks cleaner on
>its own.
>
>Maybe write that into the commit message like this. With that done the full
>patch has my rb.
>
>Regards,
>Christian.
Ok, thanks.
>
>>
>>> Regards,
>>> Christian.
>>>
>>>> Signed-off-by: Emily Deng <Emily.Deng at amd.com>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 5 +++--
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c   | 1 +
>>>>    3 files changed, 5 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> index a11760ec3924..de0597d34588 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> @@ -3594,6 +3594,7 @@ void amdgpu_device_fini(struct amdgpu_device
>>> *adev)
>>>>    {
>>>>    dev_info(adev->dev, "amdgpu: finishing device.\n");
>>>>    flush_delayed_work(&adev->delayed_init_work);
>>>> +ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
>>>>    adev->shutdown = true;
>>>>
>>>>    kfree(adev->pci_state);
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>> index 143a14f4866f..6d16f58ac91e 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>> @@ -531,6 +531,8 @@ void amdgpu_fence_driver_fini(struct
>>> amdgpu_device
>>>> *adev)
>>>>
>>>>    if (!ring || !ring->fence_drv.initialized)
>>>>    continue;
>>>> +if (!ring->no_scheduler)
>>>> +drm_sched_fini(&ring->sched);
>>>>    r = amdgpu_fence_wait_empty(ring);
>>>>    if (r) {
>>>>    /* no need to trigger GPU reset as we are unloading */
>>> @@ -539,8
>>>> +541,7 @@ void amdgpu_fence_driver_fini(struct amdgpu_device *adev)
>>>>    if (ring->fence_drv.irq_src)
>>>>    amdgpu_irq_put(adev, ring->fence_drv.irq_src,
>>>>           ring->fence_drv.irq_type); -if (!ring->no_scheduler)
>>>> -drm_sched_fini(&ring->sched);
>>>> +
>>>>    del_timer_sync(&ring->fence_drv.fallback_timer);
>>>>    for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)
>>>>    dma_fence_put(ring->fence_drv.fences[j]);
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>>>> index 23823a57374f..f1ede4b43d07 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>>>> @@ -202,6 +202,7 @@ void amdgpu_gart_table_vram_free(struct
>>> amdgpu_device *adev)
>>>>    return;
>>>>    }
>>>>    amdgpu_bo_unref(&adev->gart.bo);
>>>> +adev->gart.ptr = NULL;
>>>>    }
>>>>
>>>>    /*



More information about the amd-gfx mailing list