[PATCH 6/6] drm/amdgpu: Fix driver unload issue

Christian König ckoenig.leichtzumerken at gmail.com
Tue Mar 30 08:37:56 UTC 2021


Hi Emily,

as I said, add a WARN_ON() and look at the backtrace.

It could be that the backtrace then just shows the general cleanup 
functions, but it is at least a start.
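
To illustrate, here is a minimal userspace sketch of where such a check could
sit. The WARN_ON() below is only a stand-in for the kernel macro (the real one
also prints a backtrace), and mock_adev and mock_copy_mem_to_mem() are
hypothetical, reduced stand-ins for struct amdgpu_device and the copy path,
not the actual driver code:

```c
#include <stdbool.h>
#include <stdio.h>

/* Counts how often the warning fired, so the sketch can be checked. */
static int warn_count;

/* Userspace stand-in for the kernel's WARN_ON(): reports the location and
 * evaluates to the condition. The real macro also dumps a backtrace. */
static bool warn_on_impl(bool cond, const char *expr, const char *file, int line)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	}
	return cond;
}
#define WARN_ON(cond) warn_on_impl(!!(cond), #cond, __FILE__, __LINE__)

/* Hypothetical, reduced stand-in for struct amdgpu_device. */
struct mock_adev {
	bool shutdown;
};

/* Hypothetical stand-in for the copy path. */
static int mock_copy_mem_to_mem(struct mock_adev *adev)
{
	/* Do not return early: only report who requests a move during
	 * shutdown, then continue exactly as before. */
	WARN_ON(adev->shutdown);

	/* ... the actual copy would follow here ... */
	return 0;
}
```

The point of the sketch is that the check only reports the caller and does not
change behavior, unlike the early return in the patch.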

On the other hand, if you only see this sometimes, then we have some kind
of race condition and need to dig deeper.

Christian.

On 30.03.21 at 10:19, Deng, Emily wrote:
> [AMD Official Use Only - Internal Distribution Only]
>
> Hi Christian,
>       Yes, I agree with both of you. But the issue occurs randomly during
> driver unload and at a fairly low rate, so it is hard to find where the
> memory leak is. Could you give some suggestions on how to debug it?
>
>
> Best wishes
> Emily Deng
>
>
>
>> -----Original Message-----
>> From: Christian König <ckoenig.leichtzumerken at gmail.com>
>> Sent: Tuesday, March 30, 2021 3:11 PM
>> To: Deng, Emily <Emily.Deng at amd.com>; Chen, Jiansong (Simon)
>> <Jiansong.Chen at amd.com>; amd-gfx at lists.freedesktop.org
>> Subject: Re: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>>
>> Good morning,
>>
>> yes, Jiansong is right: that patch is really not a good idea.
>>
>> Moving buffers can indeed happen during shutdown while some memory is
>> still referenced.
>>
>> Just ignoring the move is not the right approach. You need to find out why
>> the memory is moved in the first place.
>>
>> You could add something like WARN_ON(adev->shutdown);
>>
>> Regards,
>> Christian.
>>
>> On 30.03.21 at 09:05, Deng, Emily wrote:
>>> Hi Jiansong,
>>>        It does happen. Maybe there is a race condition?
>>>
>>>
>>> Best wishes
>>> Emily Deng
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Chen, Jiansong (Simon) <Jiansong.Chen at amd.com>
>>>> Sent: Tuesday, March 30, 2021 2:49 PM
>>>> To: Deng, Emily <Emily.Deng at amd.com>; amd-gfx at lists.freedesktop.org
>>>> Cc: Deng, Emily <Emily.Deng at amd.com>
>>>> Subject: RE: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>>>>
>>>> I still wonder how this issue can occur. As far as I understand the
>>>> driver model, the reference count of the device's kobject will not
>>>> reach zero while device memory is still being accessed, so shutdown
>>>> should not happen.
>>>>
>>>> Regards,
>>>> Jiansong
>>>> -----Original Message-----
>>>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of
>>>> Emily Deng
>>>> Sent: Tuesday, March 30, 2021 12:42 PM
>>>> To: amd-gfx at lists.freedesktop.org
>>>> Cc: Deng, Emily <Emily.Deng at amd.com>
>>>> Subject: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>>>>
>>>> During driver unload there is no need to copy memory. Doing so can
>>>> trigger call traces: for example, once sa_manager has been freed, the
>>>> copy triggers a warning call trace in amdgpu_sa_bo_new.
>>>>
>>>> Signed-off-by: Emily Deng <Emily.Deng at amd.com>
>>>> ---
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 +++
>>>> 1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>> index e00263bcc88b..f0546a489e0d 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>> @@ -317,6 +317,9 @@ int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>>>>  	struct dma_fence *fence = NULL;
>>>>  	int r = 0;
>>>>
>>>> +	if (adev->shutdown)
>>>> +		return 0;
>>>> +
>>>>  	if (!adev->mman.buffer_funcs_enabled) {
>>>>  		DRM_ERROR("Trying to move memory with ring turned off.\n");
>>>>  		return -EINVAL;
>>>> --
>>>> 2.25.1
>>>>
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx at lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


