[PATCH 6/6] drm/amdgpu: Fix driver unload issue

Deng, Emily Emily.Deng at amd.com
Tue Mar 30 09:11:50 UTC 2021


[AMD Official Use Only - Internal Distribution Only]

Hi Christian,
     OK, I will investigate the memory leak further. But even if I fix this memory leak now, that does not guarantee there will be no memory leaks in the future. A memory leak should not crash the kernel and leave it unusable.

Best wishes
Emily Deng



>-----Original Message-----
>From: Christian König <ckoenig.leichtzumerken at gmail.com>
>Sent: Tuesday, March 30, 2021 4:38 PM
>To: Deng, Emily <Emily.Deng at amd.com>; Chen, Jiansong (Simon)
><Jiansong.Chen at amd.com>; amd-gfx at lists.freedesktop.org
>Subject: Re: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>
>Hi Emily,
>
>as I said, add a WARN_ON() and look at the backtrace.
>
>It could be that the backtrace then just shows the general cleanup functions,
>but it is at least a start.
>
>On the other hand if you only see this sometimes then we have some kind of
>race condition and need to dig deeper.
>
>Christian.
>
>On 30.03.21 at 10:19, Deng, Emily wrote:
>> Hi Christian,
>>       Yes, I agree with both of you. But the issue occurs randomly, only
>> during driver unload, and at a fairly low rate, so it is hard to find where
>> the memory is leaked. Could you give some suggestions on how to debug it?
>>
>>
>> Best wishes
>> Emily Deng
>>
>>
>>
>>> -----Original Message-----
>>> From: Christian König <ckoenig.leichtzumerken at gmail.com>
>>> Sent: Tuesday, March 30, 2021 3:11 PM
>>> To: Deng, Emily <Emily.Deng at amd.com>; Chen, Jiansong (Simon)
>>> <Jiansong.Chen at amd.com>; amd-gfx at lists.freedesktop.org
>>> Subject: Re: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>>>
>>> Good morning,
>>>
>>> yes, Jiansong is right; that patch is really not a good idea.
>>>
>>> Moving buffers can indeed happen during shutdown while some memory is
>>> still referenced.
>>>
>>> Just ignoring the move is not the right approach, you need to find
>>> out why the memory is moved in the first place.
>>>
>>> You could add something like WARN_ON(adev->shutdown);
>>>
>>> Regards,
>>> Christian.
>>>
>>> On 30.03.21 at 09:05, Deng, Emily wrote:
>>>> Hi Jiansong,
>>>>        It does happen; maybe there is a race condition?
>>>>
>>>>
>>>> Best wishes
>>>> Emily Deng
>>>>
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Chen, Jiansong (Simon) <Jiansong.Chen at amd.com>
>>>>> Sent: Tuesday, March 30, 2021 2:49 PM
>>>>> To: Deng, Emily <Emily.Deng at amd.com>; amd-gfx at lists.freedesktop.org
>>>>> Cc: Deng, Emily <Emily.Deng at amd.com>
>>>>> Subject: RE: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>>>>>
>>>>> I still wonder how the issue arises. As far as I understand the
>>>>> driver model, the reference count of the device's kobject will not
>>>>> reach zero while there is still some device memory access, so
>>>>> shutdown should not happen.
>>>>>
>>>>> Regards,
>>>>> Jiansong
>>>>> -----Original Message-----
>>>>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of
>>>>> Emily Deng
>>>>> Sent: Tuesday, March 30, 2021 12:42 PM
>>>>> To: amd-gfx at lists.freedesktop.org
>>>>> Cc: Deng, Emily <Emily.Deng at amd.com>
>>>>> Subject: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>>>>>
>>>>> During driver unloading there is no need to copy memory; doing so
>>>>> can trigger call traces. For example, once sa_manager has been
>>>>> freed, the copy triggers a warning call trace in amdgpu_sa_bo_new.
>>>>>
>>>>> Signed-off-by: Emily Deng <Emily.Deng at amd.com>
>>>>> ---
>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 +++
>>>>> 1 file changed, 3 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> index e00263bcc88b..f0546a489e0d 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> @@ -317,6 +317,9 @@ int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>>>>>  	struct dma_fence *fence = NULL;
>>>>>  	int r = 0;
>>>>>
>>>>> +	if (adev->shutdown)
>>>>> +		return 0;
>>>>> +
>>>>>  	if (!adev->mman.buffer_funcs_enabled) {
>>>>>  		DRM_ERROR("Trying to move memory with ring turned off.\n");
>>>>>  		return -EINVAL;
>>>>> --
>>>>> 2.25.1
>>>>>
>>>>> _______________________________________________
>>>>> amd-gfx mailing list
>>>>> amd-gfx at lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

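As a rough illustration of the WARN_ON(adev->shutdown) suggestion made in the thread above, the following userspace C sketch reports a copy that is attempted during shutdown instead of silently returning 0. In the kernel, WARN_ON() prints a warning with a stack trace when its condition is true. Everything named mock_* or MOCK_* below is a hypothetical stand-in written for this illustration; it is not amdgpu code and not the actual amdgpu_ttm_copy_mem_to_mem() implementation.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Counts how many warnings fired, so the behaviour can be checked. */
static int mock_warn_count;

/*
 * Hypothetical userspace stand-in for the kernel's WARN_ON(): when the
 * condition is true it reports the expression and call site, then
 * returns the truth value of the condition.
 */
static bool mock_warn(bool cond, const char *expr, const char *file, int line)
{
	if (cond) {
		mock_warn_count++;
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	}
	return cond;
}

#define MOCK_WARN_ON(cond) mock_warn(!!(cond), #cond, __FILE__, __LINE__)

/* Hypothetical, reduced model of the device state used in the patch. */
struct mock_device {
	bool shutdown;
	bool buffer_funcs_enabled;
};

/*
 * Mocked copy path. The warning is diagnostic only: the copy is not
 * skipped during shutdown, the caller is merely made visible.
 * Returns 0 on success, -EINVAL if the (mocked) ring is turned off.
 */
static int mock_copy_mem(struct mock_device *adev)
{
	MOCK_WARN_ON(adev->shutdown);

	if (!adev->buffer_funcs_enabled) {
		fprintf(stderr, "Trying to move memory with ring turned off.\n");
		return -EINVAL;
	}

	/* The real buffer move would happen here. */
	return 0;
}
```

With shutdown set, mock_copy_mem() still performs the mocked copy and returns 0, but it leaves a warning that identifies the call site. In the kernel the equivalent warning would carry the backtrace that Christian asks for as a starting point.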