FW: [PATCH 1/3] drm/amdgpu: fix a potential memory leak in amdgpu_device_fini_sw()

Yu, Lang Lang.Yu at amd.com
Thu Oct 21 23:33:26 UTC 2021


[AMD Official Use Only]



>-----Original Message-----
>From: Grodzovsky, Andrey <Andrey.Grodzovsky at amd.com>
>Sent: Thursday, October 21, 2021 11:18 PM
>To: Yu, Lang <Lang.Yu at amd.com>; amd-gfx at lists.freedesktop.org
>Subject: Re: FW: [PATCH 1/3] drm/amdgpu: fix a potential memory leak in
>amdgpu_device_fini_sw()
>
>On 2021-10-21 3:19 a.m., Yu, Lang wrote:
>
>> [AMD Official Use Only]
>>
>>
>>
>>> -----Original Message-----
>>> From: Yu, Lang <Lang.Yu at amd.com>
>>> Sent: Thursday, October 21, 2021 3:18 PM
>>> To: Grodzovsky, Andrey <Andrey.Grodzovsky at amd.com>
>>> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Koenig, Christian
>>> <Christian.Koenig at amd.com>; Huang, Ray <Ray.Huang at amd.com>; Yu, Lang
>>> <Lang.Yu at amd.com>
>>> Subject: [PATCH 1/3] drm/amdgpu: fix a potential memory leak in
>>> amdgpu_device_fini_sw()
>>>
>>> amdgpu_fence_driver_sw_fini() should be executed before
>>> amdgpu_device_ip_fini(), otherwise fence driver resource won't be
>>> properly freed as adev->rings have been tore down.
>
>
>Cam you clarify more where exactly the memleak happens ?
>
>Andrey

See amdgpu_fence_driver_sw_fini(), ring->fence_drv.fences will only be freed
when adev->rings[i] is not NULL.

void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev)
{
	unsigned int i, j;

	for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
		struct amdgpu_ring *ring = adev->rings[i];

		if (!ring || !ring->fence_drv.initialized)
			continue;

		if (!ring->no_scheduler)
			drm_sched_fini(&ring->sched);

		for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)
			dma_fence_put(ring->fence_drv.fences[j]);
		kfree(ring->fence_drv.fences);
		ring->fence_drv.fences = NULL;
		ring->fence_drv.initialized = false;
	}
}

If amdgpu_device_ip_fini() is executed before amdgpu_fence_driver_sw_fini(), 
amdgpu_device_ip_fini() will call gfx_vX_0_sw_fini() 
then call amdgpu_ring_fini() and set adev->rings[i] to NULL.
Nothing will be freed in amdgpu_fence_driver_sw_fini().
ring->fence_drv.fences  memory leak happened!

void amdgpu_ring_fini(struct amdgpu_ring *ring)
{
	......
	ring->adev->rings[ring->idx] = NULL;
}

Regards,
Lang

>
>
>>>
>>> Fixes: 72c8c97b1522 ("drm/amdgpu: Split amdgpu_device_fini into early
>>> and late")
>>>
>>> Signed-off-by: Lang Yu <lang.yu at amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> index 41ce86244144..5654c4790773 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -3843,8 +3843,8 @@ void amdgpu_device_fini_hw(struct amdgpu_device
>>> *adev)
>>>
>>> void amdgpu_device_fini_sw(struct amdgpu_device *adev)  {
>>> -	amdgpu_device_ip_fini(adev);
>>> 	amdgpu_fence_driver_sw_fini(adev);
>>> +	amdgpu_device_ip_fini(adev);
>>> 	release_firmware(adev->firmware.gpu_info_fw);
>>> 	adev->firmware.gpu_info_fw = NULL;
>>> 	adev->accel_working = false;
>>> --
>>> 2.25.1


More information about the amd-gfx mailing list