[PATCH] dmr/amdgpu: Fix wrongly unref of BO

Christian König deathsimple at vodafone.de
Wed Apr 19 06:35:24 UTC 2017


Hi AlexBin,

the answer is ttm_bo_kunmap isn't called at all and yes in the case of 
an iomap we leak the address space reserved.

But that is completely harmless on a 64bit system compared to leaking 
the memory backing the address space.

Using amdgpu_bo_free_kernel() instead of openly coding it here is 
probably a good idea.

Additional to that it's probably a good idea to set the no_intr flag 
when reserving kernel BOs. It is impossible to receive a signal during 
module load/unload, but it's probably better to document that in the 
code as well.

Regards,
Christian.

Am 18.04.2017 um 20:54 schrieb Xie, AlexBin:
> Hi Christian,
>
> Have you found how/where/when? When you said "mapping will just be 
> released a bit later on", you must know the answer.
>
> It is difficult to prove something does not exist. Anyway, I will give 
> it a try to prove such "later on" does not exist.
>
> Function ttm_bo_kunmap is the only function to unmap. To prove this, 
> search ttm_bo_map_iomap, only ttm_bo_kunmap use this enum to correctly 
> kunmap.
>
> Function ttm_bo_kunmap is not called by ttm itself. This is a hint 
> that all TTM delay delete mechanism or unref mechanism will NOT kunmap 
> BO later on.
>
> Function ttm_bo_kunmap is called by AMDGPU function amdgpu_bo_kunmap 
> and amdgpu_gem_prime_vunmap.
>
> Search AMDGPU for amdgpu_bo_kunmap. All matches do not kunmap for 
> scratch VRAM BO. amdgpu_bo_free_kernel is a suspect but the answer is 
> still NO.
>
> So all possibilities are searched. Did I miss anything?
>
> Thanks,
> Alex Bin Xie
>
> ------------------------------------------------------------------------
> *From:* Xie, AlexBin
> *Sent:* Tuesday, April 18, 2017 2:04:33 PM
> *To:* Christian König; Zhou, David(ChunMing); 
> amd-gfx at lists.freedesktop.org
> *Subject:* Re: [PATCH] dmr/amdgpu: Fix wrongly unref of BO
>
> Hi Christian,
>
>
> Would you point out where/when will kunmap happen for this BO when 
> release? It must be somewhere in some function calls.
>
>
> I checked before I asked for review. But I did not see such obvious 
> kunmap function call.
>
>
> If so, there should be a comment in function amdgpu_vram_scratch_fini 
> to avoid future confusion.
>
>
> Thanks,
> Alex Bin Xie
> ------------------------------------------------------------------------
> *From:* Christian König <deathsimple at vodafone.de>
> *Sent:* Tuesday, April 18, 2017 1:46 PM
> *To:* Xie, AlexBin; Zhou, David(ChunMing); amd-gfx at lists.freedesktop.org
> *Subject:* Re: [PATCH] dmr/amdgpu: Fix wrongly unref of BO
> Hi AlexBin,
>
> No, David is right. This is a very common coding pattern in the kernel 
> module.
>
> Freeing up a BO while there still exists a kernel mapping is perfectly 
> ok, the mapping will just be released a bit later on.
>
> So this code is actually perfectly ok and just an optimization, but 
> your patch breaks it and creates a memory leak.
>
> Regards,
> Christian.
>
> Am 18.04.2017 um 17:17 schrieb Xie, AlexBin:
>>
>> Hi David,
>>
>>
>> When amdgpu_bo_reserve return errors, we cannot release the BO. This 
>> is not a memory leak. General speaking, memory leak is unnoticed and 
>> unintentional.
>>
>>
>> The caller of function amdgpu_vram_scratch_fini ignores the return 
>> error value...
>>
>>
>> The "memory leak" is not caused by my patch. It is caused because 
>> reserving BO fails.
>>
>>
>> This patch only aim to make function amdgpu_vram_scratch_fini behave 
>> correctly.
>>
>>
>> To follow up, we can add a warning message when amdgpu_bo_reserve 
>> error happens in a different patch.
>>
>>
>> If function call amdgpu_bo_reserve is changed to uninterruptible, 
>> this changes driver behaviour. Without a substantial issue, I would 
>> be cautious for such a change.
>>
>>
>> Thanks,
>>
>> Alex Bin Xie
>>
>>
>> ------------------------------------------------------------------------
>> *From:* Zhou, David(ChunMing)
>> *Sent:* Monday, April 17, 2017 10:38 PM
>> *To:* Xie, AlexBin; amd-gfx at lists.freedesktop.org
>> *Subject:* Re: [PATCH] dmr/amdgpu: Fix wrongly unref of BO
>>
>>
>> On 2017年04月17日 22:54, Xie, AlexBin wrote:
>>>
>>> Hi David,
>>>
>>>
>>> Thanks for the comments. However, please have look at 
>>> amdgpu_bo_reserve definition.
>>>
>>> static inline int amdgpu_bo_reserve(struct amdgpu_bo *bo, bool no_intr)
>>>
>> Ah, this is a wired wrapper for ttm_bo_reserve.
>>
>>>
>>> When we call this function like the following:
>>>
>>>          r = amdgpu_bo_reserve(adev->vram_scratch.robj, false);
>>> The false means interruptible.
>>>
>>>
>>> On the other hand,  when amdgpu_bo_reserve function return error, 
>>> why do we unref BO without kunmap and unpin the BO? This is wrong 
>>> implementation when amdgpu_bo_reserve return any error.
>>>
>> Yeah, I see your mean, it's in driver un-loading, How about changing 
>> to no interruptible? Your patch will make a memleak if bo_reserve 
>> fails, but it seems not matter. I have no strong preference.
>>
>> Regards,
>> David Zhou
>>>
>>>
>>> Thanks,
>>> Alex Bin Xie
>>>
>>> ------------------------------------------------------------------------
>>> *From:* Zhou, David(ChunMing)
>>> *Sent:* Friday, April 14, 2017 12:00 AM
>>> *To:* Xie, AlexBin; amd-gfx at lists.freedesktop.org
>>> *Subject:* Re: [PATCH] dmr/amdgpu: Fix wrongly unref of BO
>>>
>>>
>>> On 2017年04月14日 05:34, Alex Xie wrote:
>>> > According to comment of amdgpu_bo_reserve, amdgpu_bo_reserve
>>> > can return with -ERESTARTSYS. When this function was interrupted
>>> > by a signal, BO should not be unref. Otherwise the BO might be
>>> > released while is kmapped and pinned, or BO MIGHT be deref
>>> > multiple times, etc.
>>>          r = amdgpu_bo_reserve(adev->vram_scratch.robj, false);
>>> we have specified interruptible to false, so -ERESTARTSYS isn't 
>>> possible
>>> here.
>>>
>>> Thanks,
>>> David Zhou
>>> >
>>> > Change-Id: If76071a768950a0d3ad9d5da7fcae04881807621
>>> > Signed-off-by: Alex Xie <AlexBin.Xie at amd.com>
>>> > ---
>>> > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
>>> >   1 file changed, 1 insertion(+), 1 deletion(-)
>>> >
>>> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> > index 53996e3..1dcc2d1 100644
>>> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> > @@ -355,8 +355,8 @@ static void amdgpu_vram_scratch_fini(struct 
>>> amdgpu_device *adev)
>>> > amdgpu_bo_kunmap(adev->vram_scratch.robj);
>>> > amdgpu_bo_unpin(adev->vram_scratch.robj);
>>> > amdgpu_bo_unreserve(adev->vram_scratch.robj);
>>> > + amdgpu_bo_unref(&adev->vram_scratch.robj);
>>> >        }
>>> > - amdgpu_bo_unref(&adev->vram_scratch.robj);
>>> >   }
>>> >
>>> >   /**
>>>
>>
>>
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>
>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20170419/7f3f71c0/attachment-0001.html>


More information about the amd-gfx mailing list