[PATCH 1/2] drm/ttm: never add BO that failed to validate to the LRU list
Christian König
deathsimple at vodafone.de
Tue Feb 14 12:53:14 UTC 2017
Am 14.02.2017 um 12:37 schrieb Nicolai Hähnle:
> On 14.02.2017 11:38, Christian König wrote:
>> Am 14.02.2017 um 10:18 schrieb Nicolai Hähnle:
>>> From: Nicolai Hähnle <nicolai.haehnle at amd.com>
>>>
>>> Fixes a potential race condition in amdgpu that looks as follows:
>>>
>>> Task 1: attempt ttm_bo_init, but ttm_bo_validate fails
>>> Task 1: add BO to global list anyway
>>> Task 2: grabs hold of the BO, waits on its reservation lock
>>> Task 1: releases its reference of the BO; never gives up the
>>> reservation lock
>>>
>>> The patch "drm/amdgpu: fix a potential deadlock in
>>> amdgpu_bo_create_restricted()" attempts to fix that by releasing
>>> the reservation lock in amdgpu code; unfortunately, it introduces
>>> a use-after-free when this race _doesn't_ happen.
>>>
>>> This patch should fix the race properly by never adding the BO
>>> to the global list in the first place.
>>>
>>> Cc: Samuel Pitoiset <samuel.pitoiset at gmail.com>
>>> Cc: zhoucm1 <david1.zhou at amd.com>
>>> Signed-off-by: Nicolai Hähnle <nicolai.haehnle at amd.com>
>>
>> NAK, that is actually not correct either.
>>
>> The previous implementation doesn't have an use after free, but actually
>> a memory leak.
>>
>> I completely agree that adding the BO to the LRU in case of an error is
>> incorrect, but ttm_bo_add_to_lru() will add some references to the BO.
>
> Yes, but those are bo->list_krefs, not bo->krefs. ttm_bo_unref will
> put the bo->kref, which should lead to ttm_bo_release. This will then
> add the buffer to the ddestroy list and eventually put all the
> list_krefs. (Or possibly, if the ttm_bo_wait in
> ttm_bo_cleanup_refs_or_queue fails, it will explicitly delete the
> buffer from the LRU, which should amount to the same, I think).
>
> Or perhaps I'm still missing something, I'm not too familiar with the
> whole BO reference counting.
Ah, yes of course, never mind.
>
>
>> So the following ttm_bo_unref() will just drop the initial reference.
>>
>> Let's clean up this mess by moving the freeing of the BO in case of an
>> error to the caller.
>
> Yes, see my other patches.
In this case the set is Reviewed-by: Christian König
<christian.koenig at amd.com>.
Regards,
Christian.
>
> Thanks,
> Nicolai
>
>
>>
>> Regards,
>> Christian.
>>
>>> ---
>>> drivers/gpu/drm/ttm/ttm_bo.c | 12 +++++++-----
>>> 1 file changed, 7 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
>>> b/drivers/gpu/drm/ttm/ttm_bo.c
>>> index 239a957..76bee42 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>> @@ -1215,18 +1215,20 @@ int ttm_bo_init(struct ttm_bo_device *bdev,
>>> if (likely(!ret))
>>> ret = ttm_bo_validate(bo, placement, interruptible, false);
>>> - if (!resv) {
>>> + if (!resv)
>>> ttm_bo_unreserve(bo);
>>> - } else if (!(bo->mem.placement & TTM_PL_FLAG_NO_EVICT)) {
>>> + if (unlikely(ret)) {
>>> + ttm_bo_unref(&bo);
>>> + return ret;
>>> + }
>>> +
>>> + if (resv && !(bo->mem.placement & TTM_PL_FLAG_NO_EVICT)) {
>>> spin_lock(&bo->glob->lru_lock);
>>> ttm_bo_add_to_lru(bo);
>>> spin_unlock(&bo->glob->lru_lock);
>>> }
>>> - if (unlikely(ret))
>>> - ttm_bo_unref(&bo);
>>> -
>>> return ret;
>>> }
>>> EXPORT_SYMBOL(ttm_bo_init);
>>
>>
>
More information about the amd-gfx
mailing list