[PATCH 1/2] drm/ttm: never add BO that failed to validate to the LRU list
Nicolai Hähnle
nhaehnle at gmail.com
Tue Feb 14 11:37:44 UTC 2017
On 14.02.2017 11:38, Christian König wrote:
> Am 14.02.2017 um 10:18 schrieb Nicolai Hähnle:
>> From: Nicolai Hähnle <nicolai.haehnle at amd.com>
>>
>> Fixes a potential race condition in amdgpu that looks as follows:
>>
>> Task 1: attempt ttm_bo_init, but ttm_bo_validate fails
>> Task 1: add BO to global list anyway
>> Task 2: grabs hold of the BO, waits on its reservation lock
>> Task 1: releases its reference of the BO; never gives up the
>> reservation lock
>>
>> The patch "drm/amdgpu: fix a potential deadlock in
>> amdgpu_bo_create_restricted()" attempts to fix that by releasing
>> the reservation lock in amdgpu code; unfortunately, it introduces
>> a use-after-free when this race _doesn't_ happen.
>>
>> This patch should fix the race properly by never adding the BO
>> to the global list in the first place.
>>
>> Cc: Samuel Pitoiset <samuel.pitoiset at gmail.com>
>> Cc: zhoucm1 <david1.zhou at amd.com>
>> Signed-off-by: Nicolai Hähnle <nicolai.haehnle at amd.com>
>
> NAK, that is actually not correct either.
>
> The previous implementation doesn't have an use after free, but actually
> a memory leak.
>
> I completely agree that adding the BO to the LRU in case of an error is
> incorrect, but ttm_bo_add_to_lru() will add some references to the BO.
Yes, but those are bo->list_krefs, not bo->krefs. ttm_bo_unref will put
the bo->kref, which should lead to ttm_bo_release. This will then add
the buffer to the ddestroy list and eventually put all the list_krefs.
(Or possibly, if the ttm_bo_wait in ttm_bo_cleanup_refs_or_queue fails,
it will explicitly delete the buffer from the LRU, which should amount
to the same, I think).
Or perhaps I'm still missing something, I'm not too familiar with the
whole BO reference counting.
> So the following ttm_bo_unref() will just drop the initial reference.
>
> Let's clean up this mess by moving the freeing of the BO in case of an
> error to the caller.
Yes, see my other patches.
Thanks,
Nicolai
>
> Regards,
> Christian.
>
>> ---
>> drivers/gpu/drm/ttm/ttm_bo.c | 12 +++++++-----
>> 1 file changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>> index 239a957..76bee42 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>> @@ -1215,18 +1215,20 @@ int ttm_bo_init(struct ttm_bo_device *bdev,
>> if (likely(!ret))
>> ret = ttm_bo_validate(bo, placement, interruptible, false);
>> - if (!resv) {
>> + if (!resv)
>> ttm_bo_unreserve(bo);
>> - } else if (!(bo->mem.placement & TTM_PL_FLAG_NO_EVICT)) {
>> + if (unlikely(ret)) {
>> + ttm_bo_unref(&bo);
>> + return ret;
>> + }
>> +
>> + if (resv && !(bo->mem.placement & TTM_PL_FLAG_NO_EVICT)) {
>> spin_lock(&bo->glob->lru_lock);
>> ttm_bo_add_to_lru(bo);
>> spin_unlock(&bo->glob->lru_lock);
>> }
>> - if (unlikely(ret))
>> - ttm_bo_unref(&bo);
>> -
>> return ret;
>> }
>> EXPORT_SYMBOL(ttm_bo_init);
>
>
More information about the amd-gfx
mailing list