[PATCH] drm/amdgpu: fix error handling in amdgpu_bo_do_create

Christian König christian.koenig at amd.com
Tue Oct 31 15:25:41 UTC 2017


Am 31.10.2017 um 16:14 schrieb Michel Dänzer:
> On 31/10/17 04:03 PM, Christian König wrote:
>> Am 31.10.2017 um 09:43 schrieb Michel Dänzer:
>>> On 31/10/17 09:37 AM, Christian König wrote:
>>>> From: Christian König <christian.koenig at amd.com>
>>>>
>>>> The bo structure is freed up in case of an error, so we can't do any
>>>> accounting if that happens.
>>>>
>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 +++---
>>>>    1 file changed, 3 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>> index e44b880eefdd..ff6f842655d1 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>> @@ -369,6 +369,9 @@ static int amdgpu_bo_do_create(struct
>>>> amdgpu_device *adev,
>>>>        r = ttm_bo_init_reserved(&adev->mman.bdev, &bo->tbo, size, type,
>>>>                     &bo->placement, page_align, !kernel, NULL,
>>>>                     acc_size, sg, resv, &amdgpu_ttm_bo_destroy);
>>>> +    if (unlikely(r != 0))
>>>> +        return r;
>>>> +
>>>>        bytes_moved = atomic64_read(&adev->num_bytes_moved) -
>>>>                  initial_bytes_moved;
>>>>        if (adev->mc.visible_vram_size < adev->mc.real_vram_size &&
>>>> @@ -378,9 +381,6 @@ static int amdgpu_bo_do_create(struct
>>>> amdgpu_device *adev,
>>>>        else
>>>>            amdgpu_cs_report_moved_bytes(adev, bytes_moved, 0);
>>>>    -    if (unlikely(r != 0))
>>>> -        return r;
>>>> -
>>>>        if (kernel)
>>>>            bo->tbo.priority = 1;
>>>>   
>>> Hmm. If it's at least theoretically possible that ttm_bo_init_reserved
>>> moves some BOs before returning an error, we should instead make the
>>> accounting safe WRT bo having been freed. But assuming it's not possible,
>> I have made patches for this back in April and send them to the list,
>> but the value is rather low and you need to change a lot of places in
>> other drivers as well.
> At least in this case, simply moving the
>
> 	if (adev->mc.visible_vram_size < adev->mc.real_vram_size &&
> 	    bo->tbo.mem.mem_type == TTM_PL_VRAM &&
> 	    bo->tbo.mem.start < adev->mc.visible_vram_size >> PAGE_SHIFT)
>
> test before the ttm_bo_init_reserved call, and then passing bytes_moved
> or 0 for the third parameter of amdgpu_cs_report_moved_bytes depending
> on its result, would work as well, wouldn't it?
No, that won't work. Neither bo->tbo.mem.mem_type nor bo->tbo.mem.start 
are determined yet before the call to ttm_bo_init().

The root problem is that we have a mis-assumption about how visible VRAM 
works here (e.g. it doesn't handle the case where a BO is partially in 
visible VRAM) and that we use the global bytes moved counter as 
indicator for the local operation (e.g. to know how many bytes moved the 
current client).

The only real solution is to give ttm_bo_init_* and ttm_bo_validate a 
context parameter where we can note how much work we had to do to 
fulfill the request.

Regards,
Christian.


More information about the amd-gfx mailing list