[PATCH] drm/amdgpu: fix error handling in amdgpu_bo_do_create

Michel Dänzer michel at daenzer.net
Tue Oct 31 15:33:04 UTC 2017


On 31/10/17 04:25 PM, Christian König wrote:
> Am 31.10.2017 um 16:14 schrieb Michel Dänzer:
>> On 31/10/17 04:03 PM, Christian König wrote:
>>> Am 31.10.2017 um 09:43 schrieb Michel Dänzer:
>>>> On 31/10/17 09:37 AM, Christian König wrote:
>>>>> From: Christian König <christian.koenig at amd.com>
>>>>>
>>>>> The bo structure is freed up in case of an error, so we can't do any
>>>>> accounting if that happens.
>>>>>
>>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>>>> ---
>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 +++---
>>>>>    1 file changed, 3 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>> index e44b880eefdd..ff6f842655d1 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>> @@ -369,6 +369,9 @@ static int amdgpu_bo_do_create(struct
>>>>> amdgpu_device *adev,
>>>>>        r = ttm_bo_init_reserved(&adev->mman.bdev, &bo->tbo, size,
>>>>> type,
>>>>>                     &bo->placement, page_align, !kernel, NULL,
>>>>>                     acc_size, sg, resv, &amdgpu_ttm_bo_destroy);
>>>>> +    if (unlikely(r != 0))
>>>>> +        return r;
>>>>> +
>>>>>        bytes_moved = atomic64_read(&adev->num_bytes_moved) -
>>>>>                  initial_bytes_moved;
>>>>>        if (adev->mc.visible_vram_size < adev->mc.real_vram_size &&
>>>>> @@ -378,9 +381,6 @@ static int amdgpu_bo_do_create(struct
>>>>> amdgpu_device *adev,
>>>>>        else
>>>>>            amdgpu_cs_report_moved_bytes(adev, bytes_moved, 0);
>>>>>    -    if (unlikely(r != 0))
>>>>> -        return r;
>>>>> -
>>>>>        if (kernel)
>>>>>            bo->tbo.priority = 1;
>>>>>   
>>>> Hmm. If it's at least theoretically possible that ttm_bo_init_reserved
>>>> moves some BOs before returning an error, we should instead make the
>>>> accounting safe WRT bo having been freed. But assuming it's not
>>>> possible,
>>> I have made patches for this back in April and send them to the list,
>>> but the value is rather low and you need to change a lot of places in
>>> other drivers as well.
>> At least in this case, simply moving the
>>
>>     if (adev->mc.visible_vram_size < adev->mc.real_vram_size &&
>>         bo->tbo.mem.mem_type == TTM_PL_VRAM &&
>>         bo->tbo.mem.start < adev->mc.visible_vram_size >> PAGE_SHIFT)
>>
>> test before the ttm_bo_init_reserved call, and then passing bytes_moved
>> or 0 for the third parameter of amdgpu_cs_report_moved_bytes depending
>> on its result, would work as well, wouldn't it?
> No, that won't work. Neither bo->tbo.mem.mem_type nor bo->tbo.mem.start
> are determined yet before the call to ttm_bo_init().
> 
> The root problem is that we have a mis-assumption about how visible VRAM
> works here (e.g. it doesn't handle the case where a BO is partially in
> visible VRAM) and that we use the global bytes moved counter as
> indicator for the local operation (e.g. to know how many bytes moved the
> current client).
> 
> The only real solution is to give ttm_bo_init_* and ttm_bo_validate a
> context parameter where we can note how much work we had to do to
> fulfill the request.

Right, though in the meantime I wonder if over-accounting (i.e. calling
amdgpu_cs_report_moved_bytes(adev, bytes_moved, bytes_moved)) in the
failure case wouldn't be better than not accounting at all.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer


More information about the amd-gfx mailing list