[RFC 2/5] drm/amdgpu: Actually respect buffer migration budget
Tvrtko Ursulin
tvrtko.ursulin at igalia.com
Wed May 15 10:59:45 UTC 2024
On 15/05/2024 08:20, Christian König wrote:
> Am 08.05.24 um 20:09 schrieb Tvrtko Ursulin:
>> From: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>>
>> Current code appears to live in a misconception that playing with buffer
>> allowed and preferred placements can control the decision on whether
>> backing store migration will be attempted or not.
>>
>> Both from code inspection and from empirical experiments I see that not
>> being true, and that both allowed and preferred placement are typically
>> set to the same bitmask.
>
> That's not correct for the use case handled here, but see below.
Which part is not correct, that bo->preferred_domains and
bo->allower_domains are the same bitmask?
>>
>> As such, when the code decides to throttle the migration for a client, it
>> is in fact not achieving anything. Buffers can still be either
>> migrated or
>> not migrated based on the external (to this function and facility) logic.
>>
>> Fix it by not changing the buffer object placements if the migration
>> budget has been spent.
>>
>> FIXME:
>> Is it still required to call validate is the question..
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>> Cc: Christian König <christian.koenig at amd.com>
>> Cc: Friedrich Vock <friedrich.vock at gmx.de>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 12 +++++++++---
>> 1 file changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> index 22708954ae68..d07a1dd7c880 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> @@ -784,6 +784,7 @@ static int amdgpu_cs_bo_validate(void *param,
>> struct amdgpu_bo *bo)
>> .no_wait_gpu = false,
>> .resv = bo->tbo.base.resv
>> };
>> + bool migration_allowed = true;
>> struct ttm_resource *old_res;
>> uint32_t domain;
>> int r;
>> @@ -805,19 +806,24 @@ static int amdgpu_cs_bo_validate(void *param,
>> struct amdgpu_bo *bo)
>> * visible VRAM if we've depleted our allowance to do
>> * that.
>> */
>> - if (p->bytes_moved_vis < p->bytes_moved_vis_threshold)
>> + if (p->bytes_moved_vis < p->bytes_moved_vis_threshold) {
>> domain = bo->preferred_domains;
>> - else
>> + } else {
>> domain = bo->allowed_domains;
>> + migration_allowed = false;
>> + }
>> } else {
>> domain = bo->preferred_domains;
>> }
>> } else {
>> domain = bo->allowed_domains;
>> + migration_allowed = false;
>> }
>> retry:
>> - amdgpu_bo_placement_from_domain(bo, domain);
>> + if (migration_allowed)
>> + amdgpu_bo_placement_from_domain(bo, domain);
>
> That's completely invalid. Calling amdgpu_bo_placement_from_domain() is
> a mandatory prerequisite for calling ttm_bo_validate();
>
> E.g. the usually code fow is:
>
> /* This initializes bo->placement */
> amdgpu_bo_placement_from_domain()
>
> /* Eventually modify bo->placement to fit special requirements */
> ....
>
> /* Apply the placement to the BO */
> ttm_bo_validate(&bo->tbo, &bo->placement, &ctx)
>
> To sum it up bo->placement should be a variable on the stack instead,
> but we never bothered to clean that up.
I am not clear if you agree or not that the current method of trying to
avoid migration doesn't really do anything?
On stack placements sounds plausible to force migration avoidance by
putting a single current object placement in that list, if that is what
you have in mind? Or a specialized flag/version of
amdgpu_bo_placement_from_domain with an bool input like
"allow_placement_change"?
Regards,
Tvrtko
>
> Regards,
> Christian.
>
>> +
>> r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
>> if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) {
>
More information about the amd-gfx
mailing list