[RFC 1/5] drm/amdgpu: Fix migration rate limiting accounting
Tvrtko Ursulin
tvrtko.ursulin at igalia.com
Wed May 15 10:51:49 UTC 2024
On 15/05/2024 08:14, Christian König wrote:
> Am 08.05.24 um 20:09 schrieb Tvrtko Ursulin:
>> From: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>>
>> The logic assumed any migration attempt worked and therefore would over-
>> account the amount of data migrated during buffer re-validation. As a
>> consequence client can be unfairly penalised by incorrectly considering
>> its migration budget spent.
>>
>> Fix it by looking at the before and after buffer object backing store and
>> only account if there was a change.
>>
>> FIXME:
>> I think this needs a better solution to account for migrations between
>> VRAM visible and non-visible portions.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>> Cc: Christian König <christian.koenig at amd.com>
>> Cc: Friedrich Vock <friedrich.vock at gmx.de>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 26 +++++++++++++++++++++-----
>> 1 file changed, 21 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> index ec888fc6ead8..22708954ae68 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> @@ -784,12 +784,15 @@ static int amdgpu_cs_bo_validate(void *param,
>> struct amdgpu_bo *bo)
>> .no_wait_gpu = false,
>> .resv = bo->tbo.base.resv
>> };
>> + struct ttm_resource *old_res;
>> uint32_t domain;
>> int r;
>> if (bo->tbo.pin_count)
>> return 0;
>> + old_res = bo->tbo.resource;
>> +
>> /* Don't move this buffer if we have depleted our allowance
>> * to move it. Don't move anything if the threshold is zero.
>> */
>> @@ -817,16 +820,29 @@ static int amdgpu_cs_bo_validate(void *param,
>> struct amdgpu_bo *bo)
>> amdgpu_bo_placement_from_domain(bo, domain);
>> r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
>> - p->bytes_moved += ctx.bytes_moved;
>> - if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
>> - amdgpu_res_cpu_visible(adev, bo->tbo.resource))
>> - p->bytes_moved_vis += ctx.bytes_moved;
>> -
>> if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) {
>> domain = bo->allowed_domains;
>> goto retry;
>> }
>> + if (!r) {
>> + struct ttm_resource *new_res = bo->tbo.resource;
>> + bool moved = true;
>> +
>> + if (old_res == new_res)
>> + moved = false;
>> + else if (old_res && new_res &&
>> + old_res->mem_type == new_res->mem_type)
>> + moved = false;
>
> The old resource might already be destroyed after you return from
> validation. So this here won't work.
>
> Apart from that even when a migration attempt fails the moved bytes
> should be accounted.
>
> When the validation attempt doesn't caused any moves then the bytecount
> here would be zero.
>
> So as far as I can see that is as fair as you can get.
Right, I think I suffered a bit of tunnel vision here and completely
ignore the _ctx_.moved_bytes part. Scratch this one too then.
Regards,
Tvrtko
>
> Regards,
> Christian.
>
> PS: Looks like our mail servers are once more not very reliable.
>
> If you get mails from me multiple times please just ignore it.
>
>> +
>> + if (moved) {
>> + p->bytes_moved += ctx.bytes_moved;
>> + if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
>> + amdgpu_res_cpu_visible(adev, bo->tbo.resource))
>> + p->bytes_moved_vis += ctx.bytes_moved;
>> + }
>> + }
>> +
>> return r;
>> }
>
More information about the amd-gfx
mailing list