[RFC 1/5] drm/amdgpu: Fix migration rate limiting accounting
Friedrich Vock
friedrich.vock at gmx.de
Wed May 8 19:08:57 UTC 2024
On 08.05.24 20:09, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>
> The logic assumed any migration attempt worked and therefore would over-
> account the amount of data migrated during buffer re-validation. As a
> consequence, a client can be unfairly penalised by incorrectly considering
> its migration budget spent.
If the migration failed but data was still moved (which I think could be
the case when we try evicting everything but it still doesn't work?),
shouldn't the eviction movements count towards the ratelimit too?
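To make the concern concrete, here is a toy model in plain C. None of it is real TTM or amdgpu code: toy_ctx, toy_validate() and the two account_*() helpers are made-up names, and it assumes that evictions bump the same bytes_moved counter as the final move does.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Toy stand-in for the bytes_moved counter in the operation context. */
struct toy_ctx {
	uint64_t bytes_moved;
};

/* Hypothetical model of one validation attempt: evicting other buffers
 * moves evicted_bytes first, then the placement either succeeds (moving
 * bo_bytes more) or fails with -ENOMEM.
 */
static int toy_validate(struct toy_ctx *ctx, uint64_t evicted_bytes,
			uint64_t bo_bytes, int fits)
{
	ctx->bytes_moved += evicted_bytes;
	if (!fits)
		return -ENOMEM;
	ctx->bytes_moved += bo_bytes;
	return 0;
}

/* Account only when validation succeeded, like an `if (!r)` guard. */
static uint64_t account_on_success(int r, const struct toy_ctx *ctx)
{
	return r ? 0 : ctx->bytes_moved;
}

/* Account whatever was moved, regardless of the result. */
static uint64_t account_always(int r, const struct toy_ctx *ctx)
{
	(void)r;
	return ctx->bytes_moved;
}
```

With a failing attempt that evicted 4096 bytes, account_on_success() reports 0 while account_always() reports 4096, which is the difference I am asking about.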
>
> Fix it by looking at the before and after buffer object backing store and
> only account if there was a change.
>
> FIXME:
> I think this needs a better solution to account for migrations between
> VRAM visible and non-visible portions.
FWIW, I have some WIP patches (not posted on any MLs yet though) that
attempt to solve this issue (and to actually enforce ratelimits) by
moving the ratelimit accounting/enforcement to TTM entirely.
By moving the accounting to TTM we can count moved bytes when we move
them, and don't have to rely on comparing resources to determine whether
moving actually happened. This should address your FIXME as well.
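A rough sketch of the idea, again in toy C. This is not the WIP series and not the TTM API: toy_res, toy_bo, toy_budget and toy_move() are hypothetical names, and the "start" offset merely stands in for where inside VRAM a buffer lives.

```c
#include <assert.h>
#include <stdint.h>

enum { TOY_SYSTEM, TOY_VRAM };

/* Hypothetical placement: a memory type plus an offset within it. */
struct toy_res {
	int mem_type;
	uint64_t start;
};

struct toy_bo {
	struct toy_res res;
	uint64_t size;
};

/* Hypothetical per-client budget kept by the memory manager. */
struct toy_budget {
	uint64_t bytes_moved;
};

/* The only place that copies data, so the only place that accounts. */
static void toy_move(struct toy_bo *bo, struct toy_res dst,
		     struct toy_budget *budget)
{
	if (bo->res.mem_type == dst.mem_type && bo->res.start == dst.start)
		return; /* already in place, nothing is copied */

	bo->res = dst;
	budget->bytes_moved += bo->size;
}

/* Before/after comparison by memory type, modelled on the quoted patch. */
static int moved_by_mem_type(struct toy_res old_res, struct toy_res new_res)
{
	return old_res.mem_type != new_res.mem_type;
}
```

A move from one VRAM offset to another keeps mem_type unchanged, so moved_by_mem_type() returns 0, yet toy_move() still adds the buffer size to the budget. That is the visible/non-visible case from the FIXME.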
Regards,
Friedrich
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
> Cc: Christian König <christian.koenig at amd.com>
> Cc: Friedrich Vock <friedrich.vock at gmx.de>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 26 +++++++++++++++++++++-----
> 1 file changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index ec888fc6ead8..22708954ae68 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -784,12 +784,15 @@ static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo)
>  		.no_wait_gpu = false,
>  		.resv = bo->tbo.base.resv
>  	};
> +	struct ttm_resource *old_res;
>  	uint32_t domain;
>  	int r;
>
>  	if (bo->tbo.pin_count)
>  		return 0;
>
> +	old_res = bo->tbo.resource;
> +
>  	/* Don't move this buffer if we have depleted our allowance
>  	 * to move it. Don't move anything if the threshold is zero.
>  	 */
> @@ -817,16 +820,29 @@ static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo)
>  	amdgpu_bo_placement_from_domain(bo, domain);
>  	r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
>
> -	p->bytes_moved += ctx.bytes_moved;
> -	if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
> -	    amdgpu_res_cpu_visible(adev, bo->tbo.resource))
> -		p->bytes_moved_vis += ctx.bytes_moved;
> -
>  	if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) {
>  		domain = bo->allowed_domains;
>  		goto retry;
>  	}
>
> +	if (!r) {
> +		struct ttm_resource *new_res = bo->tbo.resource;
> +		bool moved = true;
> +
> +		if (old_res == new_res)
> +			moved = false;
> +		else if (old_res && new_res &&
> +			 old_res->mem_type == new_res->mem_type)
> +			moved = false;
> +
> +		if (moved) {
> +			p->bytes_moved += ctx.bytes_moved;
> +			if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
> +			    amdgpu_res_cpu_visible(adev, bo->tbo.resource))
> +				p->bytes_moved_vis += ctx.bytes_moved;
> +		}
> +	}
> +
>  	return r;
>  }
>