[RFC 0/5] Discussion around eviction improvements

Christian König christian.koenig at amd.com
Mon May 13 06:50:13 UTC 2024


Just FYI, I've been on sick leave for a while and am now trying to catch up.

It will probably be at least a week until I can look into this again.

Sorry,
Christian.

Am 08.05.24 um 20:09 schrieb Tvrtko Ursulin:
> From: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>
> For the last few days I have been looking at the situation with VRAM
> oversubscription, what happens versus what perhaps should happen, browsing
> through the driver and running some simple experiments.
>
> I ended up with this patch series which, as a disclaimer, may be completely
> wrong but as I found some suspicious things, to me at least, I thought it was a
> good point to stop and request some comments.
>
> To perhaps summarise what are the main issues I think I found:
>
>   * Migration rate limiting does not bother knowing if actual migration happened
>     and so can over-account and unfairly penalise.
>
>   * Migration rate limiting does not even work, at least not for the common case
>     where userspace configures VRAM+GTT. It thinks it can stop migration attempts
>     by playing with bo->allowed_domains vs bo->preferred_domains but, both from
>     the code, and from empirical experiments, I see that not working at all. Both
>     masks are identical so fiddling with them achieves nothing.
>
>   * Idea of the fallback placement only works when VRAM has free space. As soon
>     as it does not, ttm_resource_compatible is happy to leave the buffers in the
>     secondary placement forever.
>
>   * Driver thinks it will be re-validating evicted buffers on the next submission
>     but it does not for the very common case of VRAM+GTT because it only checks
>     if current placement is *none* of the preferred placements.
>
> All those problems are addressed in individual patches.
>
> The end result of this series appears to be a driver which will try harder to move
> buffers back into VRAM, but will be (more) correctly throttled in doing so by
> the existing rate limiting logic.
>
> I have run a quick benchmark of Cyberpunk 2077 and cannot say that I saw a
> change but that could be a good thing too. At least I did not break anything,
> perhaps.. On one occasion I did see the rate limiting logic get confused while
> for a period of a few minutes it went to a mode where it was constantly giving a
> high migration budget. But that recovered itself when I switched clients and did
> not come back so I don't know. If there is something wrong there I don't think
> it would be caused by any patches in this series.
>
> Series is probably rough but should be good enough for discussion. I am curious
> to hear if I identified at least something correctly as a real problem.
>
> It would also be good to hear what are the suggested games to check and see
> whether there is any improvement.
>
> Cc: Christian König <christian.koenig at amd.com>
> Cc: Friedrich Vock <friedrich.vock at gmx.de>
>
> Tvrtko Ursulin (5):
>    drm/amdgpu: Fix migration rate limiting accounting
>    drm/amdgpu: Actually respect buffer migration budget
>    drm/ttm: Add preferred placement flag
>    drm/amdgpu: Use preferred placement for VRAM+GTT
>    drm/amdgpu: Re-validate evicted buffers
>
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c     | 38 +++++++++++++++++-----
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  8 +++--
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c     | 21 ++++++++++--
>   drivers/gpu/drm/ttm/ttm_resource.c         | 13 +++++---
>   include/drm/ttm/ttm_placement.h            |  3 ++
>   5 files changed, 65 insertions(+), 18 deletions(-)
>
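
For readers following the second and fourth points in the quoted summary, here
is a minimal user-space sketch of why identical masks defeat both checks. It is
a simplified model and not the amdgpu code: struct model_bo, pick_domains(),
looks_evicted() and the DOMAIN_* values are hypothetical names made up for this
illustration. Only the field names preferred_domains and allowed_domains come
from the mail above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical domain bits for this model only (assumption). */
#define DOMAIN_GTT  0x2u
#define DOMAIN_VRAM 0x4u

/* Simplified stand-in for a buffer object (assumption, not the driver struct). */
struct model_bo {
    uint32_t preferred_domains; /* where userspace would like the buffer */
    uint32_t allowed_domains;   /* where the buffer may fall back to */
    uint32_t current_domain;    /* where the buffer currently lives */
};

/*
 * Model of the throttling idea: validate against the preferred mask while
 * migration budget remains, and against the allowed mask once it is spent.
 */
static uint32_t pick_domains(const struct model_bo *bo, bool budget_left)
{
    return budget_left ? bo->preferred_domains : bo->allowed_domains;
}

/*
 * Model of the re-validation check: a buffer only counts as evicted when its
 * current placement is none of the preferred placements.
 */
static bool looks_evicted(const struct model_bo *bo)
{
    return (bo->preferred_domains & bo->current_domain) == 0;
}
```

With preferred_domains and allowed_domains both set to DOMAIN_VRAM | DOMAIN_GTT,
pick_domains() returns the same mask with or without budget, so the throttle
changes nothing. A buffer sitting in DOMAIN_GTT is also not reported by
looks_evicted(), so nothing in this model would ask for it to be moved back.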


