[RFC 0/5] Discussion around eviction improvements
Tvrtko Ursulin
tursulin at ursulin.net
Mon May 13 13:49:51 UTC 2024
On 09/05/2024 13:40, Tvrtko Ursulin wrote:
>
> On 08/05/2024 19:09, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>>
>> For the last few days I have been looking at the situation with VRAM
>> oversubscription: what happens versus what perhaps should happen. I have
>> been browsing through the driver and running some simple experiments.
>>
>> I ended up with this patch series which, as a disclaimer, may be completely
>> wrong, but as I found some things that look suspicious, to me at least, I
>> thought it was a good point to stop and request some comments.
>>
>> To summarise the main issues I think I found:
>>
>>  * Migration rate limiting does not check whether a migration actually
>>    happened, and so can over-account and unfairly penalise.
>>
>>  * Migration rate limiting does not even work, at least not for the common
>>    case where userspace configures VRAM+GTT. It thinks it can stop migration
>>    attempts by playing with bo->allowed_domains vs bo->preferred_domains but,
>>    both from the code and from empirical experiments, I see that not working
>>    at all. Both masks are identical so fiddling with them achieves nothing.
>>
>>  * The idea of the fallback placement only works when VRAM has free space.
>>    As soon as it does not, ttm_resource_compatible is happy to leave the
>>    buffers in the secondary placement forever.
>>
>>  * The driver thinks it will be re-validating evicted buffers on the next
>>    submission, but it does not for the very common case of VRAM+GTT because
>>    it only checks if the current placement is *none* of the preferred
>>    placements.
>>
>> All those problems are addressed in individual patches.
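
To illustrate the second and fourth points, here is a minimal, self-contained
C sketch. It is not driver code: struct toy_bo, the DOMAIN_* bits and all three
helper functions are hypothetical names invented for illustration. Only the
idea of throttling via the preferred/allowed masks and the "none of the
preferred placements" check are taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical domain bits; not the real AMDGPU_GEM_DOMAIN_* values. */
enum {
	DOMAIN_GTT  = 1u << 0,
	DOMAIN_VRAM = 1u << 1,
};

/* Hypothetical, heavily reduced stand-in for a buffer object. */
struct toy_bo {
	uint32_t preferred_domains; /* where userspace would like the buffer */
	uint32_t allowed_domains;   /* where the buffer is permitted to live */
	uint32_t current_domain;    /* exactly one bit: where it lives now */
};

/*
 * Throttling as described: validate against the preferred mask while there
 * is migration budget left, otherwise fall back to the allowed mask. If the
 * two masks are identical, the choice changes nothing.
 */
static uint32_t domains_for_validation(const struct toy_bo *bo, bool budget_left)
{
	return budget_left ? bo->preferred_domains : bo->allowed_domains;
}

/* "Evicted" only if the current placement is none of the preferred ones. */
static bool evicted_if_none_preferred(const struct toy_bo *bo)
{
	return (bo->current_domain & bo->preferred_domains) == 0;
}

/*
 * Stricter alternative (an assumption, not necessarily what the series
 * does): treat the buffer as evicted unless it sits in its first choice.
 */
static bool evicted_if_not_first_choice(const struct toy_bo *bo,
					uint32_t first_choice)
{
	/* The first choice must itself be one of the preferred domains. */
	assert(first_choice & bo->preferred_domains);
	return bo->current_domain != first_choice;
}
```

For a buffer with preferred_domains == allowed_domains == VRAM|GTT that
currently sits in GTT, domains_for_validation() returns the same mask with or
without budget, and evicted_if_none_preferred() returns false, so nothing asks
for a move back to VRAM. evicted_if_not_first_choice() with DOMAIN_VRAM as the
first choice returns true for the same buffer.
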
>>
>> The end result of this series appears to be a driver which will try harder
>> to move buffers back into VRAM, but will be (more) correctly throttled in
>> doing so by the existing rate limiting logic.
>>
>> I have run a quick benchmark of Cyberpunk 2077 and cannot say that I saw a
>> change, but that could be a good thing too. At least I did not break
>> anything, perhaps. On one occasion I did see the rate limiting logic get
>> confused: for a period of a few minutes it went into a mode where it was
>> constantly giving a high migration budget. But that recovered itself when I
>> switched clients and did not come back, so I don't know. If there is
>> something wrong there I don't think it would be caused by any patches in
>> this series.
>
> Since yesterday I also briefly tested with Far Cry New Dawn. One run
> each, so it possibly doesn't mean anything apart from there being no
> regression, i.e. migration throttling is keeping things at bay even with
> increased requests to migrate things back to VRAM:
>
>                     before      after
> min/avg/max fps     36/44/54    37/45/55
>
> Cyberpunk 2077 from yesterday was similarly close:
>
>                     before               after
> min/avg/max fps     26.96/29.59/30.40    29.70/30.00/30.32
>
> I guess the real story is a proper dGPU, where misplaced buffers have a
> real cost.

I found one game which regresses spectacularly badly with this series:
Assassin's Creed Valhalla, or at least its built-in benchmark. The game
appears to have a working set much larger than the other games I tested,
around 5GiB total during the benchmark, and for some reason migration
throttling totally fails to keep it in check. I will be investigating
this shortly.

Regards,

Tvrtko

>> The series is probably rough but should be good enough for discussion. I am
>> curious to hear if I identified at least something correctly as a real
>> problem.
>>
>> It would also be good to hear which games are suggested for checking
>> whether there is any improvement.
>>
>> Cc: Christian König <christian.koenig at amd.com>
>> Cc: Friedrich Vock <friedrich.vock at gmx.de>
>>
>> Tvrtko Ursulin (5):
>>   drm/amdgpu: Fix migration rate limiting accounting
>>   drm/amdgpu: Actually respect buffer migration budget
>>   drm/ttm: Add preferred placement flag
>>   drm/amdgpu: Use preferred placement for VRAM+GTT
>>   drm/amdgpu: Re-validate evicted buffers
>>
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c     | 38 +++++++++++++++++-----
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  8 +++--
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c     | 21 ++++++++++--
>>  drivers/gpu/drm/ttm/ttm_resource.c         | 13 +++++---
>>  include/drm/ttm/ttm_placement.h            |  3 ++
>>  5 files changed, 65 insertions(+), 18 deletions(-)
>>