Plan: BO move throttling for visible VRAM evictions

Tue Mar 28 10:51:36 UTC 2017

On Mar 28, 2017 3:07 AM, "Michel Dänzer" <michel at daenzer.net> wrote:

On 27/03/17 07:29 PM, Marek Olšák wrote:
> On Mar 27, 2017 9:35 AM, "Michel Dänzer" <michel at daenzer.net> wrote:
>
>     On 25/03/17 01:33 AM, Marek Olšák wrote:
>     > Hi,
>     >
>     > I'm sharing this idea here, because it's something that has been
>     > decreasing our performance a lot recently, for example:
>     >
>     > http://openbenchmarking.org/prospect/1703011-RI-RADEONDIR06/7b7668cfc109d1c3dc27e871c8aea71ca13f23fa
>     >
>     > I think the problem there is that Mesa git started uploading
>     > descriptors and uniforms to VRAM, which helps when TC L2 has a low
>     > hit/miss ratio, but the performance can randomly drop by an order of
>     > magnitude. I've heard rumours that kernel 4.11 has an improved
>     > allocator that should perform better, but the situation is still far
>     > from ideal.
>     >
>     > AMD CPUs and APUs will hopefully suffer less, because we can resize
>     > the visible VRAM with the help of our CPU hw specs, but Intel CPUs
>     > will remain limited to 256 MB. The following plan describes how to do
>     > throttling for visible VRAM evictions.
>     >
>     >
>     > 1) Theory
>     >
>     > Initially, the driver doesn't care about where buffers are in VRAM,
>     > because VRAM buffers are only moved to visible VRAM on CPU page faults
>     > (when the CPU touches the buffer memory but the memory is in the
>     > invisible part of VRAM). When it happens,
>     > amdgpu_bo_fault_reserve_notify is called, which moves the buffer to
>     > visible VRAM, and the app continues. amdgpu_bo_fault_reserve_notify
>     > also marks the buffer as contiguous, which makes memory fragmentation
>     > worse.
>     >
>     > I verified this with DiRT Rally where amdgpu_bo_fault_reserve_notify
>     > was much higher in a CPU profiler than anything else in the kernel.
>     >
>     >
>     > 2) Monitoring via Gallium HUD
>     >
>     > We need to expose 2 kernel counters via the INFO ioctl and display
>     > those via Gallium HUD:
>     > - The number of VRAM CPU page faults. (the number of calls to
>     > amdgpu_bo_fault_reserve_notify).
>     > - The number of bytes moved by ttm_bo_validate inside
>     > amdgpu_bo_fault_reserve_notify.
>     >
>     > This will help us observe what exactly is happening and fine-tune the
>     > throttling when it's done.
>     >
>     >
>     > 3) Solution
>     >
>     > a) When amdgpu_bo_fault_reserve_notify is called, record the fact.
>     > (amdgpu_bo::had_cpu_page_fault = true)
>     >
>     > b) Monitor the MB/s rate at which buffers are moved by
>     > amdgpu_bo_fault_reserve_notify. If we get above a specific threshold,
>     > don't move the buffer to visible VRAM. Move it to GTT instead. Note
>     > that moving to GTT can be cheaper, because moving to visible VRAM is
>     > likely to evict a lot of buffers there and unmap them from the CPU,
>
>     FWIW, this can be avoided by only setting GTT in busy_placement. Then
>     TTM will only move the BO to visible VRAM if that can be done without
>     evicting anything from there.
>
>
>     > but moving to GTT shouldn't evict or unmap anything.
>     >
>     > c) When we get into the CS ioctl and a buffer has had_cpu_page_fault,
>     > it can be moved to VRAM if:
>     > - the GTT->VRAM move rate is low enough to allow it (this is the
>     > existing throttling mechanism)
>     > - the visible VRAM move rate is low enough that we will be OK with
>     > another CPU page fault if it happens.
>
>     Some other ideas that might be worth trying:
>
>     Evicting BOs to GTT instead of moving them to CPU accessible VRAM in
>     principle in some cases (e.g. for all BOs except those with
>     AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) or even always.
>
>
> I've tried this and it made things even worse.

What exactly did you try?

I only set the placement to GTT, but I think I kept the contiguous flag.
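
For reference, the throttling described in 3a-3c of the quoted plan could
be sketched roughly as below. This is only an illustration, not amdgpu
code: the struct, the function names, the one-second accounting window
and the byte threshold are all hypothetical, and a real implementation
would live in the fault handler and the CS ioctl paths.

```c
#include <stdbool.h>
#include <stdint.h>

/* All names here are hypothetical; none of this is actual amdgpu/TTM API. */
enum placement { PLACE_VISIBLE_VRAM, PLACE_GTT };

#define WINDOW_US 1000000ull /* assumed 1 s accounting window */

struct move_throttle {
    uint64_t window_start_us;   /* start of the current window */
    uint64_t bytes_in_window;   /* bytes moved so far in this window */
    uint64_t max_bytes_per_sec; /* assumed tunable threshold */
};

/* Bytes already accounted in the window that contains now_us. */
static uint64_t throttle_used(const struct move_throttle *t, uint64_t now_us)
{
    return (now_us - t->window_start_us >= WINDOW_US) ? 0 : t->bytes_in_window;
}

/* True if moving 'size' more bytes would stay within the threshold. */
static bool throttle_has_room(const struct move_throttle *t, uint64_t now_us,
                              uint64_t size)
{
    return throttle_used(t, now_us) + size <= t->max_bytes_per_sec;
}

/* 3a + 3b: on a CPU page fault, record the fault and pick where the BO goes. */
static enum placement fault_pick_placement(struct move_throttle *t,
                                           uint64_t now_us, uint64_t bo_size,
                                           bool *had_cpu_page_fault)
{
    *had_cpu_page_fault = true; /* 3a: remember that the CPU touched this BO */

    if (now_us - t->window_start_us >= WINDOW_US) {
        t->window_start_us = now_us; /* start a fresh window */
        t->bytes_in_window = 0;
    }
    if (t->bytes_in_window + bo_size > t->max_bytes_per_sec)
        return PLACE_GTT;            /* 3b: over the threshold, use GTT */

    t->bytes_in_window += bo_size;   /* account for the move we allow */
    return PLACE_VISIBLE_VRAM;
}

/* 3c: in the CS ioctl, decide whether a BO may be moved (back) to VRAM. */
static bool cs_may_move_to_vram(const struct move_throttle *gtt_to_vram,
                                const struct move_throttle *visible_vram,
                                uint64_t now_us, uint64_t bo_size,
                                bool had_cpu_page_fault)
{
    /* Existing throttling: the GTT->VRAM move rate must be low enough. */
    if (!throttle_has_room(gtt_to_vram, now_us, bo_size))
        return false;
    /* BOs that never faulted are only subject to the existing check. */
    if (!had_cpu_page_fault)
        return true;
    /* Otherwise we must also be able to afford another CPU page fault. */
    return throttle_has_room(visible_vram, now_us, bo_size);
}
```

The fixed window is just the simplest way to get an approximate MB/s
figure; a sliding average would behave more smoothly at window
boundaries.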

Marek

--
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer