Plan: BO move throttling for visible VRAM evictions
maraeo at gmail.com
Tue Mar 28 10:51:36 UTC 2017
On Mar 28, 2017 3:07 AM, "Michel Dänzer" <michel at daenzer.net> wrote:
On 27/03/17 07:29 PM, Marek Olšák wrote:
> On Mar 27, 2017 9:35 AM, "Michel Dänzer" <michel at daenzer.net
> <mailto:michel at daenzer.net>> wrote:
> On 25/03/17 01:33 AM, Marek Olšák wrote:
> > Hi,
> > I'm sharing this idea here, because it's something that has been
> > decreasing our performance a lot recently, for example:
> > I think the problem there is that Mesa git started uploading
> > descriptors and uniforms to VRAM, which helps when TC L2 has a low
> > hit/miss ratio, but the performance can randomly drop by an order of
> > magnitude. I've heard rumours that kernel 4.11 has an improved
> > allocator that should perform better, but the situation is still far
> > from ideal.
> > AMD CPUs and APUs will hopefully suffer less, because we can resize
> > the visible VRAM with the help of our CPU hw specs, but Intel CPUs
> > will remain limited to 256 MB. The following plan describes how to
> > implement throttling for visible VRAM evictions.
> > 1) Theory
> > Initially, the driver doesn't care about where buffers are in VRAM,
> > because VRAM buffers are only moved to visible VRAM on CPU page faults
> > (when the CPU touches the buffer memory but the memory is in the
> > invisible part of VRAM). When it happens,
> > amdgpu_bo_fault_reserve_notify is called, which moves the buffer to
> > visible VRAM, and the app continues. amdgpu_bo_fault_reserve_notify
> > also marks the buffer as contiguous, which makes memory
> > fragmentation worse.
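To make the trigger concrete, here is a minimal Python model (not kernel code) of when a CPU page fault forces a move. It assumes the BO occupies a single contiguous VRAM range and that visible VRAM is the first `visible_vram_size` bytes of VRAM. `BoSketch`, `is_cpu_visible` and `fault_needs_move` are hypothetical names, not amdgpu functions.

```python
from dataclasses import dataclass

MiB = 1024 * 1024


@dataclass
class BoSketch:
    """Hypothetical, simplified view of a buffer object placed in VRAM."""
    vram_offset: int  # start of the BO within VRAM, in bytes (assumed contiguous)
    size: int         # BO size in bytes


def is_cpu_visible(bo: BoSketch, visible_vram_size: int) -> bool:
    """True if the whole BO lies below the CPU-visible VRAM limit."""
    return bo.vram_offset + bo.size <= visible_vram_size


def fault_needs_move(bo: BoSketch, visible_vram_size: int = 256 * MiB) -> bool:
    """A CPU page fault only forces a move when the BO is (partly) invisible."""
    return not is_cpu_visible(bo, visible_vram_size)
```

With the default 256 MB window, a 4 MB BO at offset 100 MB needs no move, while one at offset 254 MB straddles the boundary and does.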
> > I verified this with DiRT Rally where amdgpu_bo_fault_reserve_notify
> > was much higher in a CPU profiler than anything else in the kernel.
> > 2) Monitoring via Gallium HUD
> > We need to expose 2 kernel counters via the INFO ioctl and display
> > those via Gallium HUD:
> > - The number of VRAM CPU page faults (the number of calls to
> > amdgpu_bo_fault_reserve_notify).
> > - The number of bytes moved by ttm_bo_validate inside
> > amdgpu_bo_fault_reserve_notify.
> > This will help us observe what exactly is happening and fine-tune
> > throttling when it's done.
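A minimal model of the two proposed counters and of how a HUD-style sampler could turn these cumulative values into per-interval deltas. The field and function names are hypothetical; the plan does not name the INFO ioctl queries, so none are assumed here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FaultCounters:
    """Hypothetical kernel-side counters; names are illustrative only."""
    num_cpu_page_faults: int = 0    # calls to amdgpu_bo_fault_reserve_notify
    bytes_moved_by_faults: int = 0  # bytes moved by ttm_bo_validate in that path

    def record_fault(self, bytes_moved: int) -> None:
        """Called once per fault, with the bytes the validate step moved."""
        self.num_cpu_page_faults += 1
        self.bytes_moved_by_faults += bytes_moved

    def snapshot(self) -> Tuple[int, int]:
        """What a query would return: both cumulative values."""
        return (self.num_cpu_page_faults, self.bytes_moved_by_faults)


def hud_deltas(prev: Tuple[int, int], cur: Tuple[int, int]) -> Tuple[int, int]:
    """A HUD-style sampler displays the change since its previous sample."""
    return (cur[0] - prev[0], cur[1] - prev[1])
```

Keeping the kernel values cumulative lets several user-space samplers read them independently.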
> > 3) Solution
> > a) When amdgpu_bo_fault_reserve_notify is called, record the fact.
> > (amdgpu_bo::had_cpu_page_fault = true)
> > b) Monitor the MB/s rate at which buffers are moved by
> > amdgpu_bo_fault_reserve_notify. If we get above a specific threshold,
> > don't move the buffer to visible VRAM. Move it to GTT instead. Note
> > that moving to GTT can be cheaper, because moving to visible VRAM is
> > likely to evict a lot of buffers there and unmap them from the CPU,
> FWIW, this can be avoided by only setting GTT in busy_placement. Then
> TTM will only move the BO to visible VRAM if that can be done without
> evicting anything from there.
> > but moving to GTT shouldn't evict or unmap anything.
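Step b) as a sketch: a sliding-window estimate of the fault-path move rate, plus the placement choice it drives. The window length, the limit, and the string domain names are hypothetical stand-ins for real TTM placement flags. The `(placement, busy_placement)` pair mirrors the two lists in TTM's `struct ttm_placement`, and the under-budget branch follows the suggestion above of putting only GTT in `busy_placement`.

```python
from collections import deque
from typing import Deque, List, Tuple

MiB = 1024 * 1024


class FaultMoveThrottle:
    """Sliding-window estimate of the MiB/s moved by the CPU fault path.

    limit_mib_per_s and window_s are hypothetical tuning knobs; the plan
    only says "a specific threshold".
    """

    def __init__(self, limit_mib_per_s: float, window_s: float = 1.0) -> None:
        self.limit = limit_mib_per_s
        self.window = window_s
        self.events: Deque[Tuple[float, int]] = deque()  # (timestamp, bytes)

    def _expire(self, now: float) -> None:
        # Drop moves that fell out of the window.
        while self.events and self.events[0][0] <= now - self.window:
            self.events.popleft()

    def rate_mib_per_s(self, now: float) -> float:
        self._expire(now)
        return sum(b for _, b in self.events) / MiB / self.window

    def record_move(self, now: float, nbytes: int) -> None:
        self.events.append((now, nbytes))

    def choose_placement(self, now: float) -> Tuple[List[str], List[str]]:
        """Return (placement, busy_placement) for a faulting BO."""
        if self.rate_mib_per_s(now) > self.limit:
            # Over budget: go to GTT, which evicts and unmaps nothing.
            return (["GTT"], ["GTT"])
        # Under budget: prefer visible VRAM, but with only GTT as the busy
        # placement, so nothing is evicted from visible VRAM to make room.
        return (["VRAM_VISIBLE"], ["GTT"])
```

Measuring bytes rather than fault counts matches the second HUD counter, so the same numbers can be used to tune the threshold.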
> > c) When we get into the CS ioctl and a buffer has had_cpu_page_fault set,
> > it can be moved to VRAM if:
> > - the GTT->VRAM move rate is low enough to allow it (this is the
> > existing throttling mechanism)
> > - the visible VRAM move rate is low enough that we will be OK with
> > another CPU page fault if it happens.
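Step c) reduced to a predicate, as a hedged sketch. The existing GTT->VRAM throttling is modeled here as a simple remaining byte budget, and the visible VRAM condition reuses a fault-path move rate such as the one from step b). `CsBoState` and `cs_may_move_to_vram` are hypothetical names.

```python
from dataclasses import dataclass

MiB = 1024 * 1024


@dataclass
class CsBoState:
    """Hypothetical per-BO state as seen by the CS ioctl."""
    size: int
    had_cpu_page_fault: bool = False


def cs_may_move_to_vram(bo: CsBoState,
                        gtt_to_vram_bytes_left: int,
                        fault_rate_mib_per_s: float,
                        fault_rate_limit_mib_per_s: float) -> bool:
    """Decide whether a BO currently in GTT may be moved back to VRAM."""
    # Existing mechanism: enough GTT->VRAM move budget must remain.
    if bo.size > gtt_to_vram_bytes_left:
        return False
    # BOs that never faulted are governed only by the existing mechanism.
    if not bo.had_cpu_page_fault:
        return True
    # A BO that faulted before may fault again once back in VRAM, so only
    # move it while the fault-path move rate leaves headroom.
    return fault_rate_mib_per_s < fault_rate_limit_mib_per_s
```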
> Some other ideas that might be worth trying:
> Evicting BOs to GTT instead of moving them to CPU accessible VRAM in
> principle in some cases (e.g. for all BOs except those with
> AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) or even always.
> I've tried this and it made things even worse.
What exactly did you try?
I only set the placement to GTT, but I think I kept the contiguous flag.
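The variants discussed in this exchange can be told apart with a small model of where a faulting BO is placed. `CreateFlags` is a stand-in whose bit values are not the real UAPI ones (only the name AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED comes from the thread), and `fault_move_target` is hypothetical. The contiguous requirement is an explicit parameter, so "GTT with the contiguous flag kept", which is what was tried, can be distinguished from a variant that drops it. The thread does not establish which variant performs better.

```python
from dataclasses import dataclass
from enum import Flag, auto


class CreateFlags(Flag):
    """Stand-ins for GEM create flags; bit values are NOT the real UAPI ones."""
    NONE = 0
    CPU_ACCESS_REQUIRED = auto()  # models AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED


@dataclass
class FaultMoveTarget:
    domain: str       # "VRAM_VISIBLE" or "GTT"
    contiguous: bool  # whether the placement keeps the contiguous requirement


def fault_move_target(flags: CreateFlags, always_gtt: bool = False,
                      keep_contiguous: bool = False) -> FaultMoveTarget:
    """Pick the destination of a BO that took a CPU page fault."""
    # "In some cases": BOs without CPU_ACCESS_REQUIRED go to GTT.
    # "Or even always": always_gtt sends every faulting BO to GTT.
    if always_gtt or not (flags & CreateFlags.CPU_ACCESS_REQUIRED):
        return FaultMoveTarget("GTT", keep_contiguous)
    return FaultMoveTarget("VRAM_VISIBLE", keep_contiguous)
```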
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
More information about the amd-gfx mailing list