Plan: BO move throttling for visible VRAM evictions

Fri Mar 24 16:59:10 UTC 2017

> -----Original Message-----
> From: amd-gfx [mailto:amd-gfx-bounces at lists.freedesktop.org] On Behalf
> Of Marek Olšák
> Sent: Friday, March 24, 2017 12:34 PM
> To: amd-gfx mailing list
> Subject: Plan: BO move throttling for visible VRAM evictions
> 
> Hi,
> 
> I'm sharing this idea here, because it's something that has been
> decreasing our performance a lot recently, for example:
> http://openbenchmarking.org/prospect/1703011-RI-RADEONDIR06/7b7668cfc109d1c3dc27e871c8aea71ca13f23fa
> 
> I think the problem there is that Mesa git started uploading
> descriptors and uniforms to VRAM, which helps when TC L2 has a low
> hit/miss ratio, but the performance can randomly drop by an order of
> magnitude. I've heard rumours that kernel 4.11 has an improved
> allocator that should perform better, but the situation is still far
> from ideal.
> 
> AMD CPUs and APUs will hopefully suffer less, because we can resize
> the visible VRAM with the help of our CPU hw specs, but Intel CPUs
> will remain limited to 256 MB. The following plan describes how to do
> throttling for visible VRAM evictions.

Has anyone checked the Intel chipset docs? Maybe they document the interface. There's also ACPI _SRS, which should be the vendor-independent way to handle this.

Alex

> 
> 
> 1) Theory
> 
> Initially, the driver doesn't care about where buffers are in VRAM,
> because VRAM buffers are only moved to visible VRAM on CPU page faults
> (when the CPU touches the buffer memory but the memory is in the
> invisible part of VRAM). When it happens,
> amdgpu_bo_fault_reserve_notify is called, which moves the buffer to
> visible VRAM, and the app continues. amdgpu_bo_fault_reserve_notify
> also marks the buffer as contiguous, which makes memory fragmentation
> worse.
> 
> I verified this with DiRT Rally, where amdgpu_bo_fault_reserve_notify
> ranked much higher in a CPU profiler than anything else in the kernel.
> 
> 
> 2) Monitoring via Gallium HUD
> 
> We need to expose 2 kernel counters via the INFO ioctl and display
> those via Gallium HUD:
> - The number of VRAM CPU page faults (the number of calls to
> amdgpu_bo_fault_reserve_notify).
> - The number of bytes moved by ttm_bo_validate inside
> amdgpu_bo_fault_reserve_notify.
> 
> This will help us observe what exactly is happening and fine-tune the
> throttling when it's done.
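
To make the two counters concrete, here is a minimal user-space sketch of the bookkeeping. The struct, field, and function names are hypothetical and are not existing amdgpu members. A real kernel version would presumably use atomic counters and report them through the INFO ioctl, which this sketch does not attempt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device statistics. The names are illustrative only
 * and do not correspond to existing amdgpu fields. */
struct vis_vram_stats {
    uint64_t num_cpu_page_faults;  /* calls to the fault handler */
    uint64_t bytes_moved_on_fault; /* bytes moved by validation inside it */
};

/* Record one CPU page fault on a VRAM buffer and the number of bytes
 * that had to be moved to service it (0 if nothing was moved). */
static void stats_record_fault(struct vis_vram_stats *s, uint64_t bytes_moved)
{
    assert(s != NULL);
    s->num_cpu_page_faults++;
    s->bytes_moved_on_fault += bytes_moved;
}
```

A HUD could sample both values once per frame and plot the deltas, which would give faults per frame and bytes moved per frame.
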
> 
> 
> 3) Solution
> 
> a) When amdgpu_bo_fault_reserve_notify is called, record the fact
> (amdgpu_bo::had_cpu_page_fault = true).
> 
> b) Monitor the MB/s rate at which buffers are moved by
> amdgpu_bo_fault_reserve_notify. If we get above a specific threshold,
> don't move the buffer to visible VRAM. Move it to GTT instead. Note
> that moving to GTT can be cheaper, because moving to visible VRAM is
> likely to evict a lot of buffers there and unmap them from the CPU,
> but moving to GTT shouldn't evict or unmap anything.
> 
> c) When we get into the CS ioctl and a buffer has had_cpu_page_fault,
> it can be moved to VRAM if:
> - the GTT->VRAM move rate is low enough to allow it (this is the
> existing throttling mechanism)
> - the visible VRAM move rate is low enough that we will be OK with
> another CPU page fault if it happens.
> 
> d) The solution can be fine-tuned with the help of Gallium HUD to get
> the best performance under various scenarios. The current throttling
> mechanism can serve as an inspiration.
> 
> 
> That's it. Feel free to comment. I think this is our biggest
> performance bottleneck at the moment.
> 
> Marek