Plan: BO move throttling for visible VRAM evictions

Christian König deathsimple at vodafone.de
Mon Mar 27 09:55:34 UTC 2017


Am 27.03.2017 um 11:36 schrieb zhoucm1:
>
>
> On 2017-03-27 17:29, Christian König wrote:
>> On APUs I've already enabled using direct access to the stolen parts 
>> of system memory.
> Thanks, could you point me to where this is done?

See here gmc_v7_0_mc_init():
>         /* Could aper size report 0 ? */
>         adev->mc.aper_base = pci_resource_start(adev->pdev, 0);
>         adev->mc.aper_size = pci_resource_len(adev->pdev, 0);
>         /* size in MB on si */
>         adev->mc.mc_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL;
>         adev->mc.real_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL;
>
> #ifdef CONFIG_X86_64
>         if (adev->flags & AMD_IS_APU) {
>                 adev->mc.aper_base = ((u64)RREG32(mmMC_VM_FB_OFFSET)) << 22;
>                 adev->mc.aper_size = adev->mc.real_vram_size;
>         }
> #endif
>
>
We use the real physical address and size as aperture on APUs.

Similar code is in gmc_v8_0_mc_init().

Regards,
Christian.

>
> Regards,
> David Zhou
>>
>> So there won't be any eviction any more because of page faults on APUs.
>>
>> Regards,
>> Christian.
>>
>> Am 27.03.2017 um 09:53 schrieb Zhou, David(ChunMing):
>>> For the APU special case, can we prevent evictions between VRAM 
>>> <----> GTT?
>>>
>>> Regards,
>>> David Zhou
>>>
>>> -----Original Message-----
>>> From: amd-gfx [mailto:amd-gfx-bounces at lists.freedesktop.org] On 
>>> Behalf Of Michel Dänzer
>>> Sent: Monday, March 27, 2017 3:36 PM
>>> To: Marek Olšák <maraeo at gmail.com>
>>> Cc: amd-gfx mailing list <amd-gfx at lists.freedesktop.org>
>>> Subject: Re: Plan: BO move throttling for visible VRAM evictions
>>>
>>> On 25/03/17 01:33 AM, Marek Olšák wrote:
>>>> Hi,
>>>>
>>>> I'm sharing this idea here, because it's something that has been
>>>> decreasing our performance a lot recently, for example:
>>>> http://openbenchmarking.org/prospect/1703011-RI-RADEONDIR06/7b7668cfc109d1c3dc27e871c8aea71ca13f23fa
>>>>
>>>> I think the problem there is that Mesa git started uploading
>>>> descriptors and uniforms to VRAM, which helps when TC L2 has a low
>>>> hit/miss ratio, but the performance can randomly drop by an order of
>>>> magnitude. I've heard rumours that kernel 4.11 has an improved
>>>> allocator that should perform better, but the situation is still far
>>>> from ideal.
>>>>
>>>> AMD CPUs and APUs will hopefully suffer less, because we can resize
>>>> the visible VRAM with the help of our CPU hw specs, but Intel CPUs
>>>> will remain limited to 256 MB. The following plan describes how to do
>>>> throttling for visible VRAM evictions.
>>>>
>>>>
>>>> 1) Theory
>>>>
>>>> Initially, the driver doesn't care about where buffers are in VRAM,
>>>> because VRAM buffers are only moved to visible VRAM on CPU page faults
>>>> (when the CPU touches the buffer memory but the memory is in the
>>>> invisible part of VRAM). When it happens,
>>>> amdgpu_bo_fault_reserve_notify is called, which moves the buffer to
>>>> visible VRAM, and the app continues. amdgpu_bo_fault_reserve_notify
>>>> also marks the buffer as contiguous, which makes memory fragmentation
>>>> worse.
>>>>
>>>> I verified this with DiRT Rally where amdgpu_bo_fault_reserve_notify
>>>> was much higher in a CPU profiler than anything else in the kernel.
>>>>
>>>>
>>>> 2) Monitoring via Gallium HUD
>>>>
>>>> We need to expose 2 kernel counters via the INFO ioctl and display
>>>> those via Gallium HUD:
>>>> - The number of VRAM CPU page faults (i.e. the number of calls to
>>>> amdgpu_bo_fault_reserve_notify).
>>>> - The number of bytes moved by ttm_bo_validate inside
>>>> amdgpu_bo_fault_reserve_notify.
>>>>
>>>> This will help us observe what exactly is happening and fine-tune the
>>>> throttling when it's done.
>>>>
>>>>
>>>> 3) Solution
>>>>
>>>> a) When amdgpu_bo_fault_reserve_notify is called, record the fact.
>>>> (amdgpu_bo::had_cpu_page_fault = true)
>>>>
>>>> b) Monitor the MB/s rate at which buffers are moved by
>>>> amdgpu_bo_fault_reserve_notify. If we get above a specific threshold,
>>>> don't move the buffer to visible VRAM. Move it to GTT instead. Note
>>>> that moving to GTT can be cheaper, because moving to visible VRAM is
>>>> likely to evict a lot of buffers there and unmap them from the CPU,
>>> FWIW, this can be avoided by only setting GTT in busy_placement. 
>>> Then TTM will only move the BO to visible VRAM if that can be done 
>>> without evicting anything from there.
>>>
>>>
>>>> but moving to GTT shouldn't evict or unmap anything.
>>>>
>>>> c) When we get into the CS ioctl and a buffer has had_cpu_page_fault,
>>>> it can be moved to VRAM if:
>>>> - the GTT->VRAM move rate is low enough to allow it (this is the
>>>> existing throttling mechanism)
>>>> - the visible VRAM move rate is low enough that we will be OK with
>>>> another CPU page fault if it happens.
>>> Some other ideas that might be worth trying:
>>>
>>> Evicting BOs to GTT instead of moving them to CPU-accessible VRAM, 
>>> either in some cases (e.g. for all BOs except those with 
>>> AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) or even always.
>>>
>>> Implementing eviction from CPU visible to CPU invisible VRAM, 
>>> similar to how it's done in radeon. Note that there's potential for 
>>> userspace triggering an infinite loop in the kernel in cases where 
>>> BOs are moved back from invisible to visible VRAM on page faults.
>>>
>>>
>>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
