[Mesa-dev] [PATCH 1/2] gallium: Add PIPE_CAP_USER_MEMORY_PAGE_SIZE for page size of user pointers

Thu Aug 17 13:01:43 UTC 2017

On 17.08.2017 at 13:54, Jan Vesely wrote:
> On Thu, 2017-08-17 at 11:54 +0200, Christian König wrote:
>> [SNIP]
>> In general ATS works completely different to GPUVM and is rather bound
>> to the CPU page tables.
>>
>> But GPUVM on everything before Vega10 has a so-called fragmentation size
>> in its page table entries, which tells the TLB that a certain run of
>> them is consecutive, so only one of them needs to be fetched and cached.
> did pre-Vega GPUVM have the x86 style multilevel (4-5) structure of
> page tables?

No, not even remotely. GPUVM page tables on pre-Vega hardware can only deal 
with two levels; some blocks, like display, can only handle one level or 
they start to run into problems.

> Could the fragmentation size go above the limit of one level?

I think so, but I never confirmed it with the hardware guys. The maximum 
fragment size is 1 or 2GB IIRC, and that's normally way larger than the 
range a single page table covers.
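
To make the fragment idea concrete, here is a minimal sketch of the arithmetic. It assumes 4KB base pages, a fragment field that encodes log2 of the number of consecutive pages, and a page table with 512 entries; all three values, and the helper names, are illustrative assumptions rather than figures confirmed in this thread.

```python
PAGE_SIZE = 4096  # assumption: 4KB base GPU page


def fragment_bytes(frag):
    """Bytes covered by one fragment of 2**frag consecutive pages."""
    return PAGE_SIZE << frag


def page_table_bytes(entries):
    """Bytes covered by a single page table with the given entry count."""
    return PAGE_SIZE * entries


def largest_fragment(start, size, max_frag=18):
    """Largest fragment exponent usable for a mapping (hypothetical helper).

    The fragment is limited by the alignment of the start address,
    by the size of the mapping, and by an assumed hardware maximum.
    """
    pfn = start // PAGE_SIZE
    npages = size // PAGE_SIZE
    if npages == 0:
        return 0
    # Number of trailing zero bits in the page frame number.
    align = (pfn & -pfn).bit_length() - 1 if pfn else max_frag
    # Largest power of two that still fits into the mapping.
    fit = npages.bit_length() - 1
    return min(align, fit, max_frag)


# A 2MB mapping aligned to 2MB can use a single 2MB fragment.
frag = largest_fragment(0x200000, 0x200000)
print(frag, fragment_bytes(frag))

# A 1GB fragment spans far more than one 512-entry page table.
print(fragment_bytes(18) // page_table_bytes(512))
```

Under these assumptions a single 512-entry page table covers 2MB, so a fragment in the 1GB range necessarily spans many page tables.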

>> After Vega10 we more or less have the same as on x86_64 CPUs where you
>> set a bit in the page directory entry to stop the fetcher and use that
>> address instead. This way you not only make the TLB much faster, but
>> also save the last layer in the page table tree.
> I assumed most of the benefits of large pages came from increased TLB
> coverage. Do shorter page table walks have a significant performance
> impact?

Never measured it, but I would strongly assume so. When you can skip the 
last level of a page table walk in a four-level tree, you basically save 
a quarter of the memory accesses on each full walk.
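
As a rough illustration of that saving, the sketch below splits a virtual address the way x86_64 four-level paging does (9 index bits per level, 12 offset bits) and shows what changes when the walk stops one level early for a 2MB page. The function name and the sample address are made up for the example; the GPUVM layout on Vega10 may use different field widths.

```python
LEVEL_BITS = 9   # x86_64-style: 512 entries per table
PAGE_SHIFT = 12  # 4KB base page


def split_address(va, levels=4, stop_early=False):
    """Return (table indices, page offset) for a page table walk.

    With stop_early=True the walk ends at the second-to-last level,
    as for a 2MB page, and the skipped index bits join the offset.
    """
    walked = levels - 1 if stop_early else levels
    offset_bits = PAGE_SHIFT + LEVEL_BITS * (levels - walked)
    offset = va & ((1 << offset_bits) - 1)
    mask = (1 << LEVEL_BITS) - 1
    indices = [
        (va >> (offset_bits + LEVEL_BITS * i)) & mask
        for i in reversed(range(walked))
    ]
    return indices, offset


va = 0x7F123456789A  # arbitrary example address

full_indices, full_offset = split_address(va)
short_indices, short_offset = split_address(va, stop_early=True)

# Each index costs one memory access during the walk.
saved = (len(full_indices) - len(short_indices)) / len(full_indices)
print(len(full_indices), len(short_indices), saved)
```

The shorter walk reads three table entries instead of four, and the 9 index bits of the skipped level become part of a 21-bit offset into the 2MB page.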

> Does it mean that GPUVM does not have PTW prefix caches?

Vega10 does have a cache for page directory entries, but we have seen a 
significant improvement when we stopped using that and instead used the 
L2 with 2MB pages.

> Sorry for the flurry of questions. I never looked into how GPUVM worked
> and assumed it used design choices tailored to benefit graphics
> workloads. The "cpu-ization" of the address translation hierarchy is
> rather interesting.

Well, it's certainly a hot topic, because it can affect memory throughput 
significantly.

Regards,
Christian.