[Mesa-dev] [PATCH 1/2] gallium: Add PIPE_CAP_USER_MEMORY_PAGE_SIZE for page size of user pointers
Christian König
deathsimple at vodafone.de
Thu Aug 17 13:01:43 UTC 2017
On 17.08.2017 at 13:54, Jan Vesely wrote:
> On Thu, 2017-08-17 at 11:54 +0200, Christian König wrote:
>> [SNIP]
>> In general, ATS works completely differently from GPUVM and is rather
>> bound to the CPU page tables.
>>
>> But GPUVM on everything before Vega10 has a so-called fragment size
>> in its page table entries, which tells the TLB that a certain run of
>> them is contiguous, so only one of them needs to be fetched and cached.
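As a rough illustration of the fragment idea, here is a minimal C sketch assuming 4 KiB pages and a per-PTE field that holds log2 of the contiguous, naturally aligned run the page belongs to. The helper names `frag_log2` and `fill_frag_fields`, the field encoding and the 2 GiB cap are assumptions for illustration only, not the actual pre-Vega PTE format or amdgpu code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cap for illustration: 2 GiB of 4 KiB pages = 2^19 pages. */
#define MAX_FRAG_LOG2 19u

/* Largest log2 run length (in pages) that starts at page `start` and
 * fits in `count` pages: bounded by the alignment of `start`, by
 * `count`, and by the assumed cap. */
static unsigned frag_log2(uint64_t start, uint64_t count)
{
    unsigned frag = 0;

    assert(count > 0);
    while (frag < MAX_FRAG_LOG2 &&
           (start & ((UINT64_C(1) << (frag + 1)) - 1)) == 0 &&
           (UINT64_C(1) << (frag + 1)) <= count)
        frag++;
    return frag;
}

/* Hypothetical PTE field fill: every PTE of a contiguous, aligned run
 * gets the same log2 run length, so a TLB could cache one translation
 * for the whole run. `frag_out[i]` belongs to page `start + i`. */
static void fill_frag_fields(uint64_t start, uint64_t count, uint8_t *frag_out)
{
    size_t i = 0;

    while (count > 0) {
        unsigned frag = frag_log2(start, count);
        uint64_t run = UINT64_C(1) << frag;

        for (uint64_t j = 0; j < run; j++)
            frag_out[i++] = (uint8_t)frag;
        start += run;
        count -= run;
    }
}
```

For example, mapping 24 pages starting at page 0 yields a 16-page run (field value 4) followed by an 8-page run (field value 3).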
> Did pre-Vega GPUVM have the x86-style multilevel (4-5 level) page
> table structure?
No, not even remotely. Pre-Vega GPUVM page tables can handle only two
levels, and some blocks, like display, can handle only one or start to
run into problems.
> Could the fragment size go above the limit of one level?
I think so, but I never confirmed it with the hardware guys. The maximum
fragment size is 1 or 2GB IIRC, and that's normally way larger than the
range covered by a single page table.
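Rough arithmetic behind "way larger than a single page table", as a C sketch. It assumes 4 KiB pages and a hypothetical last-level table of 512 entries; both constants are illustrative and real GPUVM block sizes may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only; real GPUVM block sizes vary. */
#define GPU_PAGE_SIZE   (UINT64_C(4) << 10)  /* 4 KiB pages */
#define PTES_PER_TABLE  UINT64_C(512)        /* hypothetical table size */

/* Address space mapped by one last-level page table. */
static uint64_t table_coverage(void)
{
    return GPU_PAGE_SIZE * PTES_PER_TABLE;
}

/* Number of last-level page tables a fragment of `frag_bytes` spans,
 * rounded up. */
static uint64_t tables_spanned(uint64_t frag_bytes)
{
    uint64_t cov = table_coverage();

    assert(frag_bytes > 0);
    return (frag_bytes + cov - 1) / cov;
}
```

Under these assumptions one table maps 2 MiB, so a 1 GiB fragment spans 512 tables and a 2 GiB fragment spans 1024.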
>> Starting with Vega10 we more or less have the same as on x86_64 CPUs,
>> where you set a bit in the page directory entry to stop the fetcher and
>> use that address instead. This way you not only make the TLB much
>> faster, but also save the last layer in the page table tree.
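The early-terminating walk can be sketched in C as follows, assuming an x86_64-like tree with 4 levels, 9 index bits per level and 4 KiB base pages. The entry layout and the `walk` function are hypothetical, not the Vega10 hardware format; a leaf bit set in a third-level entry maps a 2 MiB block and ends the walk one fetch early.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed x86_64-like layout: 4 levels, 9 index bits each, 4 KiB pages. */
#define LEVELS      4
#define BITS_PER_LV 9
#define PAGE_SHIFT  12
#define ENTRIES     (1u << BITS_PER_LV)

/* Hypothetical entry: either points at the next-level table or, when
 * `leaf` is set, maps the whole block this entry covers directly. */
struct pt_entry {
    int valid;
    int leaf;                /* the "stop the fetcher here" bit */
    uint64_t phys;           /* block base address when `leaf` is set */
    struct pt_table *next;   /* next-level table otherwise */
};

struct pt_table {
    struct pt_entry e[ENTRIES];
};

/* Translates `va`. Returns 1 and sets `*pa` on success, 0 on a fault.
 * `*fetches` counts entries read, i.e. memory accesses of the walk. */
static int walk(const struct pt_table *root, uint64_t va,
                uint64_t *pa, int *fetches)
{
    const struct pt_table *t = root;

    assert(root != NULL);
    *fetches = 0;
    for (int lv = 0; lv < LEVELS; lv++) {
        /* Bits of `va` translated below this level. */
        unsigned shift = PAGE_SHIFT + BITS_PER_LV * (LEVELS - 1 - lv);
        size_t idx = (size_t)((va >> shift) & (ENTRIES - 1));
        const struct pt_entry *pe = &t->e[idx];

        ++*fetches;
        if (!pe->valid)
            return 0;
        if (pe->leaf || lv == LEVELS - 1) {
            /* Stop here: the low `shift` bits are the block offset. */
            *pa = pe->phys + (va & ((UINT64_C(1) << shift) - 1));
            return 1;
        }
        t = pe->next;
    }
    return 0; /* not reached: the last level always terminates */
}
```

With a leaf entry at the third level, an address inside that 2 MiB block translates in 3 fetches, while an address mapped through a last-level table needs 4.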
> I assumed most of the benefits of large pages came from increased TLB
> coverage. Do shorter page table walks have a significant performance
> impact?
I never measured it, but I would strongly assume so. When you can skip
the last level of a page table walk in a four-level tree, each full walk
needs three memory fetches instead of four, so it basically becomes 25%
cheaper.
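A back-of-the-envelope version of the 25% figure, assuming each level costs exactly one dependent memory fetch of equal latency and ignoring any caching of upper-level entries:

```latex
% Assumes one dependent fetch per level and no walk caching.
\begin{align*}
  \text{full walk (4 levels)}       &= 4 \text{ fetches} \\
  \text{walk stopping one level early} &= 3 \text{ fetches} \\
  \text{saving}                     &= \frac{4 - 3}{4} = 25\,\%
\end{align*}
```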
> Does it mean that GPUVM does not have PTW prefix caches?
Vega10 does have a cache for page directory entries, but we have seen
a significant improvement when we stopped using it and instead used the
L2 with 2MB pages.
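To see why larger pages matter for TLB coverage, here is a small C sketch of TLB reach (entries times page size). The 512-entry count used in the example is hypothetical; the actual Vega10 L2 TLB size is not given in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* TLB reach: bytes of address space a TLB can translate without a miss,
 * assuming every entry maps one page of `page_size` bytes. Entry counts
 * passed in are illustrative; real TLB sizes differ. */
static uint64_t tlb_reach(uint64_t entries, uint64_t page_size)
{
    assert(entries > 0 && page_size > 0);
    return entries * page_size;
}
```

With a hypothetical 512-entry TLB, 4 KiB pages cover 2 MiB while 2 MiB pages cover 1 GiB, a 512x larger reach from the same number of entries.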
> Sorry for the flurry of questions. I never looked into how GPUVM worked
> and assumed it used design choices tailored to benefit graphics
> workloads. The "cpu-ization" of the address translation hierarchy is
> rather interesting.
Well, it's certainly a hot topic, because it can affect memory
throughput significantly.
Regards,
Christian.