[Nouveau] [PATCH 6/6] mmu: gk20a: implement IOMMU mapping for big pages

Ilia Mirkin imirkin at alum.mit.edu
Thu Apr 16 13:01:44 PDT 2015


On Thu, Apr 16, 2015 at 3:55 PM, Terje Bergstrom <tbergstrom at nvidia.com> wrote:
>
> On 04/16/2015 12:31 PM, Ilia Mirkin wrote:
>>
>> Two questions --
>>
>> (a) What's the perf impact of doing this? Less work for the GPU MMU
>> but more work for the IOMMU...
>> (b) Would it be a good idea to do this for desktop GPUs that are on
>> CPUs with IOMMUs in them (VT-d and whatever the AMD one is)? Is there
>> some sort of shared API for this stuff that you should be (or are?)
>> using?
>
> a) Using IOMMU mapping is the best way of getting contiguous post-GMMU
> address spaces. The contiguity is required to be able to use frame buffer
> compression. So the overall performance impact, once compression is
> factored in, is about 20-30%.
>
> If compression is left out of the equation, the impact of SMMU translation
> and of small versus large pages should not be noticeable, but I haven't
> measured it. We have measured large versus small pages on gk20a with
> compression disabled in both cases, and the difference was noise.

Ah, I never made the connection to compression. I had assumed it was
something done at a higher level by PGRAPH rather than at the PTE
level by the VM. [I did know that you had to set compression at the
PTE level, but didn't think that page size mattered.]

>
> An additional advantage is extra protection against the GPU accidentally
> walking over kernel memory if the kernel driver has a bug.

Yeah, IOMMUs are nice :)
