[RFC PATCH v2] Utilize the PCI API in the TTM framework.

Tue Jan 11 08:59:54 PST 2011

> >> Another thing that I was thinking of is what happens if you have a
> >> huge gart and allocate a lot of coherent memory. Could that
> >> potentially exhaust IOMMU resources?
> >
> > <scratches his head>
> >
> > So the GART is in the PCI space in one of the BARs of the device right?
> > (We are talking about the discrete card GART, not the poor man's AMD IOMMU?)
> > The PCI space is under the 4GB, so it would be considered coherent by
> > definition.
> 
> GART is not a PCI BAR; it's just a remapper for system pages.  On
> radeon GPUs at least there is a memory controller with 3 programmable
> apertures: vram, internal gart, and agp gart.  You can map these

To access it, i.e., to program it, you would need to access the PCIe card's
MMIO regions, right? So that would be considered in PCI BAR space?

> resources wherever you want in the GPU's address space and then the
> memory controller takes care of the translation to off-board resources
> like gart pages.  On chip memory clients (display controllers, texture
> blocks, render blocks, etc.) write to internal GPU addresses.  The GPU
> has its own direct connection to vram, so that's not an issue.  For
> AGP, the GPU specifies aperture base and size, and you point it to the
> bus address of gart aperture provided by the northbridge's AGP
> controller.  For internal gart, the GPU has a page table stored in

I think we are just talking about the GART on the GPU, not the old AGP
GART.

> either vram or uncached system memory depending on the asic.  It
> provides a contiguous linear aperture to GPU clients and the memory
> controller translates the transactions to the backing pages via the
> pagetable.
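
To check my understanding of the internal gart, here is a minimal userspace
sketch of that translation in C. Everything in it (the toy_gart struct, the
4 KiB page size, the eight-entry table, the GART_UNMAPPED marker) is made up
for illustration; it is not the radeon page table format and not driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GART_PAGE_SHIFT 12                      /* assumption: 4 KiB pages */
#define GART_PAGE_SIZE  (1u << GART_PAGE_SHIFT)
#define GART_NUM_PAGES  8                       /* toy aperture of 32 KiB */
#define GART_UNMAPPED   UINT64_MAX              /* marks an unbound slot */

/* Hypothetical model of a GPU-internal GART, for illustration only. */
struct toy_gart {
	uint64_t aperture_base;           /* start of the linear range in GPU address space */
	uint64_t pte[GART_NUM_PAGES];     /* bus address of the system page behind each slot */
};

static void toy_gart_init(struct toy_gart *g, uint64_t aperture_base)
{
	size_t i;

	g->aperture_base = aperture_base;
	for (i = 0; i < GART_NUM_PAGES; i++)
		g->pte[i] = GART_UNMAPPED;
}

/* Point one aperture slot at a page-aligned bus address. */
static void toy_gart_bind(struct toy_gart *g, size_t slot, uint64_t bus_addr)
{
	assert(slot < GART_NUM_PAGES);
	assert((bus_addr & (GART_PAGE_SIZE - 1)) == 0);
	g->pte[slot] = bus_addr;
}

/*
 * Translate an address inside the aperture to the bus address of the
 * backing page plus the offset within that page. Returns GART_UNMAPPED
 * if the address is outside the aperture or the slot is not bound.
 */
static uint64_t toy_gart_translate(const struct toy_gart *g, uint64_t gpu_addr)
{
	uint64_t off, slot;

	if (gpu_addr < g->aperture_base)
		return GART_UNMAPPED;

	off = gpu_addr - g->aperture_base;
	slot = off >> GART_PAGE_SHIFT;
	if (slot >= GART_NUM_PAGES || g->pte[slot] == GART_UNMAPPED)
		return GART_UNMAPPED;

	return g->pte[slot] + (off & (GART_PAGE_SIZE - 1));
}
```

In this model, two system pages at unrelated bus addresses bound to slots 0
and 1 look like one contiguous 8 KiB range to a GPU client, which is how I
read the "contiguous linear aperture" above.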

So I think I misunderstood what is meant by 'huge gart'. That sounds
like the linear address space provided by the GPU. Hooking up a lot of coherent
memory (i.e., system RAM) to that linear address space would be no different than what
is currently being done. When you allocate memory using alloc_page(GFP_DMA32)
and hook up that memory to the linear space, you exhaust the same amount of
ZONE_DMA32 memory as if you were to use the PCI API. It comes from the same
pool, except that doing it from the PCI API gets you the bus address right
away.