>> >> Another thing that I was thinking of is what happens if you have a
>> >> huge gart and allocate a lot of coherent memory. Could that
>> >> potentially exhaust IOMMU resources?
>> >
>> > <scratches his head>
>> >
>> > So the GART is in the PCI space in one of the BARs of the device right?
>> > (We are talking about the discrete card GART, not the poor man AMD IOMMU?)
>> > The PCI space is under the 4GB, so it would be considered coherent by
>> > definition.
>> GART is not a PCI BAR; it's just a remapper for system pages.  On
>> radeon GPUs at least there is a memory controller with 3 programmable
>> apertures: vram, internal gart, and agp gart.  You can map these
> To access it, ie, to program it, you would need to access the PCIe card
> MMIO regions, right? So that would be considered in PCI BAR space?

Yes, you need access to the MMIO aperture to configure the GPU.  I was
thinking you meant something akin to the framebuffer BAR, only for gart
space, which is not the case.

>> resources wherever you want in the GPU's address space and then the
>> memory controller takes care of the translation to off-board resources
>> like gart pages.  On chip memory clients (display controllers, texture
>> blocks, render blocks, etc.) write to internal GPU addresses.  The GPU
>> has its own direct connection to vram, so that's not an issue.  For
>> AGP, the GPU specifies aperture base and size, and you point it to the
>> bus address of gart aperture provided by the northbridge's AGP
>> controller.  For internal gart, the GPU has a page table stored in
> I think we are just talking about the GART on the GPU, not the old AGP

Ok.  I just mentioned it for completeness.

>> either vram or uncached system memory depending on the asic.  It
>> provides a contiguous linear aperture to GPU clients and the memory
>> controller translates the transactions to the backing pages via the
>> pagetable.
> So I think I misunderstood what is meant by 'huge gart'. That sounds
> like linear address space provided by GPU. And hooking up a lot of coherent
> memory (so System RAM) to that linear address space would be no different than what
> is currently being done. When you allocate memory using page_alloc(GFP_DMA32)
> and hook up that memory to the linear space you exhaust the same amount of
> ZONE_DMA32 memory as if you were to use the PCI API. It comes from the same
> pool, except that doing it from the PCI API gets you the bus address right
> away.

In this case GPU clients refers to the hw blocks on the GPU; they are
the ones that see the contiguous linear aperture.  From the
application's perspective, gart memory looks like any other pages.
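
To make the translation step concrete, here is a rough userspace sketch of
what the memory controller does for the internal gart.  It is only a toy
model: the names (struct gart, gart_bind, gart_translate), the 4K page
size, the 8-entry table and the flat array of bus addresses are all
assumptions made for illustration, not the actual radeon page table
format or driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All values below are illustrative assumptions, not real hw parameters. */
#define GART_PAGE_SHIFT 12
#define GART_PAGE_SIZE  (1u << GART_PAGE_SHIFT)
#define GART_NUM_PAGES  8              /* tiny aperture for the example */
#define GART_INVALID    UINT64_MAX     /* marks an unbound slot or a miss */

struct gart {
    uint64_t aperture_base;            /* aperture start in GPU address space */
    uint64_t pte[GART_NUM_PAGES];      /* bus address of each backing page */
};

/* Start with an empty table: no system pages bound yet. */
static void gart_init(struct gart *g, uint64_t aperture_base)
{
    g->aperture_base = aperture_base;
    for (size_t i = 0; i < GART_NUM_PAGES; i++)
        g->pte[i] = GART_INVALID;
}

/* Bind one page-aligned system page (given by bus address) to a slot. */
static void gart_bind(struct gart *g, size_t idx, uint64_t bus_addr)
{
    assert(idx < GART_NUM_PAGES);
    assert((bus_addr & (GART_PAGE_SIZE - 1)) == 0);
    g->pte[idx] = bus_addr;
}

/*
 * Translate an address inside the linear aperture, as seen by a GPU
 * client, to the bus address of the backing page plus the page offset.
 * Returns GART_INVALID for addresses outside the aperture or for slots
 * that have no page bound.
 */
static uint64_t gart_translate(const struct gart *g, uint64_t gpu_addr)
{
    if (gpu_addr < g->aperture_base)
        return GART_INVALID;

    uint64_t off = gpu_addr - g->aperture_base;
    uint64_t idx = off >> GART_PAGE_SHIFT;

    if (idx >= GART_NUM_PAGES || g->pte[idx] == GART_INVALID)
        return GART_INVALID;

    return g->pte[idx] | (off & (GART_PAGE_SIZE - 1));
}
```

The backing pages do not need to be contiguous on the bus; adjacent slots
can point anywhere, and the hw blocks on the GPU still see one linear
range starting at aperture_base.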


