[RFC PATCH v2] Utilize the PCI API in the TTM framework.

Tue Jan 11 10:28:57 PST 2011

On Tue, Jan 11, 2011 at 01:12:57PM -0500, Alex Deucher wrote:
> On Tue, Jan 11, 2011 at 11:59 AM, Konrad Rzeszutek Wilk
> <konrad.wilk at oracle.com> wrote:
> >> >> Another thing that I was thinking of is what happens if you have a
> >> >> huge gart and allocate a lot of coherent memory. Could that
> >> >> potentially exhaust IOMMU resources?
> >> >
> >> > <scratches his head>
> >> >
> >> > So the GART is in the PCI space in one of the BARs of the device, right?
> >> > (We are talking about the discrete card GART, not the poor man's AMD IOMMU?)
> >> > The PCI space is under 4GB, so it would be considered coherent by
> >> > definition.
> >>
> >> GART is not a PCI BAR; it's just a remapper for system pages.  On
> >> radeon GPUs at least there is a memory controller with 3 programmable
> >> apertures: vram, internal gart, and agp gart.  You can map these
> >
> > To access it, i.e., to program it, you would need to access the PCIe card
> > MMIO regions, right? So that would be considered in PCI BAR space?
> 
> yes, you need access to the mmio aperture to configure the gpu.  I was
> thinking you meant something akin to the framebuffer BAR, only for gart
> space, which is not the case.

Aaah, gotcha.
> 
> >
> >> resources wherever you want in the GPU's address space and then the
> >> memory controller takes care of the translation to off-board resources
> >> like gart pages.  On-chip memory clients (display controllers, texture
> >> blocks, render blocks, etc.) write to internal GPU addresses.  The GPU
> >> has its own direct connection to vram, so that's not an issue.  For
> >> AGP, the GPU specifies aperture base and size, and you point it to the
> >> bus address of the gart aperture provided by the northbridge's AGP
> >> controller.  For internal gart, the GPU has a page table stored in
> >
> > I think we are just talking about the GART on the GPU, not the old AGP
> > GART.
> 
> Ok.  I just mentioned it for completeness.

<nods>
> 
> >
> >> either vram or uncached system memory depending on the asic.  It
> >> provides a contiguous linear aperture to GPU clients and the memory
> >> controller translates the transactions to the backing pages via the
> >> pagetable.
> >
> > So I think I misunderstood what is meant by 'huge gart'. That sounds
> > like a linear address space provided by the GPU. And hooking up a lot of coherent
> > memory (so System RAM) to that linear address space would be no different than what
> > is currently being done. When you allocate memory using alloc_page(GFP_DMA32)
> > and hook up that memory to the linear space, you exhaust the same amount of
> > ZONE_DMA32 memory as if you were to use the PCI API. It comes from the same
> > pool, except that going through the PCI API gets you the bus address right
> > away.
> >
> 
> In this case GPU clients refers to the hw blocks on the GPU; they are
> the ones that see the contiguous linear aperture.  From the
> application's perspective, gart memory looks like any other pages.

<nods>. Those 'hw blocks' or 'gart memory' are in reality
just pages obtained via alloc_page() (both before and
after this patchset). Oh wait, these 'hw blocks' or 'gart memory' can also
refer to VRAM, right? In that case the memory is not allocated via
alloc_page() but by a different mechanism. Is TTM used then? If so, how
do you bring those VRAM pages under its accounting rules? Or do the drivers
use some other, driver-specific mechanism for that?