[RFC] Future TTM DMA direction

Konrad Rzeszutek Wilk konrad.wilk at oracle.com
Tue Jan 10 07:10:48 PST 2012


On Mon, Jan 09, 2012 at 10:37:28AM +0100, Thomas Hellstrom wrote:
> Hi!
> 
> When TTM was originally written, it was assumed that GPU apertures
> could address pages directly, and that the CPU could access those
> pages without explicit synchronization. The process of binding a
> page to a GPU translation table was a simple one-step operation, and
> we needed to worry about fragmentation in the GPU aperture only.
> 
> Now that we "sort of" support DMA memory there are three things I
> think are missing:
> 
> 1) We can't gracefully handle coherent DMA OOMs or coherent DMA
> (Including CMA) memory fragmentation leading to failed allocations.

However, most allocations are done in PAGE_SIZE chunks, so there isn't
much danger of contiguous allocation failures.

However, one way that the storage and network drivers solved this
was with the dmapool concept, which is pretty much what TTM DMA
is based on - that way we won't be hitting OOMs, because we have
allocated a pool at the start. Well, OK, we can still hit OOMs if we
want more DMA buffers than the IOMMU can provide.
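
For reference, a minimal sketch of that dmapool-style approach, using
the kernel's dma_pool API (the TTM-specific names here are made up for
illustration, not existing TTM symbols):

#include <linux/dmapool.h>
#include <linux/device.h>

static struct dma_pool *ttm_page_pool;	/* hypothetical name */

static int ttm_pool_init(struct device *dev)
{
	/* PAGE_SIZE chunks, PAGE_SIZE aligned, no boundary restriction */
	ttm_page_pool = dma_pool_create("ttm-dma", dev, PAGE_SIZE,
					PAGE_SIZE, 0);
	return ttm_page_pool ? 0 : -ENOMEM;
}

static void *ttm_pool_get_page(dma_addr_t *dma_handle)
{
	/*
	 * Steady-state allocations come out of the pool; this only
	 * fails once the pool can't grow any further, e.g. when the
	 * IOMMU address space is exhausted.
	 */
	return dma_pool_alloc(ttm_page_pool, GFP_KERNEL, dma_handle);
}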

We could alleviate part of the problem by making the unbind/bind process
(and hence also the unpopulate/populate) happen more lazily, to ease
the exhaustion problem?
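
Something like this, perhaps (all names hypothetical): park freed pages
on an LRU at unpopulate time, and only hand them back to the DMA API
when an allocation actually fails:

#include <linux/dmapool.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

/* hypothetical bookkeeping for a page whose unpopulate was deferred */
struct ttm_lazy_page {
	struct list_head lru;
	void *vaddr;
	dma_addr_t dma;
};

static LIST_HEAD(lazy_lru);
static DEFINE_SPINLOCK(lazy_lock);

/* unpopulate: park the page instead of dma_pool_free()-ing it */
static void ttm_lazy_unpopulate(struct ttm_lazy_page *p)
{
	spin_lock(&lazy_lock);
	list_add_tail(&p->lru, &lazy_lru);
	spin_unlock(&lazy_lock);
}

/* an allocation failed: really release the oldest deferred pages */
static void ttm_lazy_shrink(struct dma_pool *pool, int nr)
{
	struct ttm_lazy_page *p, *tmp;

	spin_lock(&lazy_lock);
	list_for_each_entry_safe(p, tmp, &lazy_lru, lru) {
		if (nr-- <= 0)
			break;
		list_del(&p->lru);
		dma_pool_free(pool, p->vaddr, p->dma);
		kfree(p);
	}
	spin_unlock(&lazy_lock);
}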


> 2) We can't handle dynamic mapping of pages into and out of dma, and
> corresponding IOMMU space shortage or fragmentation, and CPU
> synchronization.

This and 1) seem to point to the same thing - a closer relationship
with the IOMMU/DMA code. I would think that this problem would not
just be with graphics, but also with storage, userspace drivers,
and network.

Seems that some form of feedback mechanism from the IOMMU API might be useful?
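
Purely as a strawman (none of these symbols exist in today's IOMMU
API), the feedback could look like a pressure notifier that the driver
answers by evicting:

/* strawman only - no such interface exists in the IOMMU API today */
enum iommu_pressure {
	IOMMU_SPACE_LOW,	/* little IOVA space left */
	IOMMU_SPACE_FRAGMENTED,	/* large contiguous mappings will fail */
};

struct iommu_pressure_notifier {
	struct device *dev;
	int (*notify)(struct device *dev, enum iommu_pressure level,
		      unsigned long pages_wanted);
};

/* a TTM driver's response: unbind LRU bo:s mapped through this device */
static int ttm_iommu_pressure(struct device *dev, enum iommu_pressure level,
			      unsigned long pages_wanted)
{
	/* ... evict pages_wanted worth of buffers and unmap them ... */
	return 0;
}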

> 3) We have no straightforward way of moving pages between devices.
> 
> I think a reasonable way to support this is to make binding to a
> non-fixed (system page based) TTM memory type a two-step binding
> process, so that a TTM placement consists of (DMA_TYPE, MEMORY_TYPE)
> instead of only (MEMORY_TYPE).
> 
> In step 1) the bo is bound to a specific DMA type. These could be
> for example:
> (NONE, DYNAMIC, COHERENT, CMA), .... device dependent types could be
> allowed as well.
> In this step, we perform dma_sync_for_device, or allocate
> dma-specific pages maintaining LRU lists so that if we receive a DMA
> memory allocation OOM, we can unbind bo:s bound to the same DMA

The DMA API is quite stringent in wanting the DMA page allocated to be
associated with the BDF (bus:device.function) of the device. So the
"same DMA type" would need to be "same DMA type on the same PCI device."
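
In other words, the placement and the eviction LRU would look roughly
like this (hypothetical types, following the (DMA_TYPE, MEMORY_TYPE)
split from the RFC):

#include <linux/device.h>
#include <linux/list.h>
#include <linux/types.h>

/* hypothetical two-step placement */
enum ttm_dma_type {
	TTM_DMA_NONE,		/* bare metal: pages usable directly */
	TTM_DMA_DYNAMIC,	/* mapped/unmapped around GPU access */
	TTM_DMA_COHERENT,	/* e.g. when running under Xen */
	TTM_DMA_CMA,
};

struct ttm_placement_dma {
	enum ttm_dma_type dma_type;
	uint32_t mem_type;	/* the existing TTM_PL_* type */
};

/*
 * Because the DMA API ties each allocation to one struct device (one
 * BDF), the "same DMA type" LRU used for eviction on OOM has to be
 * kept per (device, dma_type) pair, not globally per dma_type.
 */
struct ttm_dma_type_pool {
	struct device *dev;	/* the PCI device this pool belongs to */
	enum ttm_dma_type type;
	struct list_head lru;	/* bo:s evictable on a DMA OOM */
};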

> type. Standard graphics cards would then, for example, use the NONE
> DMA type when run on bare metal or COHERENT when run on Xen. A
> "COHERENT" OOM condition would then lead to eviction of another bo.
> (Note that DMA eviction might involve data copies and be costly, but
> still better than failing).

OK, that sounds right - we do have those buffers and we could re-use them.
Though right now we throw away the 'tt_cached' ones instead of re-using them.
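
E.g. on free, something like this (reusing the hypothetical
ttm_lazy_page from the lazy-unpopulate sketch above) would keep the
mapping alive for the next populate instead of throwing the page away:

/* sketch: per-pool free list so 'tt_cached' pages get reused, not freed */
static LIST_HEAD(cached_free_list);

static void ttm_put_cached_page(struct ttm_lazy_page *p)
{
	/*
	 * The DMA mapping stays valid; the next populate pops from
	 * this list instead of calling dma_pool_alloc() again.
	 */
	list_add_tail(&p->lru, &cached_free_list);
}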

> Binding with the DYNAMIC memory type would mean that CPU accesses
> are disallowed, and that user-space CPU page mappings might need to
> be killed, with a corresponding sync_for_cpu if they are faulted in
> again (perhaps on a page-by-page basis). Any attempt to bo_kmap() a
> bo page bound to DYNAMIC DMA mapping should trigger a BUG.
> 
> In step 2) The bo is bound to the GPU in the same way it's done
> today. Evicting from DMA will of course also trigger an evict from
> GPU, but an evict from GPU will not trigger a DMA evict.
> 
> Making a bo "anonymous" and thus moveable between devices would then
> mean binding it to the "NONE" DMA type.

Which would be copied to a different device when needed by another GPU?

The "binding" process sounds like it would need the smarts to figure out
whether it can just attach the DMA page to the other pool or if it needs
to fetch a page from the other pool, copy the contents of the page, and
retire the old one in a pool for re-use? And probably some other logic
too.
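
That decision might look roughly like this (every helper here is
hypothetical; only struct ttm_buffer_object is a real TTM type):

/* hypothetical helpers throughout - this only sketches the decision */
static int ttm_bo_move_to_device(struct ttm_buffer_object *bo,
				 struct device *src, struct device *dst)
{
	if (same_iommu_domain(src, dst)) {
		/* cheap path: attach the existing DMA pages to dst's pool */
		return ttm_dma_rebind_pages(bo, dst);
	}

	/* expensive path: allocate from dst's pool, copy, retire the old */
	if (ttm_dma_alloc_from_pool(bo, dst))
		return -ENOMEM;
	ttm_bo_copy_contents(bo);
	ttm_dma_retire_pages(bo, src);	/* back into src's pool for re-use */
	return 0;
}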

> 
> Comments, suggestions?
> 
> /Thomas