[RFC 4/4] drm: Add NVIDIA Tegra support

Thu Apr 12 01:40:01 PDT 2012

Hi Thierry,

On Thursday, April 12, 2012 9:18 AM Thierry Reding wrote:

> * Arnd Bergmann wrote:
> > On Wednesday 11 April 2012, Thierry Reding wrote:
> > > Daniel Vetter wrote:
> > > > Well, you use the iommu api to map/unmap memory into the iommu for tegra,
> > > > whereas usually device drivers just use the dma api to do that. The usual
> > > > interface is dma_map_sg/dma_unmap_sg, but there are quite a few variants
> > > > around. I'm just wondering why this you've choosen this.
> > >
> > > I don't think this works on ARM. Maybe I'm not seeing the whole picture but
> > > judging by a quick look through the kernel tree there aren't any users that
> > > map DMA memory through an IOMMU.
> >
> > dma_map_sg is certainly the right interface to use, and Marek Szyprowski has
> > patches to make that work on ARM, hopefully going into v3.5, so you could
> > use those.
> 
> I've looked at Marek's patches but I don't think they'll work for Tegra 2 or
> Tegra 3. The corresponding iommu_map() functions only set one PTE, regardless
> of the number of bytes passed to them. However, the Tegra TRM indicates that
> mapping needs to be done on a per-page basis so contiguous regions cannot be
> combined. I suppose the IOMMU driver would have to be fixed to program more
> than a single page in that case.

I assume you want to map a set of pages into contiguous chunk in io address space.
This can be done with dma_map_sg() call once IOMMU aware implementation has been
assigned to the given device. DMA-mapping implementation is able to merge 
consecutive chunks of the scatter list in the dma/io address space if possible
(i.e. there are no in-page offsets between the chunks). With my implementation 
of IOMMU aware dma-mapping you usually you get a single DMA chunk from the 
provided scatter-list.

I know that this approach causes a lot of confusion at the first look, but that
how dma mapping api has been designed. The scatter list based approach has some
drawbacks - it is a bit oversized for most of the typical use cases for the 
gfx/multimedia buffers, but that's all we have now. 

Scatter lists were initially designed for the disk based block io operations, 
hence the presence of the in-page offsets and lengths for each chunk. For 
multimedia use cases providing an array of struct pages and asking dma-mapping 
to map them into contiguous memory is probably all we need. I wonder if 
introducing such new calls is a good idea. Anrd, what do think? It will 
definitely simplify the drivers and improve the code understanding. On the 
other hand it requires a significant amount of work in the dma-mapping 
framework for all architectures, but that's not a big issue for me.

> Also this doesn't yet solve the vmap() problem that is needed for the kernel
> virtual mapping. I did try using dma_alloc_writecombine(), but that only
> works for chunks of 2 MB or smaller, unless I use init_consistent_dma_size()
> during board setup, which isn't provided for in a DT setup. I couldn't find
> a better alternative, but I admit I'm not very familiar with all the VM APIs.
> Do you have any suggestions on how to solve this? Otherwise I'll try and dig
> in some more.

Yes, I'm aware of this issue I'm currently working on solving it. I hope to use 
standard vmalloc range for all coherent/writecombine allocations and get rid of
the custom 'consistent_dma' region at all.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center