Tegra DRM device tree bindings

Thu Jun 28 10:01:29 PDT 2012

Hi Thierry,

On Thursday, 28.06.2012 at 13:12 +0200, Thierry Reding wrote:
> On Wed, Jun 27, 2012 at 05:59:55PM +0200, Lucas Stach wrote:
> > On Wednesday, 27.06.2012 at 16:44 +0200, Thierry Reding wrote:
> > > On Wed, Jun 27, 2012 at 05:29:14PM +0300, Hiroshi Doyu wrote:
> > > > On Wed, 27 Jun 2012 16:08:10 +0200
> > > > Thierry Reding <thierry.reding at avionic-design.de> wrote:
> > > > 
> > > > > On Wed, Jun 27, 2012 at 03:59:07PM +0300, Hiroshi Doyu wrote:
> > > > > > On Wed, 27 Jun 2012 07:14:18 +0200
> > > > > > Thierry Reding <thierry.reding at avionic-design.de> wrote:
> > > > > > 
> > > > > > > On Tue, Jun 26, 2012 at 08:48:18PM -0600, Stephen Warren wrote:
> > > > > > > > On 06/26/2012 08:32 PM, Mark Zhang wrote:
> > > > > > > > >> On 06/26/2012 07:46 PM, Mark Zhang wrote:
> > > > > > > > >>>>> On Tue, 26 Jun 2012 12:55:13 +0200
> > > > > > > > >>>>> Thierry Reding <thierry.reding at avionic-design.de> wrote:
> > > > > > > > >> ...
> > > > > > > > >>>> I'm not sure I understand how information about the carveout would be
> > > > > > > > >>>> obtained from the IOMMU API, though.
> > > > > > > > >>>
> > > > > > > > >>> I think that can be similar with current gart implementation. Define carveout as:
> > > > > > > > >>>
> > > > > > > > >>> carveout {
> > > > > > > > >>>         compatible = "nvidia,tegra20-carveout";
> > > > > > > > >>>         size = <0x10000000>;
> > > > > > > > >>> };
> > > > > > > > >>>
> > > > > > > > >>> Then create a file such like "tegra-carveout.c" to get these definitions and
> > > > > > > > >> register itself as platform device's iommu instance.
> > > > > > > > >>
> > > > > > > > >> The carveout isn't a HW object, so it doesn't seem appropriate to define a DT
> > > > > > > > >> node to represent it.
> > > > > > > > > 
> > > > > > > > > Yes. But I think it's better to export the size of carveout as a configurable item.
> > > > > > > > > So we need to define this somewhere. How about define carveout as a property of gart?
> > > > > > > > 
> > > > > > > > There already exists a way of preventing Linux from using certain chunks
> > > > > > > > of memory; the /memreserve/ syntax. From a brief look at the dtc source,
> > > > > > > > it looks like /memreserve/ entries can have labels, which implies that a
> > > > > > > > property in the GART node could refer to the /memreserve/ entry by
> > > > > > > > phandle in order to know what memory regions to use.
> > > > > > > 
> > > > > > > Wasn't the whole point of using a carveout supposed to be a replacement
> > > > > > > for the GART?
> > > > > > 
> > > > > > Mostly agree. IIUC, we use both carveout/gart allocated buffers in
> > > > > > android/tegra2.
> > > > > > 
> > > > > > > As such I'd think the carveout should rather be a property
> > > > > > > of the host1x device.
> > > > > > 
> > > > > > Rather than introducing a new property, how about using
> > > > > > "coherent_pool=??M" in the kernel command line if necessary? I think
> > > > > > that this carveout size depends on the system usage/load.
> > > > > 
> > > > > I was hoping that we could get away with using the CMA and perhaps
> > > > > initialize it based on device tree content. I agree that the carveout
> > > > > size depends on the use-case, but I still think it makes sense to
> > > > > specify it on a per-board basis.
> > > > 
> > > > DRM driver doesn't know if it uses CMA or not, because DRM only uses
> > > > DMA API.
> > > 
> > > So how is the DRM supposed to allocate buffers? Does it call the
> > > dma_alloc_from_contiguous() function to do that? I can see how it is
> > > used by arm_dma_ops but how does it end up in the driver?
> > > 
> > As I said before the DMA API is not a good fit for graphics drivers.
> > Most of the DMA buffers used by graphics cores are long lived and big,
> > so we need a special pool to alloc from to avoid eating all contiguous
> > address space, as DMA API does not provide shrinker callbacks for
> > clients using large amount of memory.
> 
> I recall you mentioning TTM as a better alternative several times in the
> past. How does it fit in with this? Does it have the capability of using
> a predefined chunk of contiguous memory as a pool to allocate from?
> 
> One problem that all of these solutions don't address is that not all
> devices below host1x are DRM related. At least for the CSI and VI blocks
> I expect there to be V4L2 drivers eventually, so what we really need is
> to manage allocations outside of the DRM. host1x is the most logical
> choice here.

I think you are right here. We might want to move all of that
buffer/memory management into the host1x code and provide contig memory
to the host1x clients from there.

TTM has the ability to manage a chunk of memory for contig allocations.
Also I think TTM does not depend too heavily on DRM, so we may even be
able to use TTM as the general allocator for host1x clients, including
VI and others. The more advanced stuff in TTM like swapping and moving
buffers might be a bit of overkill for simple stuff like V4L, where you
basically just want something like: "give me a contig buffer and pin it
in address space so it won't ever move", but it should do no harm.
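
To make the "give me a contig buffer and pin it" case concrete, here is a
minimal userspace sketch in C. It is not TTM, CMA or host1x code. The names
contig_pool, pool_init, pool_alloc and pool_free are hypothetical, and the
page size and pool limit are assumptions chosen only for illustration. It
hands out first-fit runs of pages from one fixed region and never moves
them, which is all a simple V4L-style client would need:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POOL_PAGE_SIZE 4096u   /* assumed page size */
#define POOL_MAX_PAGES 1024u   /* assumed upper bound on pool size */

/* Hypothetical pool: one fixed region, tracked with one byte per page. */
struct contig_pool {
    uintptr_t base;                      /* start address of the region */
    size_t npages;                       /* number of pages in the region */
    unsigned char used[POOL_MAX_PAGES];  /* 1 = page is allocated */
};

/* Round a byte count up to whole pages. */
size_t pool_pages(size_t size)
{
    return size / POOL_PAGE_SIZE + (size % POOL_PAGE_SIZE != 0);
}

/* Set up a pool over [base, base + size). Returns 0 on success, -1 on error. */
int pool_init(struct contig_pool *p, uintptr_t base, size_t size)
{
    size_t npages = size / POOL_PAGE_SIZE;

    if (npages == 0 || npages > POOL_MAX_PAGES)
        return -1;
    p->base = base;
    p->npages = npages;
    memset(p->used, 0, sizeof(p->used));
    return 0;
}

/*
 * First-fit allocation of a contiguous run of pages. The buffer stays at
 * the returned address until pool_free(), i.e. it is implicitly pinned.
 * Returns 0 and stores the address in *addr, or -1 if no run is free.
 */
int pool_alloc(struct contig_pool *p, size_t size, uintptr_t *addr)
{
    size_t need = pool_pages(size);
    size_t run = 0;

    if (need == 0 || need > p->npages)
        return -1;
    for (size_t i = 0; i < p->npages; i++) {
        run = p->used[i] ? 0 : run + 1;
        if (run == need) {
            size_t start = i + 1 - need;
            memset(&p->used[start], 1, need);
            *addr = p->base + start * POOL_PAGE_SIZE;
            return 0;
        }
    }
    return -1;
}

/* Release a run previously returned by pool_alloc() with the same size. */
void pool_free(struct contig_pool *p, uintptr_t addr, size_t size)
{
    size_t start = (addr - p->base) / POOL_PAGE_SIZE;
    size_t n = pool_pages(size);

    assert(addr >= p->base && start + n <= p->npages);
    memset(&p->used[start], 0, n);
}
```

A caller would run pool_init() once over the reserved region and then use
pool_alloc()/pool_free() per buffer. Unlike CMA, the pages in such a pool
are never lent out for normal allocations, and unlike TTM nothing is ever
swapped or moved, so this only illustrates the simplest end of the design
space discussed here.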

> 
> Perhaps we can put host1x code somewhere below drivers/gpu (mm
> subdirectory?), drivers/memory or perhaps some other or new location
> that could eventually host similar drivers for other SoCs.
> 
> Then again, maybe it'd be easier for now to put everything below the
> drivers/gpu/drm/tegra directory and cross that bridge when we get to it.
> 
> > > > I think that "coherent_pool" can be used only when the amount of
> > > > contiguous memory is short in your system. Otherwise even unnecessary.
> > > > 
> > > > Could you explain a bit more why you want carveout size on per-board basis?
> > > 
> > > In the ideal case I would want to not have a carveout size at all.
> > > However there may be situations where you need to make sure some driver
> > > can allocate a given amount of memory. Having to specify this using a
> > > kernel command-line parameter is cumbersome because it may require
> > > changes to the bootloader or whatever. So if you know that a particular
> > > board always needs 128 MiB of carveout, then it makes sense to specify
> > > it on a per-board basis.
> > 
> > If we go with CMA, this is a non-issue, as CMA allows to use the contig
> > area for normal allocations and only purges them if it really needs the
> > space for contig allocs.
> 
> CMA certainly sounds like the most simple approach. While it may not be
> suited for 3D graphics or multimedia processing later on, I think we
> could use it as a starting point to get basic framebuffer and X support
> up and running. We can always move to something more advanced like TTM
> later.
> 
> Thierry