DMA-buf and uncached system memory

Mon Feb 15 11:53:32 UTC 2021

On Monday, 15.02.2021 at 10:34 +0100, Christian König wrote:
> 
> On 15.02.21 at 10:06, Simon Ser wrote:
> > On Monday, February 15th, 2021 at 9:58 AM, Christian König <christian.koenig at amd.com> wrote:
> > 
> > > we are currently working on FreeSync and direct scan out from system
> > > memory on AMD APUs in A+A laptops.
> > > 
> > > One problem we stumbled over is that our display hardware needs to scan
> > > out from uncached system memory and we currently don't have a way to
> > > communicate that through DMA-buf.
> > > 
> > > For our specific use case at hand we are going to implement something
> > > driver specific, but the question is should we have something more
> > > generic for this?
> > > 
> > > After all the system memory access pattern is a PCIe extension and as
> > > such something generic.
> > Intel also needs uncached system memory if I'm not mistaken?
> 
> No idea, that's why I'm asking. Could be that this is also interesting 
> for I+A systems.
> 
> > Where are the buffers allocated? If GBM, then it needs to allocate memory that
> > can be scanned out if the USE_SCANOUT flag is set or if a scanout-capable
> > modifier is picked.
> > 
> > If this is about communicating buffer constraints between different components
> > of the stack, there were a few proposals about it. The most recent one is [1].
> 
> Well the problem here is on a different level of the stack.
> 
> Things like resolution, pitch etc. can easily be communicated in userspace 
> without involvement of the kernel. The worst thing which can happen is 
> that you draw garbage into your own application window.
> 
> But if you get the caching attributes in the page tables (both CPU as 
> well as IOMMU, device etc...) wrong then ARM for example has the 
> tendency to just spontaneously reboot.
> 
> X86 is fortunately a bit more graceful and you only end up with random 
> data corruption, but that is only marginally better.
> 
> So to sum it up, that is not something which we can leave in the hands of 
> userspace.
> 
> I think that exporters in the DMA-buf framework should have the ability 
> to tell importers whether system memory snooping is necessary or not.

There is already a coarse-grained way to do so: the dma_coherent
property in struct device, which you can check at dmabuf attach time.

However it may not be enough for the requirements of a GPU where the 
engines could differ in their dma coherency requirements. For that you
need to either have fake struct devices for the individual engines or
come up with a more fine-grained way to communicate those requirements.
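
To illustrate the coarse-grained check, here is a rough userspace model
of what an exporter could do at attach time. This is a sketch, not kernel
code: struct device and struct dma_buf_attachment are cut-down stand-ins
that only carry the fields used here, and struct model_buffer,
its cpu_cached flag and model_exporter_attach() are hypothetical names
made up for the example. In the kernel the check would presumably go
through dev_is_dma_coherent() instead of reading the field directly.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; only the fields used here. */
struct device {
	bool dma_coherent;	/* true: DMA from this device snoops CPU caches */
};

struct dma_buf_attachment {
	struct device *dev;	/* the importing device */
};

/* Hypothetical exporter-private buffer state. */
struct model_buffer {
	bool cpu_cached;	/* true: backing pages are mapped cacheable */
};

/*
 * Model of an exporter attach callback: refuse an importer whose DMA
 * does not snoop the CPU caches while the buffer is mapped cacheable.
 * Returns 0 on success, -EINVAL on a caching mismatch.
 */
static int model_exporter_attach(const struct model_buffer *buf,
				 const struct dma_buf_attachment *attach)
{
	assert(buf != NULL && attach != NULL && attach->dev != NULL);

	if (buf->cpu_cached && !attach->dev->dma_coherent)
		return -EINVAL;

	return 0;
}
```

A real exporter might switch the buffer to an uncached mapping or add
cache maintenance instead of failing the attach; the model only shows
where the decision would sit. Note that it keys off a single flag per
device, which is exactly the limitation described above.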

> Userspace components can then of course tell the exporter what the 
> importer needs, but validating that those settings are correct and don't 
> crash the system must happen in the kernel.

What exactly do you mean by "scanout requires non-coherent memory"?
Does the scanout requestor always set the no-snoop PCI flag, so you get
garbage if some writes to memory are still stuck in the caches, or is
it some other requirement?

Regards,
Lucas