[PATCH v3 1/1] drm: msm: Replace dma_map_sg with dma_sync_sg*

Fri Nov 30 09:44:25 UTC 2018

On Fri, Nov 30, 2018 at 10:35:27AM +0100, Daniel Vetter wrote:
> > Whether the cache maintenance operation needs to actually do anything
> > or not is a function of `dev`. We can have some devices that are
> > coherent with CPU caches, and some that are not, on the same system.
> 
> So the thing is that the gpu driver knows this too. It fairly often can
> even decide at runtime (through surface state bits or gpu pte bits)
> whether to use coherent or non-coherent transactions. dma-api assuming
> that a given device is always coherent or non-coherent is one of the
> fundamental mismatches we have.
> 
> If you otoh need dev because there's some strange bridge caches you need
> to flush (I've never seen that, but who knows), that would be a diffeernt
> thing. All the bridge flushing I've seen is attached to the iommu though,
> so would be really a surprise if the cache management needs that too.

Strange bridge caches aren't the problem.  Outside of magic components
like SOCs integrated GPUs the issue is that a platform can wire up a
PCIe/AXI/etc bus either so that it is cache coherent, or not cache
coherent (does not snooping).  Drivers need to use the full DMA API
include dma_sync_* to cater for the non-coherent case, which will turn
into no-ops if DMA is coherent.

Now PCIe now has unsnooped transactions, which can be non-coherent even
if the bus would otherwise be coherent.  We have so far very much
ignored those in Linux (at least Linux in general, I know you guys have
some driver-local hacks), but if that use case, or similar ones for GPUs
on SOCs become common we need to find a solution.

One of the major issues here is that architectures that always are
DMA coherent might not even have the cache flushing instructions, or even
if they do we have not wired them up in the DMA code as we didn't need
them.

So what we'd need to support this properly is to do something like:

 - add new arch hook that allows an architecture to say it supports
   optional non-coherent at the arch level, and for a given device
 - wire up arch_sync_dma_for_{device,cpu} for those architectures that
   define it if they don't currently have it (e.g. x86)
 - add a new DMA_ATTR_* flag to opt into cache flushing even if the
   device declares it is otherwise coherent

And I'd really like to see that work driven by an actual user.