[PATCH 3/4] dma-buf: add support for mapping with dma mapping attributes

Tue Jan 22 22:50:00 UTC 2019

On Tue, 22 Jan 2019, Andrew F. Davis wrote:

> On 1/21/19 4:12 PM, Liam Mark wrote:
> > On Mon, 21 Jan 2019, Christoph Hellwig wrote:
> > 
> >> On Mon, Jan 21, 2019 at 11:44:10AM -0800, Liam Mark wrote:
> >>> The main use case is for allowing clients to pass in 
> >>> DMA_ATTR_SKIP_CPU_SYNC in order to skip the default cache maintenance 
> >>> which happens in dma_buf_map_attachment and dma_buf_unmap_attachment. In 
> >>> ION the buffers aren't usually accessed from the CPU so this allows 
> >>> clients to often avoid doing unnecessary cache maintenance.
> >>
> >> This can't work.  The cpu can still easily speculate into this area.
> > 
> > Can you provide more detail on your concern here.
> > The use case I am thinking about here is a cached buffer which is accessed 
> > by a non IO-coherent device (quite a common use case for ION).
> > 
> > Guessing on your concern:
> > The speculative access can be an issue if you are going to access the 
> > buffer from the CPU after the device has written to it, however if you 
> > know you aren't going to do any CPU access before the buffer is again 
> > returned to the device then I don't think the speculative access is a 
> > concern.
> > 
> >> Moreover in general these operations should be cheap if the addresses
> >> aren't cached.
> >>
> > 
> > I am thinking of use cases with cached buffers here, so CMO isn't cheap.
> > 
> 
> These buffers are cacheable, not cached, if you haven't written anything
> the data wont actually be in cache. 

That's true

> And in the case of speculative cache
> filling the lines are marked clean. In either case the only cost is the
> little 7 instruction loop calling the clean/invalidate instruction (dc
> civac for ARMv8) for the cache-lines. Unless that is the cost you are
> trying to avoid?
> 

This is the cost I am trying to avoid and this comes back to our previous 
discussion.  We have a coherent system cache so if you are doing this for 
every cache line on a large buffer it adds up with this work and the going 
to the bus.
For example I believe 1080P buffers are 8MB, and 4K buffers are even 
larger.

I also still think you would want to solve this properly such that 
invalidates aren't being done unnecessarily.

> In that case if you are mapping and unmapping so much that the little
> CMO here is hurting performance then I would argue your usage is broken
> and needs to be re-worked a bit.
> 

I am not sure I would say it is broken, the large buffers (example 1080P 
buffers) are mapped and unmapped on every frame. I don't think there is 
any clean way to avoid that in a pipelining framework, you could ask 
clients to keep the buffers dma mapped but there isn't necessarily a good 
time to tell them to unmap. 

It would be unfortunate to not consider this something legitimate for 
usespace to do in a pipelining use case.
Requiring devices to stay attached doesn't seem very clean to me as there 
isn't necessarily a nice place to tell them when to detach.

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project