[PATCH v5 0/7] dma-buf: Performance improvements for system heap & a system-uncached implementation

Daniel Vetter daniel at ffwll.ch
Thu Nov 12 09:32:37 UTC 2020


On Thu, Nov 12, 2020 at 11:09:04AM +0530, Sumit Semwal wrote:
> Hi John,
> 
> On Tue, 10 Nov 2020 at 09:19, John Stultz <john.stultz at linaro.org> wrote:
> >
> > Hey All,
> >   So I just wanted to send the last revision of my patch series
> > of performance optimizations for the dma-buf system heap.
> 
> Thanks very much for your patches - I think the first 5 patches look good to me.
> 
> I know there was a bit of discussion over adding a new system-uncached
> heap vs. using a flag to identify that; I think I prefer the separate
> heap idea, but let's ask one last time if anyone else has any real
> objections to it.
> 
> Daniel, Christian: any comments from your side on this?

I do wonder a bit where the userspace stack for all of this is, since
tuning allocators without a full stack is fairly pointless. It feels like
dma-buf heaps is a bit in limbo here.

Plus I'm wary of anything that leaks this kind of stuff beyond the
dma-api, because the dma-api maintainers don't like us doing that. But
personally I have no real concern on that front, gpus need this. It's just
that I think we need solid justification if we do land this. Hence back to
the first point.

Ideally that first point gets addressed in the form of benchmarking on
Android together with a mesa driver (or mesa plus some v4l driver, or
whatever it takes to actually show the benefits, I have no idea).
-Daniel

> 
> I am planning to merge this series to drm-misc this week if I hear no
> objections.
> >
> > This series reworks the system heap to use sgtables, and then
> > consolidates the pagelist method from the heap-helpers into the
> > CMA heap. After that, the heap-helpers logic is removed (as it
> > is now unused). I'd still like to find a better way to avoid
> > some of the logic duplication in implementing the entire
> > dma_buf_ops handlers per heap, but unfortunately that code is
> > tied somewhat to how the buffer's memory is tracked. As more
> > heaps show up I think we'll have a better idea of how best to
> > share code, so for now I think this is ok.
> >
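For readers without the patches in front of them, here is a rough sketch of
what the sgtable-based approach looks like; the struct and function names
(dup_sg_table(), system_heap_map_dma_buf()) are illustrative guesses at the
shape of it, not quotes from the series:

#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/err.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>

/*
 * Each attachment gets its own copy of the buffer's sg_table so it
 * can be dma-mapped independently for that device.
 */
static struct sg_table *dup_sg_table(struct sg_table *table)
{
        struct sg_table *new_table;
        struct scatterlist *sg, *new_sg;
        int ret, i;

        new_table = kzalloc(sizeof(*new_table), GFP_KERNEL);
        if (!new_table)
                return ERR_PTR(-ENOMEM);

        ret = sg_alloc_table(new_table, table->orig_nents, GFP_KERNEL);
        if (ret) {
                kfree(new_table);
                return ERR_PTR(-ENOMEM);
        }

        new_sg = new_table->sgl;
        for_each_sgtable_sg(table, sg, i) {
                sg_set_page(new_sg, sg_page(sg), sg->length, sg->offset);
                new_sg = sg_next(new_sg);
        }
        return new_table;
}

/* map_dma_buf then just dma-maps the attachment's private copy. */
static struct sg_table *system_heap_map_dma_buf(struct dma_buf_attachment *attachment,
                                                enum dma_data_direction direction)
{
        struct sg_table *table = attachment->priv; /* dup'ed at attach time */

        if (dma_map_sgtable(attachment->dev, table, direction, 0))
                return ERR_PTR(-ENOMEM);

        return table;
}

This per-attachment pattern is also the kind of dma_buf_ops boilerplate the
cover letter mentions wanting to share across heaps eventually.
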
> > After this, the series introduces an optimization that
> > Ørjan Eide implemented for ION that avoids calling sync on
> > attachments that don't have a mapping.
> >
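The idea, roughly, is to track whether an attachment has actually been
dma-mapped and skip the cache maintenance for attachments that haven't. A
minimal sketch, where the "mapped" flag and the struct/function names are
my own shorthand rather than the exact code from the patch:

#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/err.h>
#include <linux/list.h>
#include <linux/mutex.h>
#include <linux/scatterlist.h>

struct dma_heap_attachment {
        struct device *dev;
        struct sg_table *table;
        struct list_head list;
        bool mapped;                    /* has map_dma_buf been called? */
};

struct heap_buffer {
        struct mutex lock;
        struct list_head attachments;   /* of struct dma_heap_attachment */
        /* ... pages, size, etc. ... */
};

static struct sg_table *heap_map_dma_buf(struct dma_buf_attachment *attachment,
                                         enum dma_data_direction direction)
{
        struct dma_heap_attachment *a = attachment->priv;

        if (dma_map_sgtable(attachment->dev, a->table, direction, 0))
                return ERR_PTR(-ENOMEM);

        a->mapped = true;
        return a->table;
}

static void heap_unmap_dma_buf(struct dma_buf_attachment *attachment,
                               struct sg_table *table,
                               enum dma_data_direction direction)
{
        struct dma_heap_attachment *a = attachment->priv;

        a->mapped = false;
        dma_unmap_sgtable(attachment->dev, table, direction, 0);
}

static int heap_begin_cpu_access(struct dma_buf *dmabuf,
                                 enum dma_data_direction direction)
{
        struct heap_buffer *buffer = dmabuf->priv;
        struct dma_heap_attachment *a;

        mutex_lock(&buffer->lock);
        list_for_each_entry(a, &buffer->attachments, list) {
                if (!a->mapped)         /* never mapped, nothing to sync */
                        continue;
                dma_sync_sgtable_for_cpu(a->dev, a->table, direction);
        }
        mutex_unlock(&buffer->lock);

        return 0;
}
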
> > Next, an optimization to use larger order pages for the system
> > heap. This change brings us closer to the current performance
> > of the ION allocation code (though there is still a gap due
> > to ION using a mix of deferred freeing and page pools; I'll be
> > looking at integrating those eventually).
> >
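Something along these lines, I believe; the exact orders and gfp flags
below are my guess at the general shape, not a quote of the patch:

#include <linux/gfp.h>
#include <linux/kernel.h>
#include <linux/mm.h>

/*
 * Try big chunks first so the device/IOMMU can use fewer, larger
 * mappings, but fail fast on the high orders rather than kicking
 * off reclaim/compaction just to avoid falling back to 4K pages.
 */
static const unsigned int orders[] = { 8, 4, 0 };

static struct page *alloc_largest_available(unsigned long size,
                                            unsigned int max_order)
{
        struct page *page;
        gfp_t gfp;
        int i;

        for (i = 0; i < ARRAY_SIZE(orders); i++) {
                if (size < (PAGE_SIZE << orders[i]))
                        continue;
                if (max_order < orders[i])
                        continue;

                gfp = GFP_HIGHUSER | __GFP_ZERO | __GFP_COMP;
                if (orders[i])          /* don't thrash reclaim for big orders */
                        gfp |= __GFP_NOWARN | __GFP_NORETRY;

                page = alloc_pages(gfp, orders[i]);
                if (!page)
                        continue;
                return page;
        }
        return NULL;
}

The allocation path would then presumably keep calling this with the
remaining size (and the order of the last chunk as max_order) until the
whole buffer is filled.
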
> > Finally, a reworked version of the uncached system heap
> > implementation I submitted a few weeks back. Since it
> > duplicated a lot of the now reworked system heap code, I
> > realized it would be much simpler to add the functionality to
> > the system_heap implementation itself.
> >
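As I read it, the gist is an additional exported heap that reuses the
system heap's buffer code but marks the buffer as uncached, maps it
write-combined for the CPU and skips the per-access CPU cache maintenance,
doing one flush at allocation time instead. A sketch of the mmap side,
with the "uncached" flag and struct name being assumptions on my part:

#include <linux/dma-buf.h>
#include <linux/mm.h>
#include <linux/scatterlist.h>

struct system_heap_buffer {
        struct sg_table sg_table;
        bool uncached;          /* set for the system-uncached heap */
        /* ... */
};

static int system_heap_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
{
        struct system_heap_buffer *buffer = dmabuf->priv;
        unsigned long addr = vma->vm_start;
        struct sg_page_iter piter;
        int ret;

        /*
         * Userspace mappings of uncached buffers are write-combined,
         * so there are no CPU cachelines to flush or invalidate later.
         */
        if (buffer->uncached)
                vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);

        for_each_sgtable_page(&buffer->sg_table, &piter, vma->vm_pgoff) {
                struct page *page = sg_page_iter_page(&piter);

                ret = remap_pfn_range(vma, addr, page_to_pfn(page),
                                      PAGE_SIZE, vma->vm_page_prot);
                if (ret)
                        return ret;
                addr += PAGE_SIZE;
                if (addr >= vma->vm_end)
                        break;
        }
        return 0;
}

Device mappings would then presumably pass DMA_ATTR_SKIP_CPU_SYNC when
dma-mapping the table, which is what makes skipping the begin/end_cpu_access
cache maintenance possible in the first place.
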
> > While not improving the core allocation performance, the
> > uncached heap allocations do result in *much* improved
> > performance on HiKey960, as they avoid a lot of flushing and
> > invalidating of buffers that the cpu doesn't touch often.
> >
> > Feedback on these would be great!
> >
> > thanks
> > -john
> >
> > New in v5:
> > * Added a comment explaining why the order sizes are
> >   chosen as they are
> >
> > Cc: Sumit Semwal <sumit.semwal at linaro.org>
> > Cc: Liam Mark <lmark at codeaurora.org>
> > Cc: Laura Abbott <labbott at kernel.org>
> > Cc: Brian Starkey <Brian.Starkey at arm.com>
> > Cc: Hridya Valsaraju <hridya at google.com>
> > Cc: Suren Baghdasaryan <surenb at google.com>
> > Cc: Sandeep Patil <sspatil at google.com>
> > Cc: Daniel Mentz <danielmentz at google.com>
> > Cc: Chris Goldsworthy <cgoldswo at codeaurora.org>
> > Cc: Ørjan Eide <orjan.eide at arm.com>
> > Cc: Robin Murphy <robin.murphy at arm.com>
> > Cc: Ezequiel Garcia <ezequiel at collabora.com>
> > Cc: Simon Ser <contact at emersion.fr>
> > Cc: James Jones <jajones at nvidia.com>
> > Cc: linux-media at vger.kernel.org
> > Cc: dri-devel at lists.freedesktop.org
> >
> > John Stultz (7):
> >   dma-buf: system_heap: Rework system heap to use sgtables instead of
> >     pagelists
> >   dma-buf: heaps: Move heap-helper logic into the cma_heap
> >     implementation
> >   dma-buf: heaps: Remove heap-helpers code
> >   dma-buf: heaps: Skip sync if not mapped
> >   dma-buf: system_heap: Allocate higher order pages if available
> >   dma-buf: dma-heap: Keep track of the heap device struct
> >   dma-buf: system_heap: Add a system-uncached heap re-using the system
> >     heap
> >
> >  drivers/dma-buf/dma-heap.c           |  33 +-
> >  drivers/dma-buf/heaps/Makefile       |   1 -
> >  drivers/dma-buf/heaps/cma_heap.c     | 324 +++++++++++++++---
> >  drivers/dma-buf/heaps/heap-helpers.c | 270 ---------------
> >  drivers/dma-buf/heaps/heap-helpers.h |  53 ---
> >  drivers/dma-buf/heaps/system_heap.c  | 494 ++++++++++++++++++++++++---
> >  include/linux/dma-heap.h             |   9 +
> >  7 files changed, 753 insertions(+), 431 deletions(-)
> >  delete mode 100644 drivers/dma-buf/heaps/heap-helpers.c
> >  delete mode 100644 drivers/dma-buf/heaps/heap-helpers.h
> >
> > --
> > 2.17.1
> >
> Thanks much,
> 
> Best,
> Sumit.

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

