DMA-BUFs always uncached on arm64, causing poor camera performance on Librem 5
Pavel Machek
pavel at ucw.cz
Thu Jul 10 08:49:19 UTC 2025
Hi!
> > memcpy() from normal memory is about 2msec/1MB. Unfortunately, for
> > DMA-BUFs it is 20msec/1MB, and that basically means I can't easily do
> > 760p video recording. Plus, copying a full-resolution photo buffer takes
> > more than 200msec!
> >
> > There's a possibility to do some processing on the GPU, and it's implemented here:
> >
> > https://gitlab.com/tui/tui/-/tree/master/icam?ref_type=heads
> >
> > but that hits the same problem in the end -- data is in DMA-BUF,
> > uncached, and takes way too long to copy out.
> >
> > And that's ... wrong. DMA ended seconds ago, complete cache flush
> > would be way cheaper than copying single frame out, and I still have
> > to deal with uncached frames.
> >
> > So I have two questions:
> >
> > 1) Is my analysis correct that, no matter how I get a frame from v4l and
> > process it on GPU, I'll have to copy it from uncached memory in the
> > end?
>
> If you need to touch the buffers using the CPU then you are either
> stuck with uncached memory or you need to implement bracketed access to
> do the necessary cache maintenance. Be aware that completely flushing
> the cache is not really an option, as that would impact other
> workloads, so you have to flush the cache by walking the virtual
> address space of the buffer, which may take a significant amount of CPU
> time.
What kind of "significant amount of CPU time" are we talking about here?
A millisecond?

Bracketed access is fine with me.

Flushing the cache should be an option. I'm root, there's no other
significant workload, and copying out the buffer takes 200msec+. There
are a lot of cache flushes that can be done in a quarter of a second!
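
For reference, bracketed CPU access to a DMA-BUF from userspace is usually
expressed with DMA_BUF_IOCTL_SYNC from <linux/dma-buf.h>. The following is
only a minimal sketch: the function names copy_out_bracketed() and
dmabuf_sync() are made up for illustration, and it assumes the buffer has
already been mmap()ed. Whether the mapping is actually cached, and whether
the ioctl performs any real cache maintenance, depends on the exporting
driver, so this does not by itself make the copy faster.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Issue one DMA_BUF_IOCTL_SYNC call, retrying if it was interrupted. */
static int dmabuf_sync(int fd, __u64 flags)
{
	struct dma_buf_sync sync = { .flags = flags };
	int ret;

	do {
		ret = ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync);
	} while (ret == -1 && (errno == EINTR || errno == EAGAIN));

	return ret;
}

/*
 * Hypothetical helper: copy len bytes out of an mmap()ed DMA-BUF.
 * dmabuf_fd is the DMA-BUF file descriptor, map is the address that
 * mmap() returned for it. Returns 0 on success, -1 with errno set
 * if either sync call fails.
 */
int copy_out_bracketed(int dmabuf_fd, void *dst, const void *map, size_t len)
{
	/* Open the bracket: the CPU is about to read the buffer. */
	if (dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ) == -1)
		return -1;

	memcpy(dst, map, len);

	/* Close the bracket: the CPU is done, devices may use the buffer again. */
	if (dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ) == -1)
		return -1;

	return 0;
}
```

With a cached mapping, the cost of the two sync calls is the cache
maintenance over the buffer, which is the "significant amount of CPU time"
being asked about above.
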
> However, if you are only going to use the buffer with the GPU I see no
> reason to touch it from the CPU side. Why would you even need to copy
> the content? After all dma-bufs are meant to enable zero-copy between
> DMA capable accelerators. You can simply import the V4L2 buffer into a
> GL texture using EGL_EXT_image_dma_buf_import. Using this path you
> don't need to bother with the cache at all, as the GPU will directly
> read the video buffers from RAM.
Yes, so the GPU will read the video buffer from RAM, then debayer it, and
then what? Then I need to store the data in a raw file, or use the CPU to
turn it into a JPEG file, or maybe run a video encoder on it. Those are
all tasks that are done on the CPU...
Best regards,
Pavel
--
I don't work for Nazis and criminals, and neither should you.
Boycott Putin, Trump, and Musk!