DMA-BUFs always uncached on arm64, causing poor camera performance on Librem 5
Lucas Stach
l.stach at pengutronix.de
Thu Jul 10 08:42:02 UTC 2025
Hi Pavel,
Am Donnerstag, dem 10.07.2025 um 10:24 +0200 schrieb Pavel Machek:
> Hi!
>
> It seems that DMA-BUFs are always uncached on arm64... which is a
> problem.
>
> I'm trying to get useful camera support on Librem 5, and that includes
> recording vidos (and taking photos).
>
> memcpy() from normal memory is about 2msec/1MB. Unfortunately, for
> DMA-BUFs it is 20msec/1MB, and that basically means I can't easily do
> 760p video recording. Plus, copying full-resolution photo buffer takes
> more than 200msec!
>
> There's possibility to do some processing on GPU, and its implemented here:
>
> https://gitlab.com/tui/tui/-/tree/master/icam?ref_type=heads
>
> but that hits the same problem in the end -- data is in DMA-BUF,
> uncached, and takes way too long to copy out.
>
> And that's ... wrong. DMA ended seconds ago, complete cache flush
> would be way cheaper than copying single frame out, and I still have
> to deal with uncached frames.
>
> So I have two questions:
>
> 1) Is my analysis correct that, no matter how I get frame from v4l and
> process it on GPU, I'll have to copy it from uncached memory in the
> end?
If you need to touch the buffers using the CPU then you are either
stuck with uncached memory or you need to implement bracketed access to
do the necessary cache maintenance. Be aware that completely flushing
the cache is not really an option, as that would impact other
workloads, so you have to flush the cache by walking the virtual
address space of the buffer, which may take a significant amount of CPU
time.
However, if you are only going to use the buffer with the GPU I see no
reason to touch it from the CPU side. Why would you even need to copy
the content? After all dma-bufs are meant to enable zero-copy between
DMA capable accelerators. You can simply import the V4L2 buffer into a
GL texture using EGL_EXT_image_dma_buf_import. Using this path you
don't need to bother with the cache at all, as the GPU will directly
read the video buffers from RAM.
Regards,
Lucas
>
> 2) Does anyone have patches / ideas / roadmap how to solve that? It
> makes GPU unusable for computing, and camera basically unusable for
> video.
>
> Best regards,
> Pavel
More information about the dri-devel
mailing list