DMA-BUFs always uncached on arm64, causing poor camera performance on Librem 5
Mikhail Rudenko
mike.rudenko at gmail.com
Sun Jul 13 19:54:14 UTC 2025
Hi, Pavel,
On 2025-07-10 at 10:24 +02, Pavel Machek <pavel at ucw.cz> wrote:
> Hi!
>
> It seems that DMA-BUFs are always uncached on arm64... which is a
> problem.
>
> I'm trying to get useful camera support on Librem 5, and that includes
> recording videos (and taking photos).
Earlier this year I tried to solve a similar issue on rkisp1 (Rockchip
RK3399) and did some measurements showing that non-coherent buffers
plus explicit cache flushing is a viable approach [1]. Unfortunately,
that effort stalled, but maybe the patch "[PATCH v4 1/2] media:
videobuf2: Fix dmabuf cache sync/flush in dma-contig" will be useful
to you.
[1] https://lore.kernel.org/all/20250303-b4-rkisp-noncoherent-v4-0-e32e843fb6ef@gmail.com/
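To illustrate the userspace side of that approach, here is a minimal
sketch of bracketing CPU reads of a DMA-BUF with DMA_BUF_IOCTL_SYNC from
<linux/dma-buf.h>. The helper names (dmabuf_sync, dmabuf_cpu_read_begin,
dmabuf_cpu_read_end) are made up for this example. It assumes the
exporter maps the buffer cached and does its cache maintenance in the
begin/end CPU access hooks; whether a given driver actually does that is
driver-specific, so treat this as an outline, not a fix.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/*
 * Hypothetical helper: issue DMA_BUF_IOCTL_SYNC with the given flags,
 * retrying if the call is interrupted. Returns 0 on success, -1 on
 * error with errno set.
 */
static int dmabuf_sync(int dmabuf_fd, uint64_t flags)
{
    struct dma_buf_sync sync = { .flags = flags };
    int ret;

    do {
        ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));

    return ret;
}

/* Call before the CPU reads from the mmap()ed buffer. */
int dmabuf_cpu_read_begin(int dmabuf_fd)
{
    return dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ);
}

/* Call after the CPU is done reading, before the device reuses the buffer. */
int dmabuf_cpu_read_end(int dmabuf_fd)
{
    return dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ);
}
```

Usage would be dmabuf_cpu_read_begin(fd), then memcpy() out of the
mapping, then dmabuf_cpu_read_end(fd), once per dequeued frame.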
> memcpy() from normal memory is about 2msec/1MB. Unfortunately, for
> DMA-BUFs it is 20msec/1MB, and that basically means I can't easily do
> 760p video recording. Plus, copying full-resolution photo buffer takes
> more than 200msec!
>
> There's a possibility to do some processing on the GPU, and it's implemented here:
>
> https://gitlab.com/tui/tui/-/tree/master/icam?ref_type=heads
>
> but that hits the same problem in the end -- data is in DMA-BUF,
> uncached, and takes way too long to copy out.
>
> And that's ... wrong. DMA ended seconds ago, complete cache flush
> would be way cheaper than copying single frame out, and I still have
> to deal with uncached frames.
>
> So I have two questions:
>
> 1) Is my analysis correct that, no matter how I get frame from v4l and
> process it on GPU, I'll have to copy it from uncached memory in the
> end?
>
> 2) Does anyone have patches / ideas / roadmap how to solve that? It
> makes GPU unusable for computing, and camera basically unusable for
> video.
>
> Best regards,
> Pavel
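For anyone who wants to compare numbers, a rough sketch of how the
msec/1MB figures quoted above could be measured, plus the arithmetic for
a whole frame. The function names are made up for this example, and
1 MB is taken as 1024*1024 bytes, which is an assumption; the original
measurement method is not stated.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>
#include <time.h>

#define BYTES_PER_MB (1024.0 * 1024.0)  /* assumption: 1 MB = 2^20 bytes */

/*
 * Time `iters` memcpy() calls of `len` bytes from src to dst and return
 * the average cost in msec per MB. Pass a pointer into the mmap()ed
 * DMA-BUF as `src` to measure the uncached case.
 */
double memcpy_msec_per_mb(void *dst, const void *src, size_t len, int iters)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++) {
        memcpy(dst, src, len);
        /* Compiler barrier so the copies are not optimized away. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double msec = (t1.tv_sec - t0.tv_sec) * 1e3 +
                  (t1.tv_nsec - t0.tv_nsec) / 1e6;
    return msec / iters / (len / BYTES_PER_MB);
}

/* Estimated time to copy one buffer of `bytes` at a given msec/MB rate. */
double copy_msec(size_t bytes, double msec_per_mb)
{
    return bytes / BYTES_PER_MB * msec_per_mb;
}
```

With these definitions, copy_msec(10 * 1024 * 1024, 20.0) evaluates to
200, which would match the "more than 200msec" figure if the
full-resolution buffer is a bit over 10 MB (the buffer size is an
assumption here, it is not given above).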
--
Best regards,
Mikhail Rudenko