[RFC PATCH 2/2] vc4: introduce DMA-BUF heap

Simon Ser contact at emersion.fr
Thu Nov 16 15:53:20 UTC 2023


On Thursday, November 9th, 2023 at 08:45, Simon Ser <contact at emersion.fr> wrote:

> User-space sometimes needs to allocate scanout-capable memory for
> GPU rendering purposes. On a vc4/v3d split render/display SoC, this
> is achieved via DRM dumb buffers: the v3d user-space driver opens
> the primary vc4 node, allocates a DRM dumb buffer there, exports it
> as a DMA-BUF, imports it into the v3d render node, and renders to it.
> 
> However, DRM dumb buffers are only meant for CPU rendering, they are
> not intended to be used for GPU rendering. Primary nodes should only
> be used for mode-setting purposes, other programs should not attempt
> to open it. Moreover, opening the primary node is already broken on
> some setups: systemd grants permission to open primary nodes to
> physically logged in users, but this breaks when the user is not
> physically logged in (e.g. headless setup) and when the distribution
> is using a different init (e.g. Alpine Linux uses openrc).
> 
> We need an alternate way for v3d to allocate scanout-capable memory.
> Leverage DMA heaps for this purpose: expose a CMA heap to user-space.

So we've discussed this patch on IRC [1] [2]. Some random notes:

- We shouldn't create per-DRM-device heaps in general. Instead, we should try
  using centralized heaps like the existing system and cma ones. That way other
  drivers (video, render, etc) can also link to these heaps without depending
  on the display driver.
- We can't generically link to heaps in core DRM, but we could probably provide
  a default for the shmem and cma helpers.
- We're missing a bunch of heaps, e.g. sometimes there are multiple cma areas
  but only a single cma heap is created right now.
- Some hw needs the memory to be in a specific region for scanout (e.g. lower
  256MB of RAM for Allwinner). We could create one heap per such region (but is
  it fine to have overlapping heaps?).

Also, I tried using the default CMA heap on a Pi 4 for scanout and it works fine.
I'm not sure it's strictly equivalent to allocations done via dumb buffers,
though (e.g. with respect to write-combined (WC) mappings).

[1]: https://oftc.irclog.whitequark.org/dri-devel/2023-11-13#1699899003-1699919633;
[2]: https://oftc.irclog.whitequark.org/dri-devel/2023-11-14

