[RFC PATCH 2/2] vc4: introduce DMA-BUF heap

Simon Ser contact at emersion.fr
Fri Nov 10 11:01:37 UTC 2023


On Thursday, November 9th, 2023 at 19:38, Dave Stevenson <dave.stevenson at raspberrypi.com> wrote:

> Hi Simon
> 
> On Thu, 9 Nov 2023 at 17:46, Simon Ser <contact at emersion.fr> wrote:
> >
> > On Thursday, November 9th, 2023 at 16:42, Dave Stevenson <dave.stevenson at raspberrypi.com> wrote:
> >
> > > > > - What would be a good name for the heap? "vc4" is maybe a bit naive and
> > > > >   not precise enough. Something with "cma"? Do we need to plan a naming
> > > > >   scheme to accomodate for multiple vc4 devices?
> > > >
> > > > That's a general issue though that happens with pretty much all devices
> > > > with a separate node for modesetting and rendering, so I don't think
> > > > addressing it only for vc4 make sense, we should make it generic.
> > > >
> > > > So maybe something like "scanout"?
> > > >
> > > > One thing we need to consider too is that the Pi5 will have multiple
> > > > display nodes (4(?) iirc) with different capabilities, vc4 being only
> > > > one of them. This will impact that solution too.
> > >
> > > It does need to scale.
> > >
> > > Pi5 adds 4 additional DRM devices (2xDSI, 1xDPI, and 1xComposite
> > > Video), and just this last week I've been running Wayfire with TinyDRM
> > > drivers for SPI displays and UDL (DisplayLink) outputs as well.
> > > Presumably all of those drivers need to have appropriate hooks added
> > > so they each expose a dma-heap to enable scanout buffers to be
> > > allocated.
> >
> > I'm not sure this makes sense necessarily for all of these devices. For vc4 and
> > the 4 additional RPi 5 DRM devices, probably. For the rest, e.g. UDL, I'm not
> > sure it makes sense to expose scanout memory allocation via DMA heaps: AFAIK
> > UDL needs CPU access to the buffers, it can't "scanout", and thus directly
> > rendering via v3d to a "scanout-capable" buffer doesn't make sense. There will
> > be a copy (CPU download) either way, and allocating via v3d wouldn't make a
> > difference.
> 
> You as a developer may know that UDL is going to need CPU access, but
> how does a generic userspace app know? Is it a case of falling back to
> allocating via the renderer if there is no suitably named heap?

Yeah, just like how things are working today. Then try to do direct scanout
with the buffer allocated on the render node, and if it doesn't work blit into
a DRM dumb buffer.

> > Indeed, there is nothing vc4-specific in this patch, the only requirement is
> > that the driver uses drm_gem_dma_helper. So this code could be moved into (or
> > alongside) that helper in DRM core. However, maybe it would be best to wait to
> > have a second user for this infrastructure before we move into core.
> 
> Upstreaming of the DSI / DPI / composite drivers for Pi5 should only
> be a few months away, and they can all directly scanout.
> 
> I expect the Rockchip platforms to also fall into the same category as
> the Pi, with Mali as the 3D IP, and the VOP block as the scanout
> engine. Have I missed some detail that means that they aren't a second
> user for this?
> 
> > > > > - Need to add !vc5 support.
> > > >
> > > > If by !vc5 you mean RPi0-3, then it's probably not needed there at all
> > > > since it has a single node for both modesetting and rendering?
> > >
> > > That is true, but potentially vc4 could be rendering for scanout via UDL or SPI.
> > > Is it easier to always have the dma-heap allocator for every DRM card
> > > rather than making userspace mix and match depending on whether it is
> > > all in one vs split?
> >
> > I don't think it's realistic to try to always make DMA heaps available for each
> > and every driver which might need it from day one. It's too big of a task. And
> > it's an even bigger task to try to design a fully generic heap compatibility
> > uAPI from day one. I'd much rather add the heaps one by one, if and when we
> > figure that it makes sense, and incrementally work our way through.
> 
> Is it really that massive a task? We have the dma heap UAPI for
> handling the allocations, so what new UAPI is required?

I'm only focused on fixing v3d for now. Some split render/display drivers do
similar things, some drivers not. I don't have hardware to test, I'm not
willing to commit a lot of time writing a patchset I'm not even sure will make
it. I don't really understand the hurry of doing all of the things at once here.
If you do happen to have the time, feel free to write the patches or even take
over this series.

I'm worried about blindly adding heaps all over the place, because as said
above, I don't believe it makes sense for all drivers. I think it's probably a
safe bet to add heaps in cases where we use kmsro with dumb buffers on the Mesa
side. But anything else and it really needs to be discussed on a case-by-case
basis.

> If it's a new function pointer in struct drm_driver, then the heap is
> only created by the core if that function is defined using the driver
> name. The function returns a struct dma_buf *.
> Any driver using drm_gem_dma_helper can use the new helper function
> that is basically your vc4_dma_heap_allocate. The "if
> (WARN_ON_ONCE(!vc4->is_vc5))" could be handled by not setting the
> function pointer on those platforms.

I would advise against a function pointer, because that kind of API design
makes DRM core more of a midlayer than a helper library, and this is something
we prefer to avoid [1]. I would instead suggest introducing a DRM core function
which creates the heap and can be called from the driver. 1-line vs. 3-line
in driver code doesn't really make such a big difference.

[1]: https://blog.ffwll.ch/2016/12/midlayers-once-more-with-feeling.html


More information about the dri-devel mailing list