Support for 2D engines/blitters in V4L2 and DRM

Thu Apr 18 08:18:16 UTC 2019

On Wed, Apr 17, 2019 at 08:10:15PM +0200, Paul Kocialkowski wrote:
> Hi Nicolas,
> 
> I'm detaching this thread from our V4L2 stateless decoding spec since
> it has drifted off and would certainly be interesting to DRM folks as
> well!
> 
> For context: I was initially talking about writing up support for the
> Allwinner 2D engine as a DRM render driver, where I'd like to be able
> to batch jobs that affect the same destination buffer to only signal
> the out fence once when the batch is done. We have a similar issue in
> v4l2 where we'd like the destination buffer for a set of requests (each
> covering one H264 slice) to be marked as done once the set was decoded.
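
[Illustration, not part of the original message.] The batching idea above can be modelled in plain userspace C as a counter that fires a completion exactly once, after the batch has been closed and every job in it has finished. This is only a sketch under stated assumptions: `struct batch` and the `batch_*` helpers are hypothetical names, not an existing DRM or V4L2 interface, and `signals` stands in for signaling a real out fence.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a job batch; not a kernel or libdrm API. */
struct batch {
    unsigned int pending;  /* jobs added but not yet completed */
    bool sealed;           /* true once no more jobs will be added */
    unsigned int signals;  /* times the (modelled) out fence fired */
};

static void batch_init(struct batch *b)
{
    b->pending = 0;
    b->sealed = false;
    b->signals = 0;
}

/* Fire the completion once, only when the batch is sealed and drained. */
static void batch_maybe_signal(struct batch *b)
{
    if (b->sealed && b->pending == 0 && b->signals == 0)
        b->signals++;
}

/* Queue one more job (e.g. one clipped blit or one H264 slice). */
static void batch_add_job(struct batch *b)
{
    assert(!b->sealed);
    b->pending++;
}

/* Userspace declares the batch complete; no jobs may be added afterwards. */
static void batch_seal(struct batch *b)
{
    b->sealed = true;
    batch_maybe_signal(b);
}

/* Called when the hardware finishes one job of the batch. */
static void batch_job_done(struct batch *b)
{
    assert(b->pending > 0);
    b->pending--;
    batch_maybe_signal(b);
}
```

The explicit seal step is a design assumption: without it, a job finishing before the next one is queued would drain the counter and signal too early.
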
> 
> Le mercredi 17 avril 2019 à 12:22 -0400, Nicolas Dufresne a écrit :
> > > > > Interestingly, I'm experiencing the exact same problem dealing with a
> > > > > 2D graphics blitter that has limited output scaling abilities which
> > > > > imply handling a large scaling operation as multiple clipped smaller
> > > > > scaling operations. The issue is basically that multiple jobs have to
> > > > > be submitted to complete a single frame and relying on an indication
> > > > > from the destination buffer (such as a fence) doesn't work to indicate
> > > > > that all the operations were completed, since we get the indication at
> > > > > each step instead of at the end of the batch.
> > > > 
> > > > That looks similar to the i.MX6 IPU m2m driver. It splits the image in
> > > > tiles of 1024x1024 and processes each tile separately. This driver has
> > > > been around for a long time, so I guess they have a solution to that.
> > > > They don't need requests, because there is nothing to be bundled with
> > > > the input image. I know that Renesas folks have started working on a
> > > > de-interlacer. Again, this kind of driver may process and reuse input
> > > > buffers for motion compensation, but I don't think they need special
> > > > userspace API for that.
> > > 
> > > Thanks for the reference! I hope it's not a blitter that was
> > > contributed as a V4L2 driver instead of DRM, as it probably would be
> > > more useful in DRM (but that's way beside the point).
> > 
> > DRM does not offer a generic and discoverable interface for these
> > accelerators. Note that these drivers have most of the time started as
> > DRM drivers and their DRM side was dropped. That was the case for
> > Exynos drivers at least.
> 
> Heh, sadly I'm aware of how things turn out most of the time. The thing
> is that DRM expects drivers to implement their own interface. That's
> fine for passing BOs with GPU bitstream and textures, but not so much
> for dealing with framebuffer-based operations where the streaming and
> buffer interface that v4l2 has is a good fit.
> 
> There's also the fact that the 2D pipeline is fixed-function and highly
> hardware-specific, so we need driver-specific job descriptions to
> really make the most of it. That's where v4l2 is not much of a good fit
> for complex 2D pipelines either. Most 2D engines can take multiple
> inputs and blit them together in various ways, which is too far from
> what v4l2 deals with. So we can have fixed single-buffer pipelines with
> at best CSC and scaling, but not much more with v4l2 really.
> 
> I don't think it would be too much work to bring an interface to DRM in
> order to describe render framebuffers (we only have display
> framebuffers so far), with a simple queuing interface for scheduling
> driver-specific jobs, which could be grouped together to only signal
> the out fences when every buffer of the batch was done being rendered.
> This last point would allow handling cases where userspace needs to
> perform multiple operations to carry out the single operation that it
> needs to do. In the case of my 2D blitter, that would be scaling above
> a 1024x1024 destination, which could be required to scale a video
> buffer up to a 1920x1080 display. With that, we can e.g. page flip the
> 2D engine destination buffer and be certain that scaling will be fully
> done when the fence is signaled.
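
[Illustration, not part of the original message.] As a rough sketch of the splitting described above, the following plain C covers a destination with clipped rectangles no larger than the engine limit. `struct rect` and `split_dest` are hypothetical names; the 1024 limit comes from the thread, and a real driver would also have to compute the matching (possibly fractional) source rectangle for each tile, which is left out here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical destination rectangle, in pixels. */
struct rect {
    unsigned int x, y, w, h;
};

/*
 * Cover a width x height destination with tiles of at most
 * max_tile x max_tile pixels, in row-major order. Up to cap tiles are
 * written to out; the return value is the total number of tiles needed,
 * which may exceed cap.
 */
static size_t split_dest(unsigned int width, unsigned int height,
                         unsigned int max_tile,
                         struct rect *out, size_t cap)
{
    size_t n = 0;

    assert(max_tile > 0);

    for (unsigned int y = 0; y < height; y += max_tile) {
        for (unsigned int x = 0; x < width; x += max_tile) {
            if (n < cap) {
                out[n].x = x;
                out[n].y = y;
                /* Clip the last column and row to the destination. */
                out[n].w = (width - x < max_tile) ? width - x : max_tile;
                out[n].h = (height - y < max_tile) ? height - y : max_tile;
            }
            n++;
        }
    }
    return n;
}
```

With a 1920x1080 destination and a 1024 limit this yields four jobs (1024x1024, 896x1024, 1024x56 and 896x56), so the out fence for the frame should only signal after the fourth one completes.
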
> 
> There's also the userspace problem: DRM render has mesa to back it in
> userspace and provide a generic API for other programs. For 2D
> engines, we don't have much to hold on to. Cairo has a DRM render
> interface that supports a few DRM render drivers where there is either
> a 2D pipeline or where pre-built shaders are used to implement a 2D
> pipeline, and that's about it as far as I know.
> 
> There's also the possibility of writing up a drm-render DDX to handle
> these 2D blitters that can make things a lot faster when running a
> desktop environment. As for wayland, well, I don't really know what to
> think. I was under the impression that it relies on GL for 2D
> operations, but am really not sure how true that actually is.

Just fyi in case you folks aren't aware, I typed up a blog a while ago
about why drm doesn't have a 2d submit api:

https://blog.ffwll.ch/2018/08/no-2d-in-drm.html

> > The thing is that DRM is great if you do immediate display stuff, while
> > V4L2 is nice if you do streaming, where you expect filling queued, and
> > popping buffers from queues.
> > 
> > In the end, this is just an interface, nothing prevents you from making
> > an internal driver (like the Meson Canvas) and simply letting multiple
> > subsystems expose it. Especially since some of these IPs will often
> > support both signal and memory processing, so they equally fit into a
> > media controller ISP, a v4l2 m2m or a DRM driver.
> 
> Having base drivers that can hook to both v4l2 m2m and DRM would
> definitely be awesome. Maybe we could have some common internal
> synchronization logic to make writing these drivers easier.

We have, it's called dma_fence. Ties into dma_bufs using
reservation_objects.

> It would be cool if both could be used concurrently and not just return
> -EBUSY when the device is used with the other subsystem.

We live in this world already :-) I think there's even patches (or merged
already) to add fences to v4l, for Android.

> Anyway, that's my 2 cents about the situation and what we can do to
> improve it. I'm definitely interested in tackling these items, but it
> may take some time before we get there. Not to mention we need to
> rework media/v4l2 for per-slice decoding support ;)
> 
> > Another driver you might want to look is Rockchip RGA driver (which is
> > a multi function IP, including blitting).
> 
> Yep, I'm aware of it as well. There's also vivante which exposes 2D
> cores but I'm really not sure whether any function is actually
> implemented. 
> 
> OMAP4 and OMAP5 have a 2D engine that seems to be vivante as well from
> what I could find out, but it seems to only have blobs for bltsville
> and no significant docs.

Yeah that's the usual approach for drm 2d drivers: You have a bespoke
driver in userspace. Usually that means an X driver, but there's been talk
to pimp the hwc interface to make that _the_ 2d accel interface. There's
also fbdev ... *shudder*.

All of these options are geared towards ultimately displaying stuff on
screens, not pure m2m 2d accel.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch