Support for 2D engines/blitters in V4L2 and DRM

Mon Apr 22 04:02:48 UTC 2019

On Sat, Apr 20, 2019 at 12:31 AM Nicolas Dufresne <nicolas at ndufresne.ca> wrote:
>
> On Friday, April 19, 2019 at 13:27 +0900, Tomasz Figa wrote:
> > On Fri, Apr 19, 2019 at 9:30 AM Nicolas Dufresne <nicolas at ndufresne.ca> wrote:
> > > On Thursday, April 18, 2019 at 10:18 +0200, Daniel Vetter wrote:
> > > > > It would be cool if both could be used concurrently and not just return
> > > > > -EBUSY when the device is used with the other subsystem.
> > > >
> > > > We live in this world already :-) I think there are even patches
> > > > (possibly already merged) to add fences to v4l, for Android.
> > >
> > > This work is currently suspended. It will require some feature on DRM
> > > display to really make this useful, but there is also a lot of
> > > challenges in V4L2. In GFX space, most of the use cases are about
> > > rendering as soon as possible. In multimedia, though, we have two
> > > problems: we need to synchronize the frame rendering with the audio,
> > > and output buffers may come out of order due to how video CODECs are
> > > made.
> > >
> > > In the first, we'd need a mechanism where we can schedule a render at a
> > > specific time or vblank. We can of course already implement this in
> > > software, but with fences, the scheduling would need to be done in the
> > > driver. Then if the fence is signalled earlier, the driver should hold
> > > on until the delay is met. If the fence got signalled late, we also
> > > need to think of a workflow. As we can't schedule more than one render
> > > in DRM at one time, I don't really see yet how to make that work.
> > >
> > > For the second, it's complicated on V4L2 side. Currently we signal
> > > buffers when they are ready in the display order. With fences, we
> > > receive buffer and fence pairs early (in decoding order). There exist
> > > cases where reordering is done by the driver (stateful CODEC). We
> > > cannot schedule these immediately; we would need a new mechanism to
> > > know which one comes next. If we just reuse the current mechanism, it
> > > would void the fence usage since the fence will always be signalled by
> > > the time it reaches DRM or another V4L2 component.
> > >
> > > There are also other issues. For a video capture pipeline, if you are
> > > not rendering ASAP, you need the HW timestamp in order to schedule.
> > > Again, we'd get the fence early, but the actual timestamp will be
> > > signalled at the very last minute, so we also risk turning the fence
> > > into pure
> > > overhead. Note that as we speak, I have colleagues who are
> > > experimenting with frame timestamp prediction that slaves to the
> > > effective timestamp (catching up over time). But we still have issues
> > > when the capture driver skipped a frame (missed a capture window).
> >
> > Note that a fence has a timestamp internally and it can be queried for
> > it from the user space if exposed as a sync file:
> > https://elixir.bootlin.com/linux/v5.1-rc5/source/drivers/dma-buf/sync_file.c#L386
>
> Don't we need something the other way around? This seems to be the
> timestamp of when it was triggered (I'm not familiar with this though).
>

Honestly, I'm not fully sure what this timestamp is expected to be.

For a video capture pipeline the fence would signal once the whole frame
is captured, so I think it could be a reasonable value to consider
later in the pipeline?
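
For reference, querying that timestamp from user space could look
roughly like the sketch below. The structures mirror struct
sync_file_info and struct sync_fence_info from <linux/sync_file.h>; the
helper name fence_timestamps_ns() and the hand-rolled _iowr() (which
assumes the asm-generic ioctl encoding used on x86 and ARM) are mine,
for illustration only. As far as I can tell, the UAPI header describes
timestamp_ns as the time of the fence's status change, so treat the
interpretation as an assumption.

```python
import ctypes
import fcntl


class SyncFenceInfo(ctypes.Structure):
    # Mirrors struct sync_fence_info from <linux/sync_file.h>.
    _fields_ = [
        ("obj_name", ctypes.c_char * 32),
        ("driver_name", ctypes.c_char * 32),
        ("status", ctypes.c_int32),
        ("flags", ctypes.c_uint32),
        ("timestamp_ns", ctypes.c_uint64),
    ]


class SyncFileInfo(ctypes.Structure):
    # Mirrors struct sync_file_info from <linux/sync_file.h>.
    _fields_ = [
        ("name", ctypes.c_char * 32),
        ("status", ctypes.c_int32),
        ("flags", ctypes.c_uint32),
        ("num_fences", ctypes.c_uint32),
        ("pad", ctypes.c_uint32),
        ("sync_fence_info", ctypes.c_uint64),  # user pointer to an array
    ]


def _iowr(magic, nr, size):
    # _IOWR() as laid out in asm-generic/ioctl.h (assumed encoding).
    return (3 << 30) | (size << 16) | (ord(magic) << 8) | nr


SYNC_IOC_FILE_INFO = _iowr(">", 4, ctypes.sizeof(SyncFileInfo))


def fence_timestamps_ns(sync_fd):
    """Return timestamp_ns of every fence behind a sync file fd."""
    info = SyncFileInfo()
    # First call with num_fences == 0 only reports how many fences exist.
    fcntl.ioctl(sync_fd, SYNC_IOC_FILE_INFO, info)
    if info.num_fences == 0:
        return []
    # Second call passes an array large enough for all of them.
    fences = (SyncFenceInfo * info.num_fences)()
    info.sync_fence_info = ctypes.addressof(fences)
    fcntl.ioctl(sync_fd, SYNC_IOC_FILE_INFO, info)
    return [fence.timestamp_ns for fence in fences]
```

Calling it on a file descriptor that is not a sync file raises OSError,
so a caller would want to handle that.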

> >
> > Fences in V4L2 would be also useful for stateless decoders and any
> > mem-to-mem processors that operate in order, like the blitters
> > mentioned here or actually camera ISPs, which can be often chained
> > into relatively sophisticated pipelines.
>
> I agree fences can be used to optimize specific corner cases. They are
> not as critical in V4L2 since we have async queues.

I wouldn't call those corner cases. A stateful decoder is actually one
of the opposite extremes, because one would normally just decode and
show the frame, so not much complexity needed to handle it and async
queues actually work quite well.

I don't think async queues are very helpful for any more complicated
use cases. The userspace still needs to wake up and push the buffers
through the pipeline. If you have some depth across the whole
pipeline, with queues always having some buffers waiting to be
processed, fences indeed wouldn't change too much (+/- the CPU
time/power wasted on context switches). However, with real time use
cases, such as anything involving streaming from cameras, image
processing stages and encoding into a stream to be passed to a
latency-sensitive application, such as WebRTC, the latency imposed by
the lack of fences would be significant. Especially if the image
processing in between consists of several inter-dependent stages.
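
To put rough numbers on this, here is a toy model, not a measurement.
It assumes every stage completion costs one user space wake-up when
buffers are forwarded by hand, and that with fences all stages are
queued up front so only the final completion wakes user space. The
function name, the stage times and the 200 us wake-up cost are made up
for illustration.

```python
def end_to_end_latency_us(stage_times_us, wakeup_us, use_fences):
    """Latency of one frame through a chain of mem-to-mem stages."""
    # Without fences user space is woken after every stage to dequeue the
    # buffer and queue it to the next one; with fences only at the end.
    wakeups = 1 if use_fences else len(stage_times_us)
    return sum(stage_times_us) + wakeups * wakeup_us


# Hypothetical per-stage processing times in microseconds,
# e.g. ISP, scaler, blitter, encoder.
stages_us = [4000, 2000, 1500, 3000]

print(end_to_end_latency_us(stages_us, 200, use_fences=False))  # 11300
print(end_to_end_latency_us(stages_us, 200, use_fences=True))   # 10700
```

In this model the saving grows linearly with the number of stages, and
the wake-up cost is the part that gets much worse on a loaded system.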

> I think the use
> case for fences in V4L2 is mostly to lower the latency. Not all use
> cases require such a low latency.

Indeed, not all, but I think it doesn't make fences less important,
given that there are use cases that require such a low latency.

> There was an argument that fences
> simplify the code, but I haven't seen a compelling argument
> demonstrating that this would be the case for V4L2 programming. The
> only case is when doing V4L2 to DRM exchanges, and only in the context
> where time synchronization does not matter.

Another huge use case would be Android. The lack of fences is a
significant show stopper for V4L2 adoption there.

Also, V4L2 to GPU (GLES, Vulkan) exchange should not be forgotten either.

> In fact, so far it is more
> work since information starts flowing through separate events
> (buffer/fence first, later timestamps and possibly critical metadata).
> This might be induced by the design, but clearly there is a slight API
> clash.

Well, nothing is perfect from the start. (In fact, probably nothing is
perfect in general. ;))

Best regards,
Tomasz