Video Low-latency plugin development

Thu Aug 3 02:25:29 UTC 2017

On Thursday, 3 August 2017 at 01:24 +0000, Lijia (George Lee, Euler)
wrote:
> It is always interest of topic drawing public eyeball to develop low-
> latency video system, even though GST provides all means and utility
> to inspect element’s latency property as well as offer a good example
> to demonstrate design consideration for low-latency real-time use-
> case. But all these design spots is confine to GStreamer framework
> itself, for designing a well-suited plugin we also need to understand
> the system-wide technique that is very diversified and in nature the
> core points do not go beyond the following framework,

I have a strong interest in adding sub-frame encoding/decoding support
(mostly for encoded streams) to the GStreamer base classes. As you may
be aware, this is already supported by the GStreamer RTP stack. The
OpenH264 library seems like a good base for experimenting.

>  
> A. For either encoder or decoder, change processing-granularity from
> single frame into line level or slice level that is so called “sub-
> frame” video codec, in other words split a frame into a series of
> pieces then feed these pieces into pipeline, thus by means of intra-
> frame paralleling reduce overall latency

That aspect is where the GStreamer framework currently needs some
improvement. As of now, both VideoEncoder and VideoDecoder expect full
frames on both sides (encoded and decoded). Obviously, the exact method
of splitting the encoding depends on the codec. Slices are most often
associated with H264/H265; depending on the encoder capabilities, you
should be able to split each frame into 4 or 8 slices. What we need on
the base class side are methods and data structures to annotate and
store these slices, so that we know how many slices there are (the
declared latency depends on how many slices per frame are used) and can
keep track of which slices are associated with which frames (so that
timestamps and durations can be set properly). Note that this does not
remove the latency completely; it simply brings the latency down to
less than a frame.
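
To make that bookkeeping concrete, here is a minimal sketch in Python.
It is not GStreamer API: the names Slice, SliceTracker and
slice_latency_ns are hypothetical, and it assumes a fixed framerate and
slices that each take an equal share of the frame time, which a real
encoder does not guarantee.

```python
from dataclasses import dataclass, field

NS_PER_SECOND = 1_000_000_000


def slice_latency_ns(fps_n: int, fps_d: int, slices_per_frame: int) -> int:
    """Best-case latency to declare when slices are pushed one by one.

    Assumption: every slice covers an equal share of the frame duration.
    """
    frame_duration = NS_PER_SECOND * fps_d // fps_n
    return frame_duration // slices_per_frame


@dataclass
class Slice:
    frame_number: int  # frame this slice belongs to
    index: int         # position of the slice inside the frame
    data: bytes


@dataclass
class SliceTracker:
    """Hypothetical helper that maps slices back to their frame."""
    slices_per_frame: int
    fps_n: int
    fps_d: int
    pending: dict = field(default_factory=dict)  # frame_number -> seen indices

    def frame_pts_ns(self, frame_number: int) -> int:
        return frame_number * NS_PER_SECOND * self.fps_d // self.fps_n

    def add(self, s: Slice):
        """Record a slice and return (pts, duration, frame_complete).

        Every slice inherits the timestamp and duration of its frame.
        """
        seen = self.pending.setdefault(s.frame_number, set())
        seen.add(s.index)
        complete = len(seen) == self.slices_per_frame
        if complete:
            del self.pending[s.frame_number]
        pts = self.frame_pts_ns(s.frame_number)
        duration = self.frame_pts_ns(s.frame_number + 1) - pts
        return pts, duration, complete
```

With 4 slices per frame at 30 fps, slice_latency_ns(30, 1, 4) gives
roughly 8.3 ms instead of the 33.3 ms of a full frame.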

OpenH264 goes even further, as it also supports receiving partial
frame buffers. This could serve to optimize latency between capture and
encoder over, let's say, a serial link (like USB). This falls outside
of my current interest though, since on the HW I work on this will be
solved at a lower level (DMA fences). That said, the idea of a software
fence mechanism was discussed among developers, so that we could push a
GstBuffer early and signal when the content is available. The same
method could also be used to pass encoded slices, to try and reduce the
overhead of pushing more GstBuffers (even though with only 4 or 8
slices, this overhead is not that significant).
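
As a rough illustration of the software fence idea, here is a sketch in
Python built on threading.Event. Nothing like this exists in GStreamer
as far as this mail is concerned: FencedBuffer, producer and consume are
hypothetical names, and a real implementation would sit at the GstBuffer
or GstMemory level.

```python
import threading


class FencedBuffer:
    """Hypothetical buffer that can be handed downstream before it is filled."""

    def __init__(self, size: int):
        self.data = bytearray(size)
        self._ready = threading.Event()  # plays the role of the fence

    def signal(self) -> None:
        """Called by the producer once the content is fully written."""
        self._ready.set()

    def wait(self, timeout=None) -> bool:
        """Block until the fence is signalled; False if the timeout expired."""
        return self._ready.wait(timeout)


def producer(buf: FencedBuffer, payload: bytes) -> None:
    # Fill the buffer after it has already been pushed, then signal.
    buf.data[:len(payload)] = payload
    buf.signal()


def consume(buf: FencedBuffer, timeout: float = 1.0) -> bytes:
    # The consumer may do its own setup first and only waits when it
    # actually needs to read the content.
    if not buf.wait(timeout):
        raise TimeoutError("fence was not signalled in time")
    return bytes(buf.data)
```

The point of the design is that the buffer travels through the pipeline
while it is still being written, so downstream setup overlaps with the
producer's work.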

>  
> B. During bitstream encoding no B frame configured with I-frame and
> P-frame encoded into GOP

This is already supported by most encoders where it makes sense (VP8
does not have the notion of B-frames).

>  
> C. Remove any buffering between any consecutive pipeline processing
> stage so as to guarantee real-time bitstream pass-through with memory
> zero-copy implicited

Even if you have queues in your GStreamer pipeline, those queues won't
fill in a live pipeline unless you have manually configured a higher
latency or there is a latency difference between the sink elements.
I don't think there is any development needed for this aspect.

>  
> D. Frame reorder/lipsync/FRC feature that are adopted under regular
> situation MUST be disabled for saving processing time

This is the same as B, but generalized. Again, encoders offer a lot of
control; it's up to the application to set this up properly.
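
As an example of what the application side can look like, here is a
small Python sketch that only builds a pipeline description string. The
helper low_latency_sender is hypothetical, the host and port are
placeholders, and the element and property names (x264enc with tune,
bframes and key-int-max, rtph264pay, udpsink) are the GStreamer 1.x
names as I know them; please check them with gst-inspect-1.0 on your
version.

```python
def low_latency_sender(host: str, port: int, keyframe_interval: int = 30) -> str:
    """Build a gst-launch style description for a low-latency H.264 sender.

    Assumption: x264enc is available; other encoders expose similar
    controls under different property names.
    """
    elements = [
        "videotestsrc is-live=true",
        # No B-frames and no lookahead, so the encoder never has to
        # hold frames back for reordering.
        f"x264enc tune=zerolatency bframes=0 key-int-max={keyframe_interval}",
        "rtph264pay",
        f"udpsink host={host} port={port}",
    ]
    return " ! ".join(elements)


if __name__ == "__main__":
    # Placeholder address; the output can be given to gst-launch-1.0
    # or to Gst.parse_launch().
    print(low_latency_sender("127.0.0.1", 5000))
```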

>  
> E. In response to addressing worst networking condition frame
> dropping sometimes performed and always process the most recent frame

That's already how the RTP stack behaves, with the exception that you
need to consider the data within the configured latency period rather
than just the most recent frame. Naively considering only the most
recent frame can have significant side effects on smoothness.
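
To show the difference with "always take the most recent frame", here
is a simplified Python sketch of a playout queue driven by a latency
window. It is not the actual jitterbuffer algorithm: PlayoutQueue is a
hypothetical name, and it assumes sender timestamps and the receiver
clock share one timebase, which real RTP has to work out through clock
mapping.

```python
import heapq


class PlayoutQueue:
    """Hypothetical queue releasing each frame at pts + latency."""

    def __init__(self, latency_ms: int):
        self.latency_ms = latency_ms
        self._heap = []  # (pts_ms, payload), ordered by pts

    def push(self, pts_ms: int, payload, now_ms: int) -> bool:
        """Queue a frame; return False (drop) if its deadline has passed."""
        if now_ms > pts_ms + self.latency_ms:
            return False
        heapq.heappush(self._heap, (pts_ms, payload))
        return True

    def pop_due(self, now_ms: int) -> list:
        """Return the payloads whose deadline is reached, in pts order."""
        due = []
        while self._heap and self._heap[0][0] + self.latency_ms <= now_ms:
            due.append(heapq.heappop(self._heap)[1])
        return due
```

A frame that arrives out of order but inside the window is still played
at its proper time; only frames that miss their deadline are dropped,
which keeps the output pacing regular.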

>  
> Roll back to GST plugin development for low-latency application, how
> to design LL-friendly pipeline and what are compact element suits
> forming pipeline? I hope to get community’s idea exchange and smart
> point inspiring me to reach to right destination.

In general, you first need to choose your technologies. For example,
spec-compliant Transport Stream demuxers will perform poorly, even
though with some code tweaks and by ignoring parts of the spec you can
achieve low latency. RTP is better suited, since it was designed with
this in mind.

On the transport internals side, some mechanisms, like retransmission,
require more latency tolerance to work properly. These should likely be
avoided. I suppose that forward error correction, even though not
supported yet, would be a good way to avoid additional delays while
keeping quality (it's also useful when doing multicast, when you do not
want to set up a feedback channel for everyone in the pool).

regards,
Nicolas