Image overlay over a video stream

Fri Nov 16 09:41:04 UTC 2018

Hello,

On 15.11.2018 at 16:56, Nicolas Dufresne wrote:
> On Thursday, 15 November 2018 at 09:46 +0100, Wolfgang Grandegger wrote:
>> Hello,
>>
>> digging deeper about text, image and graphics overlay...
>>
>> On 18.10.2018 at 17:55, Nicolas Dufresne wrote:
>>> On Thursday, 18 October 2018 at 16:21 +0200, Wolfgang Grandegger wrote:
>>>> Hello,
>>>>
>>>> I'm currently evaluating the following pipeline to receive, display
>>>> and record a MJPEG video stream:
>>>>
>>>>   # nice -20 \
>>>>     gst-launch-1.0 -v \
>>>>     udpsrc port=50004 buffer-size=180000000 do-timestamp=1 \
>>>>       caps="application/x-rtp, media=(string)video, clock-rate=(int)90000, \
>>>>       encoding-name=(string)JPEG, payload=(int)26, framerate=(fraction)50/1" \
>>>>     ! rtpjitterbuffer latency=20 \
>>>>     ! rtpjpegdepay \
>>>>     ! vaapijpegdec \
>>>>     ! timeoverlay \
>>>>     ! tee name=t \
>>>>     t. ! queue ! vaapisink \
>>>>     t. ! queue ! vaapih264enc ! mp4mux ! filesink location=/tmp/test.mp4
>>>>
>>>> The CPU usage of the various threads with and without the element
>>>> "timeoverlay" is listed below:
>>>>
>>>>     with timeoverlay ->  no     yes 
>>>>   TID  Thread Name      CPU %  CPU %
>>>>   ----------------------------------
>>>>   542  gst-launch-1.0    42.0   88.8 
>>>>   550  queue0:src         0.2    0.4
>>>>   549  vaapiencodeh264    5.3    5.5
>>>>   548  gmain              0.0    0.0
>>>>   547  udpsrc0:src        8.9    8.4
>>>>   546  rtpjitterbuffer    8.5   51.6
>>>>   545  timer              0.0    0.0
>>>>   544  queue1:src        14.8   17.6
>>>>   543  queue0:src         3.9    4.9
>>>>
>>>> The "timeoverlay" adds approx. 45% to the CPU load (max is 4 x 100%).
>>>> What does take that much CPU time?
>>>
>>> It's called software rendering (with anti-aliasing and all). There is
>>> also a hit because you need to download/upload the pixels from/to the
>>> GPU.
>>
>> I see! The text needs to be rendered and inserted into (overlaid on)
>> the video frame. The CPU usage really depends on how often and what is
>> rendered. Disabling shadow or outline drawing, or using a smaller font,
>> already reduces the CPU load.
>>
>> BTW, is it possible to specify the colour for the shaded background? I
>> know that it could be achieved with Pango "<span>" text attributes, e.g.
>> "bgcolor", but it requires more CPU time than the "shaded" background
>> from the text overlay.
> 
> It looks like you can specify the font and outline color, but not the
> shade or the shadow color. It would be nice to add properties for this.

OK, while adding color to the shadow is trivial, it's getting more
complicated/complex for the shade...

> 
> The shaded background is done using plain cairo, I'm not sure why it's
> faster.

On Intel, we are in the NV12 color space. As I see it, the shading is
done on the raw video frame here:

  https://gitlab.freedesktop.org/gstreamer/gst-plugins-base/blob/master/ext/pango/gstbasetextoverlay.c#L2020

Just the luminance (Y) is decremented. With color, the chroma (UV) data
would need to be updated as well, taking more CPU time. Maybe that's the
reason why it has not been implemented.
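
To illustrate the difference in work, here is a minimal sketch in plain C
of shading a rectangle in an NV12 frame. It is not the code from
gstbasetextoverlay.c; the function names, the rectangle parameters and
the simple "average toward a tint" rule for chroma are assumptions made
for illustration only. The first function touches only the Y plane, the
second one is what a colored shade would additionally need to do on the
interleaved UV plane.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not GStreamer API): darken the luma (Y) plane of an
 * NV12 frame inside the rectangle [x0,x1) x [y0,y1).
 * Returns the number of bytes written. */
size_t shade_luma_nv12(uint8_t *y_plane, size_t y_stride,
                       size_t x0, size_t x1, size_t y0, size_t y1,
                       uint8_t amount)
{
    size_t written = 0;
    for (size_t row = y0; row < y1; row++) {
        uint8_t *p = y_plane + row * y_stride;
        for (size_t col = x0; col < x1; col++) {
            /* Saturating subtract so dark pixels clamp at 0. */
            p[col] = (p[col] > amount) ? (uint8_t)(p[col] - amount) : 0;
            written++;
        }
    }
    return written;
}

/* Hypothetical colored variant: additionally pull the interleaved UV plane
 * halfway toward a tint (u, v). NV12 chroma is subsampled 2x2, so one UV
 * pair covers four luma pixels. Returns the number of bytes written. */
size_t tint_chroma_nv12(uint8_t *uv_plane, size_t uv_stride,
                        size_t x0, size_t x1, size_t y0, size_t y1,
                        uint8_t u, uint8_t v)
{
    size_t written = 0;
    for (size_t row = y0 / 2; row < (y1 + 1) / 2; row++) {
        uint8_t *p = uv_plane + row * uv_stride;
        for (size_t col = x0 / 2; col < (x1 + 1) / 2; col++) {
            p[2 * col]     = (uint8_t)((p[2 * col] + u) / 2);
            p[2 * col + 1] = (uint8_t)((p[2 * col + 1] + v) / 2);
            written += 2;
        }
    }
    return written;
}
```

In this sketch a colored shade writes roughly half as many bytes again
as the luma-only shade, on top of the extra arithmetic per chroma sample.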

>>
>>>> Is there a faster way to do the overlay or is that element from the
>>>> Pango plugin already quite efficient?
>>>> I want to overlay an image over the video stream, ideally done by
>>>> the graphics hardware.
>>>
>>> There is an active effort to enable GL rendering of the
>>> OverlayComposition meta, but we don't have a fast method to import GL
>>> textures back into the VAAPI encoder, iirc. So there is still quite
>>> some work.
>>
>> Is this work in progress visible somewhere, e.g. as GIT repo?
> 
> https://gitlab.freedesktop.org/gstreamer/gst-plugins-base/blob/master/gst/overlaycomposition/gstoverlaycomposition.c
> https://gitlab.freedesktop.org/gstreamer/gst-plugins-base/blob/master/ext/gl/gstgloverlaycompositorelement.c

OK.

>> I realized that the i.MX6 GStreamer-IMX plugin [1] has a somewhat
>> optimized implementation of the "textoverlay" using 2D acceleration.
>> Would that approach be feasible and help on Intel graphics hardware as well?
> 
> On Intel, it is probably simpler to use GL. In any case, the fonts are
> still rendered in software and using cairo. I have for a long time been
> thinking that we could take advantage of bitmap font-cache, but never
> got the time to work on it.
> 
> On IMX2, there is a 2D blitter that could be used to implement an
> overlay-compositor.
> 
>>
>> Another option for text, image and graphics overlay is to use Cairo
>> directly via the "cairooverlay" element. That would allow using the
>> Cairo text renderer and also adding graphics or images in one process.
>> Would that be "lighter" or more efficient? I think GStreamer 0.10 did
>> have a "cairotextoverlay".
> 
> Text overlay extracts the vector path from pango and uses cairo to
> render it. Using cairo instead of a bitmap font cache is the second
> performance hit (but only if you change the text every frame).
> Blending with the YUV video is the biggest performance hit. The
> blending is optimized in the GL implementation. I haven't tested it, but
> it's likely that adding an imxoverlaycompositor to gstreamer-imx would do
> the job with less work.

OK.
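
To make the blending cost concrete, here is a rough sketch in plain C of
per-pixel alpha blending of an overlay onto the luma plane of a video
frame. This is not how GStreamer's blending code is written; the
function name, the separate 8-bit alpha plane and the rounding are
assumptions for illustration. It shows that every covered pixel needs
two multiplies and a divide on the CPU, for each plane that is blended.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not GStreamer API): blend a w x h overlay, given as
 * 8-bit luma plus a separate 8-bit alpha plane, onto a destination Y plane.
 * alpha = 255 replaces the video pixel, alpha = 0 leaves it untouched. */
void blend_luma(uint8_t *dst, size_t dst_stride,
                const uint8_t *src, const uint8_t *alpha, size_t src_stride,
                size_t w, size_t h)
{
    for (size_t row = 0; row < h; row++) {
        uint8_t *d = dst + row * dst_stride;
        const uint8_t *s = src + row * src_stride;
        const uint8_t *a = alpha + row * src_stride;
        for (size_t col = 0; col < w; col++) {
            unsigned av = a[col];
            /* Weighted sum of overlay and video, rounded to nearest. */
            unsigned v = av * s[col] + (255u - av) * d[col] + 127u;
            d[col] = (uint8_t)(v / 255u);
        }
    }
}
```

A full NV12 blend would repeat similar work on the subsampled UV plane,
which is where a GPU or 2D blitter would help most.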

> 
> I guess I should implement such an element for mainline kernel too.
> Adding this to my todos.
> 
>>
>> [1] https://github.com/Freescale/gstreamer-imx/src/g2d/pango/

Thanks,

Wolfgang.
