Efficient scaling and/or conversion from YUV420 (NV12) to RGBA

Iñigo Huguet inigohuguet at fanamoel.com
Mon Sep 3 13:33:04 UTC 2018


Hi. I asked the question quoted below a month ago, but I haven't been 
able to work on it since then. I'm back on it now, so sorry for resuming 
the thread after such a long time.

On 02/08/18 at 17:42, Nicolas Dufresne wrote:
> On Thursday, 2 August 2018 at 15:03 +0200, Iñigo Huguet wrote:
>> Hi Nicolas,
>>
>> My kernel and Mali blob are quite old: kernel 3.4, mali blob r3p0. Do
>> you know if with these versions the problem is, as you say, unsupported
>> DMABuf importation? When graphics and video acceleration are available
>> in mainline, and other things we need as well, we are planning to move
>> to mainline, but for the moment it's not possible.
>>
>> Do you really think that the bottleneck is glupload? With this pipeline
>> I get over 25fps:
>>
>> v4l2src device=/dev/video1 !
>> video/x-raw,format=NV12,width=1440,height=1152,framerate=25/1 ! glupload
>> ! fakesink silent=false
> Interesting. Then I don't know why shaders are being so slow, might
> also be qml ?

That doesn't seem to be the case. I also get poor performance with this 
pipeline, which has no QML element in it:
gst-launch-1.0 -v v4l2src device="/dev/video1" ! 
video/x-raw,format=NV12,width=1440,height=1152,framerate=25/1 ! glupload 
! glcolorconvert ! "video/x-raw(memory:GLMemory),format=RGBA" !  
fakesink silent=false

So the bottleneck appears to be glcolorconvert. With it, the processing 
time per frame is around 0.5 s (2 fps); without it, it is 0.04 s (25 fps).
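
For reference, here is a rough sketch of the arithmetic behind those 
numbers, plus the per-frame buffer sizes involved in the conversion. The 
function names are only illustrative, and the byte counts assume tightly 
packed buffers with no stride padding, which may not match what the 
driver really produces:

```python
# Back-of-the-envelope figures for the 1440x1152 stream discussed above.
# Assumption: buffers are tightly packed (no row stride padding).

def fps_from_frame_time(seconds_per_frame: float) -> float:
    """Frame rate implied by a measured per-frame processing time."""
    return 1.0 / seconds_per_frame


def nv12_frame_bytes(width: int, height: int) -> int:
    """NV12: full-resolution Y plane plus a half-height interleaved UV plane."""
    return width * height * 3 // 2


def rgba_frame_bytes(width: int, height: int) -> int:
    """RGBA: 4 bytes per pixel."""
    return width * height * 4


if __name__ == "__main__":
    width, height = 1440, 1152
    for label, frame_time in (("with glcolorconvert", 0.5),
                              ("without glcolorconvert", 0.04)):
        print(f"{label}: {fps_from_frame_time(frame_time):.1f} fps")
    print("NV12 bytes per frame:", nv12_frame_bytes(width, height))
    print("RGBA bytes per frame:", rgba_frame_bytes(width, height))
```

This only shows how much data each frame represents before and after the 
conversion; it says nothing about where the time is actually spent.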

>
>> About v4l2src producing non-cache-able memory, I don't know what you
>> mean by that. The driver is producing buffers in dma-contig memory.
> dma-contig produces non-cacheable memory. Any CPU access will be slow.
> With OpenGL it's fun, since you never know when CPU access will happen
> (emulation taking place).
>
>> I've just tried the pipeline you suggest, and the performance is almost
>> the same.
>>
>> Any ideas?
> Probably because it's CMA memory? You'll have to profile in order to
> identify the problem. Even very old kernels have CPU counters that let
> you use the "perf" command.

I don't know how to do this. Can you point me to a tutorial?

Also, given the result of the pipeline I mentioned above, do you think I 
still need to profile? Isn't it already clear that converting to RGBA 
with glcolorconvert is the bottleneck?

>>
>> On 02/08/18 at 14:25, Nicolas Dufresne wrote:
>>> On Thursday, 2 August 2018 at 11:31 +0200, Iñigo Huguet wrote:
>>>> Hi.
>>>> I'm using a pipeline to display live video from cameras to a QT
>>>> application. Cameras' driver produces NV12 video, and for QT I'm
>>>> using qmlglsink.
>>>> Element qmlglsink seems to only accept RGBA, so I have to make the
>>>> conversion. I'm doing it with this pipeline:
>>>>   v4l2src device="/dev/video1" !
>>>>   video/x-raw,format=NV12,width=1440,height=1152,framerate=5/1 !
>>>>   glupload ! glcolorconvert ! qmlglsink sync=false
>>>> However, I'm getting very poor performance, around 1fps or less, and
>>>> glcolorconvert seems to be the bottleneck. With this pipeline I get
>>>> 25 fps with no problem:
>>>>   v4l2src device="/dev/video1" !
>>>>   video/x-raw,format=NV12,width=1440,height=1152,framerate=25/1 !
>>>>   glupload ! fakesink silent=false
>>>> With 720x576 video I'm getting better performance (obviously), but I
>>>> also need to use 1440x1152 because that is the video from the 4
>>>> cameras at the same time.
>>>> Possible solutions that might be acceptable for me:
>>>> - A more efficient way of converting from NV12 to RGBA
>>>> - An efficient way of scaling down to 720x576, or even less, before
>>>>   color conversion
>>>> - The two previous options at the same time
>>>> - Other solutions you might suggest
>>>> I'm running this on an ARM processor (Allwinner A20) with GPU and
>>>> OpenGLES. This processor also has a Video Processing Unit that works
>>>> with VDPAU.
>>> VAAPI support is being worked on for this processor, through the new
>>> Cedar kernel drivers. My guess for the performance: your Mali blob does
>>> not support DMABuf importation, or not the way glupload implements it.
>>> The bottleneck in that context is likely glupload, especially if your
>>> v4l2src produces non-cache-able memory.
>>>
>>> If you are not running on battery, you could probably convert to RGBA
>>> before glupload, using a software converter.
>>>
>>>     v4l2src ! videoconvert n-threads=2 ! queue ! video/x-raw,format=RGBA ! glupload ! qmlglsink
>>>
>>>> Thanks
>>>> Iñigo
>>>> _______________________________________________
>>>> gstreamer-devel mailing list
>>>> gstreamer-devel at lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/gstreamer-devel
