Efficient scaling and/or conversion from YUV420 (NV12) to RGBA

Iñigo Huguet inigohuguet at fanamoel.com
Tue Sep 4 12:56:33 UTC 2018


I've found that it may be possible to use gstreamer-vaapi with VDPAU as
the backend on my device (not tested yet).

Would this approach help in any way? Can gstreamer-vaapi help me
improve performance in NV12 to RGBA conversion, or in video downscaling?

Reminder: I'm trying to stream NV12 video from cameras to a Qt program
using qmlglsink, and my device is an Allwinner A20 with a VPU and a Mali GPU.
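
For scale, here is a rough sketch of the raw data volumes involved at the two resolutions mentioned in this thread. It assumes tightly packed frames (no stride padding): NV12 at 1.5 bytes per pixel and RGBA at 4 bytes per pixel. The helper name frame_bytes is only illustrative, not part of any GStreamer tool.

```shell
#!/bin/sh
# frame_bytes WIDTH HEIGHT FORMAT: bytes in one tightly packed raw frame.
# Assumption: no row padding. NV12 = full-size Y plane plus a half-size
# interleaved UV plane (1.5 bytes/pixel); RGBA = 4 bytes/pixel.
frame_bytes() {
    case "$3" in
        NV12) echo $(( $1 * $2 * 3 / 2 )) ;;
        RGBA) echo $(( $1 * $2 * 4 )) ;;
        *) echo "unsupported format: $3" >&2; return 1 ;;
    esac
}

fps=25
for size in 1440x1152 720x576; do
    w=${size%x*}   # text before the "x"
    h=${size#*x}   # text after the "x"
    nv12=$(frame_bytes "$w" "$h" NV12)
    rgba=$(frame_bytes "$w" "$h" RGBA)
    echo "$size: NV12 $nv12 B/frame ($(( nv12 * fps )) B/s)," \
         "RGBA $rgba B/frame ($(( rgba * fps )) B/s)"
done
```

At 1440x1152 and 25 fps this works out to roughly 62 MB/s of NV12 read and 166 MB/s of RGBA written; scaling down to 720x576 before converting cuts both figures to a quarter.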


On 03/09/18 at 15:33, Iñigo Huguet wrote:
>
> Hi. I asked the quoted question a month ago, but I haven't been able to
> work on it since. Now I'm back on it, so sorry for resuming
> after such a long time.
>
> On 02/08/18 at 17:42, Nicolas Dufresne wrote:
>> On Thursday 2 August 2018 at 15:03 +0200, Iñigo Huguet wrote:
>>> Hi Nicolas,
>>>
>>> My kernel and Mali blob are quite old: kernel 3.4, Mali blob r3p0. Do
>>> you know if, with these versions, the problem is, as you say, unsupported
>>> DMABuf importation? Once graphics and video acceleration are available
>>> in mainline, along with other things we need, we plan to move
>>> to mainline, but for the moment that's not possible.
>>>
>>> Do you really think that the bottleneck is glupload? With this pipeline
>>> I get over 25fps:
>>>
>>> v4l2src device=/dev/video1 !
>>> video/x-raw,format=NV12,width=1440,height=1152,framerate=25/1 ! glupload
>>> ! fakesink silent=false
>> Interesting. Then I don't know why the shaders are so slow; it might
>> also be QML?
>
> That doesn't seem to be the case; with this pipeline I also get poor
> performance:
> gst-launch-1.0 -v v4l2src device="/dev/video1" ! 
> video/x-raw,format=NV12,width=1440,height=1152,framerate=25/1 ! 
> glupload ! glcolorconvert ! "video/x-raw(memory:GLMemory),format=RGBA" 
> !  fakesink silent=false
>
> Apparently, the bottleneck is in glcolorconvert. With it, the processing
> time for a frame is around 0.5 s (2 fps); without it, it's 0.04 s (25 fps).
>
>>> About v4l2src producing non-cache-able memory, I don't know what you
>>> mean by that. The driver is producing buffers in dma-contig memory.
>> dma-contig produces non-cacheable memory. Any CPU access will be slow.
>> With OpenGL it's fun, since you never know when CPU access will happen
>> (emulation taking place).
>>
>>> I've just tried the pipeline you suggest, and the performance is almost
>>> the same.
>>>
>>> Any ideas?
>> Probably because it's CMA memory? You'll have to profile in order to
>> identify the problem. Even very old kernels have CPU counters that let
>> you use the "perf" command.
>
> I don't know how to do this; can you point me to a tutorial?
>
> Also, given the result of the pipeline I showed above, do you think I
> need to do this? Isn't it already clear that converting to RGBA with
> glcolorconvert is the bottleneck?
>
>>> On 02/08/18 at 14:25, Nicolas Dufresne wrote:
>>>> On Thursday 2 August 2018 at 11:31 +0200, Iñigo Huguet wrote:
>>>>> Hi.
>>>>> I'm using a pipeline to display live video from cameras in a Qt
>>>>> application. The cameras' driver produces NV12 video, and for Qt I'm
>>>>> using qmlglsink.
>>>>> The qmlglsink element seems to accept only RGBA, so I have to do the
>>>>> conversion. I'm doing it with this pipeline:
>>>>>
>>>>> v4l2src device="/dev/video1" !
>>>>> video/x-raw,format=NV12,width=1440,height=1152,framerate=5/1 !
>>>>> glupload ! glcolorconvert ! qmlglsink sync=false
>>>>>
>>>>> However, I'm getting very poor performance, around 1 fps or less, and
>>>>> glcolorconvert seems to be the bottleneck. With this pipeline I get
>>>>> 25 fps with no problem:
>>>>>
>>>>> v4l2src device="/dev/video1" !
>>>>> video/x-raw,format=NV12,width=1440,height=1152,framerate=25/1 !
>>>>> glupload ! fakesink silent=false
>>>>>
>>>>> With 720x576 video I get better performance (obviously), but I
>>>>> also need to use 1440x1152 because this is video from the 4 cameras
>>>>> at the same time.
>>>>> Possible solutions that might be acceptable for me:
>>>>> - A more efficient way of converting from NV12 to RGBA
>>>>> - An efficient way of scaling down to 720x576, or even less, before
>>>>>   color conversion
>>>>> - Both of the previous options at the same time
>>>>> - Other solutions you might suggest
>>>>> I'm running this on an ARM processor (Allwinner A20) with a GPU and
>>>>> OpenGL ES. This processor also has a Video Processing Unit that works
>>>>> with VDPAU.
>>>> VAAPI support is being worked on for this processor, through the new
>>>> Cedar kernel drivers. My guess for the performance: your Mali blob does
>>>> not support DMABuf importation, or not the way glupload implements it.
>>>> The bottleneck in that context is likely glupload, especially if your
>>>> v4l2src produces non-cache-able memory.
>>>>
>>>> If you are not running on battery, you could probably convert to RGBA
>>>> before glupload, using a software converter.
>>>>
>>>>     v4l2src ! videoconvert n-threads=2 ! queue ! video/x-raw,format=RGBA ! glupload ! qmlglsink
>>>>
>>>>> Thanks
>>>>> Iñigo
>>>>> _______________________________________________
>>>>> gstreamer-devel mailing list
>>>>> gstreamer-devel at lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/gstreamer-devel
>
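
For reference, a minimal sketch of the kind of "perf" profiling Nicolas suggests above, applied to the slow pipeline from this thread. It assumes that a perf binary matching the running kernel is installed and that the camera is at /dev/video1. The num-buffers=250 limit (about 10 s at 25 fps) is my own addition so that the run ends on its own, and perf.data is just the default output file name.

```shell
#!/bin/sh
# The slow pipeline from this thread, limited to 250 buffers so that
# gst-launch-1.0 exits by itself and perf can finish recording.
PIPELINE='v4l2src device=/dev/video1 num-buffers=250 ! video/x-raw,format=NV12,width=1440,height=1152,framerate=25/1 ! glupload ! glcolorconvert ! video/x-raw(memory:GLMemory),format=RGBA ! fakesink'

if command -v perf >/dev/null 2>&1 \
   && command -v gst-launch-1.0 >/dev/null 2>&1 \
   && [ -e /dev/video1 ]; then
    # Record CPU samples with call graphs while the pipeline runs.
    # $PIPELINE is intentionally unquoted so the shell splits it into
    # separate arguments for gst-launch-1.0.
    perf record -g -o perf.data -- gst-launch-1.0 $PIPELINE \
        && perf report -i perf.data --stdio --sort dso,symbol
else
    echo "perf, gst-launch-1.0 or /dev/video1 not available; skipping"
fi
```

If most of the samples land in the Mali library or in memcpy-like functions rather than in GStreamer itself, that would point towards CPU access to the non-cacheable buffers, as discussed above.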
