gstreamer-vaapi performance decrease of 25% with USE_NATIVE_FORMATS 1

Gwenole Beauchesne gb.devel at gmail.com
Mon Aug 25 01:33:10 PDT 2014


Hi,

2014-08-21 14:50 GMT+02:00 Thomas Scheuermann <scheuermann at barco.com>:

> we have a performance decrease of 25% with the define in
> gstvaapivideomemory.c
> #define USE_NATIVE_FORMATS 1
>
> We run 6 pipelines in parallel:
> gst-launch-1.0 videotestsrc -e num-buffers=3000 !
> video/x-raw,width=1920,height=1080,framerate=30/1 ! videoconvert !
> vaapiencode_h264 ! queue ! mp4mux ! filesink location=test.mp4
>
> If gstreamer-vaapi is compiled with
> #define USE_NATIVE_FORMATS 0
> the pipelines run about 25% faster.
>
> What could be the reason for this?

By default, I420 formats are now exposed to GStreamer pipelines.
With a videoconvert in there, the conversion from I420 to NV12
(required by the encoder) might be performed on the CPU. vaapiencode_h264
should be able to handle the conversion from I420 to NV12 on the GPU
just fine. Otherwise, you could try inserting a vaapipostproc element
instead of videoconvert.
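
For reference, a minimal and untested sketch of that substitution, based
on the pipeline quoted above. The pipeline() helper, the RUN_PIPELINES
switch and the per-instance output file names are only illustrative
assumptions, not part of gstreamer-vaapi; the elements and caps are the
ones from the original command line, with vaapipostproc in place of
videoconvert.

```shell
#!/bin/sh
# Hypothetical helper: print the test pipeline description.
#   $1 = converter element ("videoconvert" or "vaapipostproc")
#   $2 = output file name
pipeline() {
    printf '%s\n' "videotestsrc num-buffers=3000 ! video/x-raw,width=1920,height=1080,framerate=30/1 ! $1 ! vaapiencode_h264 ! queue ! mp4mux ! filesink location=$2"
}

# Assumption: RUN_PIPELINES=1 is an opt-in switch invented for this sketch,
# so that sourcing the file does not start any encoding by accident.
if [ "${RUN_PIPELINES:-0}" = 1 ]; then
    # Six instances in parallel, as in the original report.
    for i in 1 2 3 4 5 6; do
        # The command substitution is left unquoted on purpose so that the
        # description is split into separate gst-launch-1.0 arguments.
        gst-launch-1.0 -e $(pipeline vaapipostproc "test$i.mp4") &
    done
    wait
fi
```

Timing the same script with "videoconvert" passed to pipeline() instead
should show whether the CPU conversion accounts for the difference.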

I would have to test further though. I believe that using a temporary
VA image in linear format, exposed (map/unmap) for writing, then doing
a GPU accelerated conversion (vaPutImage()) from that to a native
internal format (NV12 Y-tiled), would generally perform better than
vaDeriveImage() + "direct" copy.

In general, just because an API allows for direct/zero-copy access
does not mean that the underlying implementation, down to kernel/HW
requirements, is going to honour that. Cache invalidation could hurt
performance even more.

Regards,
Gwenole.

