vaapidecode GPU to CPU download performance

Arun Raghavan arun at accosted.net
Wed Nov 26 20:20:57 PST 2014


On 25 November 2014 at 18:05, Gwenole Beauchesne <gb.devel at gmail.com> wrote:
> Hi Dan,
>
> 2014-11-24 22:06 GMT+01:00 Dan Williams <dwilliams at cernium.com>:
>> Hi all,
>>
>> I want to use gstreamer and VAAPI to do accelerated H.264 decoding of
>> 1920x1080 video frames into memory where I can do subsequent analysis
>> with the CPU. I am able to use the luma plane of the resulting I420
>> format buffer for what I need to do.
>>
>> I am quite happy with the decode performance but getting the video
>> from the GPU to the CPU is a bottleneck and I'd like to get some
>> advice on how to improve that. Both latency and CPU usage are issues.
>> I need better performance because I want to process many streams of
>> video at the same time.
>>
>> My example program is attached. The pipeline is: filesrc ! qtdemux ! vaapidecode ! appsink
>>
>> If I run the program to pull each sample from a file with 18000 frames
>> as quickly as possible (but not actually gst_buffer_map the resulting
>> buffer) I get:
>>
>> $ /usr/bin/time ./test-appsink ../media/hd-30m.mp4 0
>> 10.21user 27.15system 0:49.60elapsed 75%CPU
>>
>> If I then run with the same input but map the buffer from each
>> sample I get:
>>
>> $ /usr/bin/time ./test-appsink ../media/hd-30m.mp4 1
>> 19.55user 38.73system 2:28.15elapsed 39%CPU
>>
>> I get 55% of my CPU in the wait state (according to top) in this case.
>>
>> I can subtract the two results and get the performance of the
>> gst_buffer_map operation itself:
>>
>> 2:28.15 - 0:49.60 = 98.55s
>> 98.55s / 18000 frames = 5.5ms / frame, or 546MB/s
>> (since each frame ~= 3MB)
>
> When you map the buffer, you get a GstVaapiSurfaceProxy, but what do
> you do with it next?
>
> Are you:
> 1. Using vaGetImage() + map the resulting pixels + direct read; or
> 2. Using vaDeriveImage() + map buffer + use Uncacheable Speculative
> Write Combining (USWC) memory copy?

I just filed a bug about what looks like the same issue:
https://bugzilla.gnome.org/show_bug.cgi?id=740774

The summary is that filesrc ! demux ! parse ! vaapidecode ! xvimagesink
is incredibly slow. Is that expected? I'm on an Ivy Bridge-based desktop.

Regards,
Arun


More information about the gstreamer-devel mailing list