vaapidecode GPU to CPU download performance

Dan Williams dwilliams at cernium.com
Mon Nov 24 13:06:26 PST 2014


Hi all,

I want to use gstreamer and VAAPI to do accelerated H.264 decoding of
1920x1080 video frames into memory where I can do subsequent analysis
with the CPU. I am able to use the luma plane of the resulting I420
format buffer for what I need to do.
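
For reference, here is a minimal sketch of how the luma plane sits inside an I420 frame. It is not taken from the attached program. The names `I420Layout`, `i420_layout` and `luma_mean` are hypothetical, and the layout assumes tightly packed rows (stride equal to width), which real decoder output may not guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* Plane offsets and sizes, in bytes, for a tightly packed I420 frame.
 * Assumption: rows are not padded (stride == width). Buffers coming out
 * of a real decoder may use larger strides. */
typedef struct {
    size_t y_offset, y_size;  /* full-resolution luma plane */
    size_t u_offset, u_size;  /* chroma plane, subsampled 2x2 */
    size_t v_offset, v_size;  /* chroma plane, subsampled 2x2 */
    size_t total;             /* bytes in the whole frame */
} I420Layout;

I420Layout i420_layout(size_t width, size_t height)
{
    /* Chroma dimensions round up for odd widths and heights. */
    size_t cw = (width + 1) / 2;
    size_t ch = (height + 1) / 2;
    I420Layout l;

    l.y_offset = 0;
    l.y_size = width * height;
    l.u_offset = l.y_size;
    l.u_size = cw * ch;
    l.v_offset = l.u_offset + l.u_size;
    l.v_size = cw * ch;
    l.total = l.v_offset + l.v_size;
    return l;
}

/* Mean luma value, reading only the Y plane. `stride` is the distance in
 * bytes between the starts of consecutive rows. */
double luma_mean(const uint8_t *y, size_t width, size_t height, size_t stride)
{
    uint64_t sum = 0;

    for (size_t row = 0; row < height; row++) {
        const uint8_t *p = y + row * stride;
        for (size_t col = 0; col < width; col++)
            sum += p[col];
    }
    return (width && height) ? (double)sum / (double)(width * height) : 0.0;
}
```

For 1920x1080 this gives a luma plane of 2,073,600 bytes out of 3,110,400 bytes per frame, so the analysis touches about two thirds of each buffer.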

I am quite happy with the decode performance but getting the video
from the GPU to the CPU is a bottleneck and I'd like to get some
advice on how to improve that. Both latency and CPU usage are issues.
I need better performance because I want to process many streams of
video at the same time.

My example program is attached. The pipeline is:

  filesrc ! qtdemux ! vaapidecode ! appsink

If I run the program to pull each sample from a file with 18000 frames
as quickly as possible, without calling gst_buffer_map on the resulting
buffer, I get:

$ /usr/bin/time ./test-appsink ../media/hd-30m.mp4 0
10.21user 27.15system 0:49.60elapsed 75%CPU

If I then run with the same input but map the buffer from each
sample I get:

$ /usr/bin/time ./test-appsink ../media/hd-30m.mp4 1
19.55user 38.73system 2:28.15elapsed 39%CPU

I get 55% of my CPU in the wait state (according to top) in this case.
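
The %CPU figure from /usr/bin/time is (user + system) / elapsed, so the two runs can be compared in CPU time as well as wall time. The sketch below redoes that arithmetic with the numbers above. The function names are hypothetical, and it assumes the only difference between the two runs is the mapping step.

```c
/* CPU utilisation as /usr/bin/time reports it:
 * (user + system) / elapsed, expressed as a percentage. */
double cpu_percent(double user_s, double system_s, double elapsed_s)
{
    return 100.0 * (user_s + system_s) / elapsed_s;
}

/* Extra CPU time per frame, in milliseconds, between a run that maps
 * each buffer and a run that does not. Assumption: both runs do the
 * same work apart from the mapping step. */
double added_cpu_ms_per_frame(double cpu_with_map_s,
                              double cpu_without_map_s,
                              long frames)
{
    return (cpu_with_map_s - cpu_without_map_s) * 1000.0 / (double)frames;
}
```

With the figures above, `cpu_percent(10.21, 27.15, 49.60)` is about 75.3 and `cpu_percent(19.55, 38.73, 148.15)` is about 39.3, matching the reported 75% and 39%. `added_cpu_ms_per_frame(19.55 + 38.73, 10.21 + 27.15, 18000)` is about 1.16ms of extra CPU time per frame, so a large part of the slowdown would appear to be time spent waiting and not executing.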

I can subtract the two results to estimate the cost of the
gst_buffer_map operation itself:

2:28.15 - 0:49.60 = 98.55s
98.55s / 18000 frames = 5.5ms / frame, or about 546MB/s
(since each frame ~= 3MB)
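
The same estimate can be written as a short sketch. The function names are hypothetical, and the subtraction assumes the two runs differ only in the mapping step.

```c
#include <stddef.h>

/* Convert an elapsed time printed as M:SS.ss into seconds. */
double elapsed_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* Wall-clock cost of mapping one frame, in milliseconds, estimated as
 * the difference between the two runs divided by the frame count. */
double map_ms_per_frame(double with_map_s, double without_map_s, long frames)
{
    return (with_map_s - without_map_s) * 1000.0 / (double)frames;
}

/* Effective download rate in MB/s (1 MB = 1e6 bytes) for a frame of
 * frame_bytes that costs ms_per_frame milliseconds to map. */
double throughput_mb_per_s(size_t frame_bytes, double ms_per_frame)
{
    return ((double)frame_bytes / 1e6) / (ms_per_frame / 1000.0);
}
```

Calling `map_ms_per_frame(elapsed_seconds(2, 28.15), elapsed_seconds(0, 49.60), 18000)` gives about 5.475ms per frame. At 3,000,000 bytes per frame that is roughly 548MB/s, and at the exact 1920x1080 I420 size of 3,110,400 bytes it is roughly 568MB/s. Both are in the same range as the rounded figure quoted above.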

When I use oprofile I see that 44% of the time spent is in the routine
drm_clflush_page:

samples  %        image name               symbol name
-------------------------------------------------------------------------------
363013   44.1111  /lib/modules/3.13.0-24-generic/updates/dkms/drm.ko drm_clflush_page

See http://lxr.free-electrons.com/source/drivers/gpu/drm/drm_cache.c?v=3.13

I am interested in knowing:

  1) Can I make this run faster and use less CPU? If so, how?
  2) Ultimately, how much faster can I make it run?
  3) How much faster would it be with a faster CPU or GPU?

My hardware is:

CPU: Intel Atom E3815
GPU: HD 2500 (Ivy Bridge)

I am using gstreamer and gstreamer-vaapi built from the git master
branch as of today, so presumably 1.5.x.

The rest of the software stack is:

Ubuntu 14.04.1 LTS
$ uname -a
Linux nuc-atom-testsys 3.13.0-24-generic #47-Ubuntu SMP Fri May 2 23:30:00 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
$ vainfo
libva info: VA-API version 0.35.1
vainfo: VA-API version: 0.35 (libva 1.3.1)
vainfo: Driver version: Intel i965 driver for Intel(R) Bay Trail - 1.3.2

ii  libva-dev:amd64                                       1.3.1-3
ii  libdrm2:amd64                                         2.4.54-1
ii  i965-va-driver:amd64                                  1.3.2-1
ii  xserver-common                                        2:1.15.1-0ubuntu2.1
ii  xserver-xorg-video-intel                              2:2.99.911-0intel1

The input file is 30 minutes of 10fps 1920x1080 H.264 video which I
can make available if that helps.
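
To put the timings in terms of streams, the sketch below converts each run into a frame rate and then into a count of 10fps streams. The function names are hypothetical. It assumes throughput scales linearly with the number of streams, which a single-pipeline measurement cannot confirm, so treat the result as a rough upper bound.

```c
/* Number of frames in a clip of the given length and frame rate. */
long clip_frames(long minutes, long fps)
{
    return minutes * 60 * fps;
}

/* Average frames processed per second of wall-clock time. */
double processing_rate_fps(long frames, double elapsed_s)
{
    return (double)frames / elapsed_s;
}

/* How many real-time streams of stream_fps that rate would cover.
 * Assumption: throughput scales linearly across streams. */
double realtime_streams(double rate_fps, double stream_fps)
{
    return rate_fps / stream_fps;
}
```

`clip_frames(30, 10)` is 18000, matching the frame count used earlier. Without mapping, 18000 frames in 49.60s is about 363 frames per second, or roughly 36 streams at 10fps. With mapping, 148.15s gives about 121 frames per second, or roughly 12 streams.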

Thanks in advance for any help (or even just for reading to the end of
this information-dense post.)

Dan

-------------- next part --------------
Name: test-appsink.c
URL: <http://lists.freedesktop.org/archives/gstreamer-devel/attachments/20141124/1acf4091/attachment.c>