vaapidecode GPU to CPU download performance
Dan Williams
dwilliams at cernium.com
Mon Nov 24 13:06:26 PST 2014
Hi all,
I want to use GStreamer and VA-API to do accelerated H.264 decoding of
1920x1080 video frames into memory where I can do subsequent analysis
with the CPU. The luma plane of the resulting I420 buffer is all I
need for that analysis.
I am quite happy with the decode performance but getting the video
from the GPU to the CPU is a bottleneck and I'd like to get some
advice on how to improve that. Both latency and CPU usage are issues.
I need better performance because I want to process many streams of
video at the same time.
My example program is attached. The pipeline is:
filesrc ! qtdemux ! vaapidecode ! appsink
If I run the program and pull each sample from a file with 18000 frames
as quickly as possible, without calling gst_buffer_map on the resulting
buffer, I get:
$ /usr/bin/time ./test-appsink ../media/hd-30m.mp4 0
10.21user 27.15system 0:49.60elapsed 75%CPU
If I then run with the same input but map the buffer from each
sample I get:
$ /usr/bin/time ./test-appsink ../media/hd-30m.mp4 1
19.55user 38.73system 2:28.15elapsed 39%CPU
In this case top shows 55% of my CPU time in the wait state.
I can subtract the two results to estimate the cost of the
gst_buffer_map operation itself:
2:28.15 - 0:49.60 = 98.55s
98.55s / 18000 frames = 5.5ms / frame, or roughly 546MB/s
(since each frame ~= 3MB)
When I use oprofile I see that 44% of the time spent is in the routine
drm_clflush_page:
samples   %         image name                                           symbol name
-------------------------------------------------------------------------------
363013    44.1111   /lib/modules/3.13.0-24-generic/updates/dkms/drm.ko   drm_clflush_page
See http://lxr.free-electrons.com/source/drivers/gpu/drm/drm_cache.c?v=3.13
I am interested in knowing:
1) Can I make this run faster and use less CPU? If so, how?
2) Ultimately, how much faster can I make it run?
3) How much faster would it be with a faster CPU or GPU?
My hardware is:
CPU: Intel Atom E3815
GPU: HD 2500 (Ivy Bridge)
I am using gstreamer and gstreamer-vaapi built from git master branch
as of today, so 1.5.X I guess.
The rest of the software stack is:
Ubuntu 14.04.1 LTS
$ uname -a
Linux nuc-atom-testsys 3.13.0-24-generic #47-Ubuntu SMP Fri May 2 23:30:00 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
$ vainfo
libva info: VA-API version 0.35.1
vainfo: VA-API version: 0.35 (libva 1.3.1)
vainfo: Driver version: Intel i965 driver for Intel(R) Bay Trail - 1.3.2
ii libva-dev:amd64 1.3.1-3
ii libdrm2:amd64 2.4.54-1
ii i965-va-driver:amd64 1.3.2-1
ii xserver-common 2:1.15.1-0ubuntu2.1
ii xserver-xorg-video-intel 2:2.99.911-0intel1
The input file is 30 minutes of 10fps 1920x1080 H.264 video which I
can make available if that helps.
Thanks in advance for any help (or even just for reading to the end of
this information-dense post.)
Dan
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test-appsink.c
URL: <http://lists.freedesktop.org/archives/gstreamer-devel/attachments/20141124/1acf4091/attachment.c>