[gst-devel] How to decrease CPU consumation for audio recording?
gibrovacco at gmail.com
Thu Oct 7 19:20:56 CEST 2010
On Thu, Oct 7, 2010 at 6:56 PM, Felipe Contreras <felipe.contreras at gmail.com> wrote:
> My claim was that GStreamer was bad for small buffers; the smaller, the
> worse. That IMO is a fact. Now, how small, and how bad GStreamer is
> depends on your system; my guess was that ARM was
> especially bad compared to x86. I think the numbers show that.
In the countless times I've profiled VoIP (and video) calls on ARM, I
found a perfect match with Felipe's findings: the smaller the buffers,
the higher the overhead on the system. In the pipelines of
telepathy-stream-engine, where IMHO there are plenty of unneeded elements
(for instance, we don't need to resample or convert the audio buffers, but
there are always at least two audio converters and one resampler), the
change in CPU load between 60ms and 20ms packetisation is about 20% (try it
with Skype to see for yourself), mostly located in the kernel, but also
inside udpsink/udpsrc and rtpbin. Maybe I can add a few diagrams to
Felipe's once I retrieve my data, but I have some interesting
considerations in the meantime.
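For reference, the buffer rates behind those two packetisations can be worked out directly. This is a minimal sketch, assuming one buffer per packet and a single stream direction; the function name is only illustrative and is not part of GStreamer.

```python
def buffers_per_second(packet_ms):
    """Buffers per second for one stream, assuming one buffer per packet."""
    return 1000.0 / packet_ms

for ms in (60, 20):
    print("%d ms -> %.1f buffers/s" % (ms, buffers_per_second(ms)))

# Ratio of buffer rates when going from 60 ms to 20 ms packets.
ratio = buffers_per_second(20) / buffers_per_second(60)
print("rate ratio: %.1fx" % ratio)
```

The same amount of audio is carried in three times as many buffers, so every element and thread boundary in the pipeline is crossed three times as often.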
Now, in a perfect world the overhead generated by GStreamer when handling
audio data should be O(n) wrt the amount of data, and O(1) wrt its
packetisation. Since we know that (de)payloading is an expensive operation,
I could still understand an algorithm which degrades as O(n) with the
number of buffers, but Felipe's diagrams clearly show that the
degradation is O(e^n), which grows faster than any polynomial function and,
as they teach at university, is bad (and Felipe's diagrams have
neither payloaders nor RTP elements).
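The "tolerable" case described above (linear in the number of buffers) can be sketched as a simple cost model. This is only an illustration: the function and both constants are hypothetical placeholders, not measurements, and only the shape of the result matters.

```python
def linear_cost(total_bytes, n_buffers, per_byte=1.0, per_buffer=100.0):
    """Hypothetical cost model: linear in data and linear in buffer count.

    per_byte and per_buffer are arbitrary placeholder constants.
    """
    return per_byte * total_bytes + per_buffer * n_buffers

total = 1 << 20  # fixed amount of audio data, in bytes
costs = [linear_cost(total, 2 ** k) for k in range(0, 8)]

# Extra cost incurred each time the buffer count doubles
# (i.e. each time the buffer size is halved).
increments = [b - a for a, b in zip(costs, costs[1:])]
for k, inc in enumerate(increments):
    print("2^%d -> 2^%d buffers: +%.0f" % (k, k + 1, inc))
```

Under this model each halving of the buffer size adds exactly twice the cost of the previous halving. Measured increments that grow faster than that would point to worse-than-linear behaviour in the number of buffers.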
> My "overly dramatic" graphs show the raw data for the most minimal example
> I could find, so it doesn't matter what you do, you'll get _at least_ that
> performance hit. On real use-cases (in the graph after 2^7), IMO the
> performance loss is already bad, but you have to
> multiply that by the number of different elements and thread contexts that
> are used.
Just to confirm this, I'd like to publish a typical stream-engine audio
pipeline and the CPU growth with different packetisations. Again, I hope to
be able to take a few pictures from the laptop @ work.
As it appears that most of the CPU growth is in the kernel (which doesn't
seem to happen on x86), I believe something weird is going on with fast
futexes on ARM. That is: the fewer the mutexes, the less exponential the
CPU growth.
> However, the empirical experience is already there, ask
> anyone in Nokia, I just wanted to show raw numbers.
> > It sounds like when you mean size, you really mean duration and thus the
> > amount of buffers per second.
> > GStreamer is not designed to pass around 1 sample per buffer (that would
> > be typically 48000 buffers per second), you can do it but it will incur
> > a higher overhead that increases with the amount of elements in the
> > pipeline.
See my comments above: do you really think O(e^n) is reasonable growth?
> > GStreamer is however designed for more realistic buffer durations of
> > 10ms (that's 100 buffers per second). The overhead that GStreamer causes
> > in these types of pipelines depends on a lot of things, but in well
> > designed pipelines you typically see overhead values of around 1% or
> > less (callgrind and kcachegrind are good tools to measure this).
The growth Felipe is showing happens with stream-engine pipelines as well,
and a similar one has been measured with much simpler ones, like the
modified for audio-only and e.g. using g711 alaw. You can even test it with
g729 on any architecture now ;)
> > As a datapoint: On my desktop I can push around 700000 buffers per
> > second, and that's then using 100% CPU (and also 100% gstreamer
> > overhead). (gst-launch fakesrc num-buffers=7000000 silent=1 ! fakesink
> > silent=1 takes about 10 seconds).
It appears ARM is not as well optimised as x86 wrt fast futexes (no
references here :\, I have to dig more...), meaning that GStreamer is not
well optimised for that architecture. It would be interesting to propose an
alternative to bare mutexes for handling read/write conflicts.
> On my laptop:
> % gst-launch fakesrc num-buffers=7000000 silent=1 ! fakesink silent=1
> % gst-launch fakesrc num-buffers=7000000 silent=1 ! queue ! fakesink
> On my N900:
> % gst-launch-0.10 fakesrc num-buffers=7000000 silent=1 ! fakesink silent=1
> 4m 26s
> % gst-launch-0.10 fakesrc num-buffers=7000000 silent=1 ! queue ! fakesink silent=1
> 16m 11s
This is more or less an experimental confirmation of my statements above on
ARM vs x86.
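The timings quoted in this thread can be turned into a rough per-buffer cost. This is a back-of-the-envelope sketch only: it uses wall-clock times rather than CPU time, and the desktop and N900 figures come from different people's machines. The names in the code are illustrative.

```python
N_BUFFERS = 7000000

# Wall-clock times quoted in this thread, in seconds.
runs = {
    "desktop, fakesrc ! fakesink": 10,                  # "about 10 seconds"
    "N900, fakesrc ! fakesink": 4 * 60 + 26,            # 4m 26s
    "N900, fakesrc ! queue ! fakesink": 16 * 60 + 11,   # 16m 11s
}

def usec_per_buffer(seconds, n_buffers=N_BUFFERS):
    """Average wall-clock cost of one buffer, in microseconds."""
    return seconds * 1e6 / n_buffers

for name, seconds in runs.items():
    print("%-34s %6.1f us/buffer" % (name, usec_per_buffer(seconds)))

# Cost of adding a queue (and hence a thread boundary) on the N900.
queue_factor = (runs["N900, fakesrc ! queue ! fakesink"]
                / float(runs["N900, fakesrc ! fakesink"]))
print("queue slowdown on N900: %.2fx" % queue_factor)
```

That works out to roughly 1.4 us per buffer on the desktop against 38 us on the N900, and about 139 us once a queue is added, i.e. a single queue makes the N900 run about 3.65 times slower.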
> Felipe Contreras