<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:p="urn:schemas-microsoft-com:office:powerpoint" xmlns:a="urn:schemas-microsoft-com:office:access" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:s="uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882" xmlns:rs="urn:schemas-microsoft-com:rowset" xmlns:z="#RowsetSchema" xmlns:b="urn:schemas-microsoft-com:office:publisher" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:c="urn:schemas-microsoft-com:office:component:spreadsheet" xmlns:odc="urn:schemas-microsoft-com:office:odc" xmlns:oa="urn:schemas-microsoft-com:office:activation" xmlns:html="http://www.w3.org/TR/REC-html40" xmlns:q="http://schemas.xmlsoap.org/soap/envelope/" xmlns:rtc="http://microsoft.com/officenet/conferencing" xmlns:D="DAV:" xmlns:Repl="http://schemas.microsoft.com/repl/" xmlns:mt="http://schemas.microsoft.com/sharepoint/soap/meetings/" xmlns:x2="http://schemas.microsoft.com/office/excel/2003/xml" xmlns:ppda="http://www.passport.com/NameSpace.xsd" xmlns:ois="http://schemas.microsoft.com/sharepoint/soap/ois/" xmlns:dir="http://schemas.microsoft.com/sharepoint/soap/directory/" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:dsp="http://schemas.microsoft.com/sharepoint/dsp" xmlns:udc="http://schemas.microsoft.com/data/udc" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sub="http://schemas.microsoft.com/sharepoint/soap/2002/1/alerts/" xmlns:ec="http://www.w3.org/2001/04/xmlenc#" xmlns:sp="http://schemas.microsoft.com/sharepoint/" xmlns:sps="http://schemas.microsoft.com/sharepoint/soap/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:udcs="http://schemas.microsoft.com/data/udc/soap" xmlns:udcxf="http://schemas.microsoft.com/data/udc/xmlfile" xmlns:udcp2p="http://schemas.microsoft.com/data/udc/parttopart" xmlns:st="" xmlns="http://www.w3.org/TR/REC-html40"> <head> <meta 
http-equiv=Content-Type content="text/html; charset=us-ascii"> <meta name=Generator content="Microsoft Word 11 (filtered medium)">  <style>  </style> </head> <body lang=EN-US link=blue vlink=blue> <div class=Section1> > > My claim was that GStreamer was bad for small buffers; the smaller, the worst. That IMO is a fact.<o:p></o:p> <o:p> </o:p> Is it fair to say that this discussion really has nothing to do with the actual size of the buffers, but is really a matter of per-buffer overhead? <o:p></o:p> <o:p> </o:p> <o:p> </o:p> > Felipe's diagrams are clearly showing that the degradation is O(e^n)<o:p></o:p> <o:p> </o:p> Actually, that’s not clear to me, as his plot was log(x), y.  That’s why I asked about plotting throughput vs number of elements or queues.  Even using a linear x axis would be more enlightening.<o:p></o:p> <o:p> </o:p> <o:p> </o:p> I also agree with Wim that the effects of the queue are exaggerated in a trivial pipeline on an idle system.  In higher-load situations, you would tend to have fewer context switches, which are probably the largest cost.<o:p></o:p> <o:p> </o:p> I think a lockless queue wouldn’t help with this scenario, since you’d still want to wake up a consumer that’s waiting on an empty queue (which requires a lock + condition variable).  Where lockless helps is to scale throughput in higher load scenarios.<o:p></o:p> <o:p> </o:p> If you could afford some latency, then perhaps batching could be implemented by having the consumer block until the queue either reaches some watermark or a timeout expires.  
When either of these conditions is met, the consumer empties out the queue and goes back to waiting.<o:p></o:p> <o:p> </o:p> <o:p> </o:p> Matt<o:p></o:p> <o:p> </o:p> <o:p> </o:p> <div> <div class=MsoNormal align=center style='text-align:center'> <hr size=2 width="100%" align=center tabindex=-1> </div> From: Marco Ballesio [mailto:gibrovacco@gmail.com] Sent: Thursday, October 07, 2010 13:21 To: Discussion of the development of GStreamer Subject: Re: [gst-devel] How to decrease CPU consumation for audio recording?<o:p></o:p> </div> <o:p> </o:p> Hi,<o:p></o:p> <div> On Thu, Oct 7, 2010 at 6:56 PM, Felipe Contreras <<a href="mailto:felipe.contreras@gmail.com">felipe.contreras@gmail.com</a>> wrote: ..snip..<o:p></o:p> My claim was that GStreamer was bad for small buffers; the smaller, the worst. That IMO is a fact. Now, how small, and how bad GStreamer is, depends on your system; my guess was that ARM was especially bad compared to x86. I think the numbers show that.<o:p></o:p> <div> In the countless times I've profiled VoIP (and video) calls on ARM, I found a perfect match with Felipe's findings: the smaller the buffers, the higher the overhead on the system. In the pipelines of telepathy-stream-engine, where IMHO there are plenty of unneeded elements (for instance, we don't need resampling/converting the audio buffers, but there are always at least two audio converters and one resampler), the change in CPU load when going from 60 ms to 20 ms packetisation is about 20% (try it with Skype to believe it), mostly located in the kernel, but also inside udpsink/udpsrc and rtpbin. Maybe I could add a few diagrams to Felipe's once I retrieve my data, but I have some interesting considerations in the meantime. Now, in a perfect world the overhead generated by GStreamer when handling audio data should be O(n) wrt the amount of data, and O(1) wrt its packetisation. 
Since we know that (de)payloading is an expensive operation, I could still understand an algorithm which degrades as O(n) in the number of buffers, but Felipe's diagrams are clearly showing that the degradation is O(e^n), which grows faster than any polynomial function and, as they teach at university, is bad (and Felipe's diagrams have neither payloaders nor RTP elements).<o:p></o:p> </div> <blockquote style='border:none;border-left:solid #CCCCCC 1.0pt;padding:0in 0in 0in 6.0pt; margin-left:4.8pt;margin-right:0in'> My "overly dramatic" graphs show the raw data for the most minimal example I could find, so it doesn't matter what you do, you'll get _at least_ that performance hit. On real use-cases (in the graph after 2^7), IMO the performance loss is already bad, but you have to multiply that by the number of different elements and thread contexts that are used.<o:p></o:p> </blockquote> <div> Just to confirm this, I'd like to publish an average stream-engine audio pipeline and the CPU growth with different packetisations. Again, I hope to be able to take a few pictures from the laptop @ work. Since it appears that most of the CPU growth is in the kernel (which doesn't seem to happen on x86), I believe something weird is going on with fast futexes on ARM. That is: the fewer mutexes, the less exponential the CPU growth.  <o:p></o:p> </div> <blockquote style='border:none;border-left:solid #CCCCCC 1.0pt;padding:0in 0in 0in 6.0pt; margin-left:4.8pt;margin-right:0in'> However, the empirical experience is already there, ask anyone in Nokia, I just wanted to show raw numbers.<o:p></o:p> <div> <o:p> </o:p> </div> </blockquote> <div> :)  <o:p></o:p> </div> <blockquote style='border:none;border-left:solid #CCCCCC 1.0pt;padding:0in 0in 0in 6.0pt; margin-left:4.8pt;margin-right:0in'> <div> > It sounds like when you mean size, you really mean duration and thus the > amount of buffers per second. 
> > GStreamer is not designed to pass around 1 sample per buffer (that would > be typically 48000 buffers per second), you can do it but it will incur > a higher overhead that increases with the amount of elements in the > pipeline.<o:p></o:p> </div> </blockquote> <div> See my comments above: do you really think O(e^n) is reasonable growth?  <o:p></o:p> </div> <blockquote style='border:none;border-left:solid #CCCCCC 1.0pt;padding:0in 0in 0in 6.0pt; margin-left:4.8pt;margin-right:0in'> <div> > > GStreamer is however designed for more realistic buffer durations of > 10ms (that's 100 buffers per second). The overhead that GStreamer causes > in these types of pipelines depends on a lot of things, but in well > designed pipelines you typically see overhead values of around 1% or > less (callgrind and kcachegrind are good tools to measure this).<o:p></o:p> </div> </blockquote> <div> The growth Felipe is showing also happens with stream-engine pipelines, and a similar one has been measured with much simpler ones, like the examples on: <a href="http://www.gstreamer.net/data/doc/gstreamer/head/gst-plugins-good-plugins/html/gst-plugins-good-plugins-gstrtpbin.html">http://www.gstreamer.net/data/doc/gstreamer/head/gst-plugins-good-plugins/html/gst-plugins-good-plugins-gstrtpbin.html</a> modified for audio-only and e.g. using g711 alaw. You can even test it with g729 on any architecture now ;) ..snip..<o:p></o:p> </div> <blockquote style='border:none;border-left:solid #CCCCCC 1.0pt;padding:0in 0in 0in 6.0pt; margin-left:4.8pt;margin-right:0in'> <div> > > As a datapoint: On my desktop I can push around 700000 buffers per > second, and that's then using 100% CPU (and also 100% gstreamer > overhead). (gst-launch fakesrc num-buffers=7000000 silent=1 ! fakesink > silent=1 takes about 10 seconds).   <o:p></o:p> </div> </blockquote> <div> It appears ARM is not as well optimised as x86 wrt fast futexes (no references here :\, I have to dig more..) 
meaning that GStreamer is not well optimised for that architecture. It would be interesting to propose an alternative to bare mutexes for handling read/write conflicts.  <o:p></o:p> </div> <blockquote style='border:none;border-left:solid #CCCCCC 1.0pt;padding:0in 0in 0in 6.0pt; margin-left:4.8pt;margin-right:0in'> On my laptop:<br> % gst-launch fakesrc num-buffers=7000000 silent=1 ! fakesink silent=1<br> 22s<br> % gst-launch fakesrc num-buffers=7000000 silent=1 ! queue ! fakesink silent=1<br> 45s<br> On my N900:<br> % gst-launch-0.10 fakesrc num-buffers=7000000 silent=1 ! fakesink silent=1<br> 4m 26s<br> % gst-launch-0.10 fakesrc num-buffers=7000000 silent=1 ! queue ! fakesink silent=1<br> 16m 11s<o:p></o:p> </blockquote> <div> This is more or less an experimental confirmation of my statements above on ARM vs x86. Regards  <o:p></o:p> </div> <blockquote style='border:none;border-left:solid #CCCCCC 1.0pt;padding:0in 0in 0in 6.0pt; margin-left:4.8pt;margin-right:0in'> Cheers. -- Felipe Contreras<o:p></o:p> <div> <div> _______________________________________________ gstreamer-devel mailing list <a href="mailto:gstreamer-devel@lists.sourceforge.net">gstreamer-devel@lists.sourceforge.net</a> <a href="https://lists.sourceforge.net/lists/listinfo/gstreamer-devel" target="_blank">https://lists.sourceforge.net/lists/listinfo/gstreamer-devel</a><o:p></o:p> </div> </div> </blockquote> </div> <o:p> </o:p> </div> </body> </html>