Hi, sorry for the late reply, but these days I only have a few hacking hours a day, and not during working hours, to the great happiness of my family ;) <div class="gmail_quote">On Mon, Oct 11, 2010 at 11:19 AM, Wim Taymans &lt;<a href="mailto:wim.taymans@gmail.com">wim.taymans@gmail.com</a>&gt; wrote: <blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">On Sun, 2010-10-10 at 21:40 +0300, Marco Ballesio wrote: ... <div class="im"> > The first positive surprise is that, even after applying the patch, > the following command runs properly on both the architectures: > > time gst-launch --gst-disable-registry-update audiotestsrc > num-buffers=60000 blocksize=128 ! "audio/x-raw-int, rate=8000, > width=16" ! audioconvert ! audioconvert ! audioconvert ! fakesink </div>I'm sure it will go faster but it will cause weird refcounting problems and crashes when you start linking and unlinking pads during dataprocessing, which is not acceptable. </blockquote><div> My patch is definitely not meant as a fix, it's just something to show how things could be optimised in some cases. </div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"> &lt;snip&gt; <div class="im"> > Some months ago I was studying a way to detect thread boundaries from > within a pipeline and then, in case the application "promises" never > to interact with it, to choose an optimised path for gst_path_push > (then I dropped the work because of other tasks). Would it be > interesting to resume such an approach? </div>Yes, it would be very interesting. I've been looking at how to do this for a while now. Unfortunately, most ideas involve making the object flags and the signal counters atomic (which is something I don't think we can do safely for 0.10). Maybe the easiest idea is to make an atomic 1 entry cache in gst_pad_push() that contains the (reffed) peer pad. 
With the other flag checks and buffer signals atomic, you can avoid taking and releasing two locks and 2 atomic refcounts at the expense of 6 (hopefully more simple) atomic operations. </blockquote><div> mhh, atomic ops have their weight too, but indeed such an optimisation would be better than nothing ;) </div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"> BTW, before you go down this route, I'm sure you can find a 2% performance increase elsewhere, like making the recursive locks in glib </blockquote><div> Please consider that the weight is 2% only in this case, where the pipeline is REALLY simple, with 5 elements and 8 pads. A real VoIP pipeline has many more elements, more than 8 in the rtpbin alone. The overhead should thus be proportional to the number of pads (on ARM, a CPU load increase of about 20%, mostly in the kernel, has been measured between 60ms and 20ms buffer times). </div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"> use native pthread recursive locks instead of the 7 method calls that it does now. Or maybe add less contention to the glib type lookups.. Or maybe improve some of the element processing functions. <div class="im"> > > P.S. it appears the penalty decreases when buffers have bigger sizes, > as already shown from Felipe. </div>That's not what that test showed, it showed that the more buffers you push per second, the more CPU you consume, which is rather obvious. </blockquote><div> Well, not for me ;). In a perfect world the cost of the algorithm should grow with the quantity of data, not with how it's partitioned. The algorithm should be O(1) in these terms, and my patch tends to prove this. That is, the more pads are traversed per second, the more mutexes are locked/unlocked. 
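To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes one push per pad link per buffer and two lock/unlock pairs per push (the "two locks" mentioned above). The function name and the figure of 4 links for the 8-pad test pipeline are only illustrative assumptions, not measurements.

```python
def lock_ops_per_second(buffer_ms, links, locks_per_push=2):
    """Estimate mutex acquisitions per second in a linear pipeline.

    Simplifying assumptions (not measured): every buffer is pushed once
    across each pad link, and each push takes `locks_per_push` locks.
    """
    buffers_per_second = 1000.0 / buffer_ms
    return buffers_per_second * links * locks_per_push


if __name__ == "__main__":
    links = 4  # assumed: 8 pads in a linear pipeline form 4 pad links
    for buffer_ms in (60, 20):
        ops = lock_ops_per_second(buffer_ms, links)
        print(f"{buffer_ms} ms buffers: about {ops:.0f} lock ops/s")
```

Under these assumptions, going from 60ms to 20ms buffers triples the lock traffic for exactly the same amount of audio, and the figure scales linearly with the number of links.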
I'd like somebody to give Felipe's test a shot with my patch (if possible), but I'm not sure about the overall stability... Regards, Marco </div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"> > > Regards > ------------------------------------------------------------------------------ > Beautiful is writing same markup. Internet Explorer 9 supports > standards for HTML5, CSS3, SVG 1.1, ECMAScript5, and DOM L2 & L3. > Spend less time writing and rewriting code and more time creating great > experiences on the web. Be a part of the beta today. > <a href="http://p.sf.net/sfu/beautyoftheweb" target="_blank">http://p.sf.net/sfu/beautyoftheweb</a> > _______________________________________________ gstreamer-devel mailing list <a href="mailto:gstreamer-devel@lists.sourceforge.net">gstreamer-devel@lists.sourceforge.net</a> <a href="https://lists.sourceforge.net/lists/listinfo/gstreamer-devel" target="_blank">https://lists.sourceforge.net/lists/listinfo/gstreamer-devel</a> </blockquote></div>