[pulseaudio-discuss] "Hot" function optimization recommendations

Justin Chudgar justin at justinzane.com
Mon Apr 8 12:02:46 PDT 2013

On Thursday, April 04, 2013 04:08:43 PM Justin Chudgar wrote:
> I had experimentally thrown an optimization into my module's only
> significantly warm functions. Since I am a novice, this was a
> just-for-kicks experiment, but I would like to know whether to optimize at
> all beyond the general "-O2", and what platforms are critical to consider
> since I only use pulse on systems that are sufficient to run at "-O0"
> without noticeable problems beyond unnecessary power consumption.
> From another thread:
> > I'm not sure what to think about the __attribute__((optimize(3))) usage.
> > Have you done some benchmarking that shows that the speedup is
> > significant compared to the normal -O2? If yes, I guess we can keep
> > them. <tanuk>
> I don't know what to think of them either. I did a really simplistic benchmark
> with the algorithm on my Core i3 laptop initially to determine if it was
> useful to keep everything double or float. There was no benefit to reducing
> precision on this one system, but the effect of that attribute was dramatic.
> Did not try O2, though, just O3 and O0. I thought about messing with vectorization, but
> I only have x86-64 PCs and that seems most valuable for embedded devices
> which I cannot test at the moment.
> 11: Determine optimization strategy for filter code.
> http://github.com/justinzane/pulseaudio/issues/issue/11

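Just some very simplistic benchmark results of
	"__attribute__((optimize(#))) function()"
in code similar to a biquad filter. The columns are the attribute, the total,
and the total divided by 2^26 (2^16 samples x 2^10 runs):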
	optimize(0), 1867570825, 27.828974
	optimize(1), 1017762024, 15.165836
	optimize(2), 951896198, 14.184359
	optimize(3), 952574300, 14.194463
This is for "memchunk" analogs: single-channel buffers of 2^16 doubles, filtered
and averaged over 2^10 runs with forced CPU affinity. The benchmark itself was
compiled with -O0.
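
For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of such a harness. It is not the code behind the numbers above: the struct, the function names, the biquad form (transposed direct form II) and the use of clock() are all assumptions made for illustration, and it does not set CPU affinity.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* optimize() is a GCC extension; other compilers get an empty macro. */
#if defined(__GNUC__) && !defined(__clang__)
#define BENCH_OPT3 __attribute__((optimize(3)))
#else
#define BENCH_OPT3
#endif

/* Hypothetical filter state, not a PulseAudio type. */
struct biquad {
    double b0, b1, b2, a1, a2;  /* coefficients, with a0 normalized to 1 */
    double z1, z2;              /* delay-line state */
};

/* Transposed direct form II biquad, built at -O3 regardless of the
   flags used for the rest of the file (GCC only). */
BENCH_OPT3
void biquad_run(struct biquad *f, const double *in, double *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        double x = in[i];
        double y = f->b0 * x + f->z1;
        f->z1 = f->b1 * x - f->a1 * y + f->z2;
        f->z2 = f->b2 * x - f->a2 * y;
        out[i] = y;
    }
}

/* Returns the CPU time, in seconds, spent on `runs` passes over the buffer. */
double bench_seconds(struct biquad *f, const double *in, double *out,
                     size_t n, unsigned runs)
{
    assert(n > 0 && runs > 0);
    clock_t t0 = clock();
    for (unsigned r = 0; r < runs; r++)
        biquad_run(f, in, out, n);
    clock_t t1 = clock();
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Calling bench_seconds() with n = 1 << 16 and runs = 1 << 10 and dividing the result by n * runs gives a per-sample figure comparable in spirit to the third column.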

With the supporting code compiled with -O2, the numbers are:
	optimize(0), 1436955156, 21.412300
	optimize(1), 1020384309, 15.204911
	optimize(2), 952980992, 14.200523
	optimize(3), 952473365, 14.192959
Not much difference there.

With the benchmark compiled with -O3, there is a DRASTIC change:
	optimize(0), 1442046736, 21.488171
	optimize(1), 1017924249, 15.168253
	optimize(2), 954029138, 14.216142
	optimize(3), 374432, 0.005579
That was such a freakish improvement that I ran it several times, but the
results are quite reliable on my dev system.
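
A drop of more than three orders of magnitude between two optimization levels can be a sign that the optimizer inlined the function into the timing loop and then dropped or hoisted work whose results are never read, rather than a genuine speedup. Whether that happened here is not something these numbers can settle. One common guard, sketched below with hypothetical helper names, is to make the output buffer observable after each timed pass:

```c
#include <stddef.h>

/* Hypothetical helper: on GCC and Clang the empty asm with a "memory"
   clobber makes the compiler assume the memory behind p may be read,
   so earlier stores into that buffer cannot be discarded as dead. */
static inline void keep_alive(const void *p)
{
#if defined(__GNUC__)
    __asm__ volatile("" : : "r"(p) : "memory");
#else
    (void)p;
#endif
}

/* Portable fallback: fold the output into a value that is written to a
   volatile object, which the compiler must actually store. */
volatile double bench_sink;

double consume(const double *buf, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    bench_sink = sum;
    return sum;
}
```

Calling keep_alive(out) or consume(out, n) after each pass, inside the timed region, keeps the filter's stores live. If the optimize(3) figure stays tiny with such a guard in place, the speedup is more likely to be real.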

Replacing optimize(#) with hot and using -O3 for the whole build gives:
	hot, 310780, 0.004631

And removing the __attribute__ altogether, again using -O3 for the whole build:
	<NONE>, 333013, 0.004962

Being generally a novice using a VERY simplistic wrapper of a rather simple
function, I'm loath to draw too many conclusions. However, this suggests that
it might be worth using __attribute__((hot)) for any serious number-crunching
functions within pulse and adopting -O3 as the standard compiler flag.
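
For reference, the hot attribute takes double parentheses like other GCC attributes. A minimal sketch of how it could be applied behind a guard macro follows; FILTER_HOT and scale_samples are hypothetical names for illustration, not existing PulseAudio identifiers.

```c
#include <stddef.h>

/* hot is a GCC/Clang function attribute; other compilers get an empty macro. */
#if defined(__GNUC__)
#define FILTER_HOT __attribute__((hot))
#else
#define FILTER_HOT
#endif

/* Marks the function as a hot spot so the compiler optimizes it more
   aggressively; it does not change the function's behavior. */
FILTER_HOT
void scale_samples(double *buf, size_t n, double gain)
{
    for (size_t i = 0; i < n; i++)
        buf[i] *= gain;
}
```

Unlike optimize(#), this leaves the optimization level to the build flags, so it only marks where the effort should go.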

If I can figure out oprofile or something similar, I'll try to test. I'd also 
like to hear general feedback about this since I'm just learning. Thanks, all.
