[pulseaudio-discuss] "Hot" function optimization recommendations
Justin Chudgar
justin at justinzane.com
Mon Apr 8 12:02:46 PDT 2013
On Thursday, April 04, 2013 04:08:43 PM Justin Chudgar wrote:
> I had experimentally thrown an optimization into my module's only
> significantly warm functions. Since I am a novice, this was a
> just-for-kicks experiment, but I would like to know whether to optimize at
> all beyond the general "-O2", and what platforms are critical to consider
> since I only use pulse on systems that are sufficient to run at "-O0"
> without noticeable problems beyond unnecessary power consumption.
>
> From another thread:
> > I'm not sure what to think about the __attribute__((optimize(3))) usage.
> > Have you done some benchmarking that shows that the speedup is
> > significant compared to the normal -O2? If yes, I guess we can keep
> > them. <tanuk>
>
> I don't know what to think of them either. I did a really simplistic benchmark
> with the algorithm on my Core i3 laptop initially to determine if it was
> useful to keep everything double or float. There was no benefit to reducing
> precision on this one system, but the effect of that attribute was dramatic.
> Did not try O2, though, just O3 and O0. I thought about messing with
> vectorization, but I only have x86-64 PCs and that seems most valuable for
> embedded devices, which I cannot test at the moment.
>
> 11: Determine optimization strategy for filter code.
> http://github.com/justinzane/pulseaudio/issues/issue/11
Here are some very simplistic benchmark results for
"__attribute__((optimize(#))) function()"
applied to code similar to a biquad filter:
optimize(0), 1867570825, 27.828974
optimize(1), 1017762024, 15.165836
optimize(2), 951896198, 14.184359
optimize(3), 952574300, 14.194463
Each run filters a "memchunk" analog of 2^16 single-channel doubles, and the
results are averaged over 2^10 runs with forced CPU affinity. The benchmark
itself was compiled with -O0.
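For readers who have not used the attribute, here is a minimal sketch of what a per-function optimization level on biquad-like code might look like. It is not the code that was benchmarked: the names biquad_t, biquad_process and OPT_LEVEL are hypothetical, and the transposed direct form II structure is an assumption. The optimize attribute is GCC-specific, so the macro expands to nothing on other compilers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical coefficient/state struct; not PulseAudio's actual type. */
typedef struct {
    double b0, b1, b2, a1, a2;  /* normalized coefficients (a0 == 1) */
    double z1, z2;              /* state, transposed direct form II */
} biquad_t;

/* optimize() is a GCC extension; compile it out elsewhere. */
#if defined(__GNUC__) && !defined(__clang__)
#define OPT_LEVEL(n) __attribute__((optimize(n)))
#else
#define OPT_LEVEL(n)
#endif

/* Filter n samples from in to out, requesting -O3 for this function only. */
OPT_LEVEL(3)
void biquad_process(biquad_t *f, const double *in, double *out, size_t n)
{
    assert(f != NULL && in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        double x = in[i];
        double y = f->b0 * x + f->z1;
        f->z1 = f->b1 * x - f->a1 * y + f->z2;
        f->z2 = f->b2 * x - f->a2 * y;
        out[i] = y;
    }
}
```

On reading the tables: the third column appears to equal the second divided by 2^26 (2^16 samples times 2^10 runs), so it looks like a per-sample figure. The units are not stated.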
With the supporting code compiled -O2, the numbers are:
optimize(0), 1436955156, 21.412300
optimize(1), 1020384309, 15.204911
optimize(2), 952980992, 14.200523
optimize(3), 952473365, 14.192959
Not much difference there.
With the benchmark compiled -O3, there is a DRASTIC change:
optimize(0), 1442046736, 21.488171
optimize(1), 1017924249, 15.168253
optimize(2), 954029138, 14.216142
optimize(3), 374432, 0.005579
That was such a freakish improvement that I ran it several times, but the
results are quite reproducible on my dev system.
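One possibility worth ruling out, which was not tested here, is that at -O3 the compiler inlined the function into the timing loop and dropped work whose result was never used. The sketch below shows one common way to guard a micro-benchmark against that. Everything in it is an assumption for illustration: sum_of_squares is a hypothetical stand-in for the filter, and time_runs and sink are made-up names. It relies on GCC/Clang extensions and on POSIX clock_gettime.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the function under test. noinline asks the
 * compiler not to merge it into the timing loop. */
__attribute__((noinline))
double sum_of_squares(const double *buf, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += buf[i] * buf[i];
    return acc;
}

/* Storing each result in a volatile makes the compiler treat it as used. */
static volatile double sink;

/* Returns elapsed nanoseconds for `runs` passes over buf. */
long long time_runs(const double *buf, size_t n, int runs)
{
    struct timespec t0, t1;
    assert(buf != NULL && runs > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < runs; r++) {
        sink = sum_of_squares(buf, n);
        /* Compiler barrier: memory may have changed, so the call
         * cannot be hoisted out of the loop and reused. */
        __asm__ volatile("" ::: "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
         + (t1.tv_nsec - t0.tv_nsec);
}
```

If the timings stay near zero with a guard like this in place, the speedup is more likely to be real; if they jump back up, the earlier figure was probably measuring an empty loop.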
Replacing optimize(#) with hot and compiling the whole program with -O3 gives:
hot, 310780, 0.004631
And removing the __attribute__ altogether, again compiling the whole program
with -O3, gives:
<NONE>, 333013, 0.004962
Being generally a novice using a VERY simplistic wrapper around a rather simple
function, I'm loath to draw too many conclusions. However, this suggests that
it might be worth using __attribute__((hot)) for any serious number-crunching
functions within pulse and adopting -O3 as the standard compiler flag.
If I can figure out oprofile or something similar, I'll try to test further.
I'd also like to hear general feedback about this since I'm just learning.
Thanks, all.
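As a concrete illustration of the suggestion, here is a minimal sketch of tagging a number-crunching function as hot behind a portability macro. The macro name MY_HOT and the function scale_samples are hypothetical and not taken from the PulseAudio tree. Note the double parentheses that the attribute syntax requires.

```c
#include <assert.h>
#include <stddef.h>

/* hot is a GCC/Clang extension; expand to nothing on other compilers. */
#if defined(__GNUC__)
#define MY_HOT __attribute__((hot))
#else
#define MY_HOT
#endif

/* Hypothetical inner-loop function: apply a gain to n samples in place. */
MY_HOT
void scale_samples(float *samples, size_t n, float gain)
{
    assert(samples != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        samples[i] *= gain;
}
```

The rest of the build would then use -O3 (for example through CFLAGS) so that the attribute and the global level are combined the way they were in the last two measurements.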
Justin