[Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

Thu Jun 30 08:53:09 PDT 2011

On 30.06.2011 16:14, Adam Jackson wrote:
> On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote:
>> OK, in fact there's a gcc bug about memcmp:
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
>> In short, gcc's memcmp builtin is totally lame and loses to glibc's
>> memcmp (including call overhead, no knowledge about alignment, etc.) even
>> when comparing only very few bytes (and loses BIG time for lots of bytes
>> to compare). Oops. Well, at least if the strings are the same (I'd guess
>> if the first byte is different it's hard to beat the gcc builtin...).
>> So this is really a gcc bug. The bug is quite old, though, with
>> apparently no fix in sight, so we might need to think about some
>> workaround (but just not doing the comparison doesn't look like the
>> right idea, since apparently it would be faster with the comparison if
>> gcc's memcmp got fixed).
> 
> How do things fare if you build with -fno-builtin-memcmp?

This is even faster:
original ipers: 12.1 fps
ajax patch: 15.5 fps
optimized struct compare: 16.8 fps
-fno-builtin-memcmp: 18.1 fps

Looks like we have a winner :-) I guess glibc optimizes the hell out of
it. (In contrast to the other results, this affected all memcmp calls,
though I don't know whether any of the others benefited on average.)
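For illustration, here is a minimal sketch of the compare-before-update pattern behind the state comparison discussed in this thread, assuming a single large state struct. `struct fs_state`, `set_fs_state` and `expensive_setup` are hypothetical names, not llvmpipe code. Built with GCC, adding `-fno-builtin-memcmp` (e.g. `gcc -O2 -fno-builtin-memcmp`) stops gcc from expanding the memcmp call as a builtin, so the call goes to glibc's memcmp instead.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the fragment shader state. It is made
 * deliberately large (over 4 KB), like the struct discussed here. */
struct fs_state {
   unsigned flags;
   float constants[1024];
};

static struct fs_state current_state; /* last state that was set up */
static int have_state = 0;            /* 0 until the first setup */
static unsigned setup_count = 0;      /* times the expensive path ran */

/* Hypothetical expensive setup (code generation, table rebuilds, ...). */
static void expensive_setup(const struct fs_state *s)
{
   (void)s;
   setup_count++;
}

/* Compare-before-update: skip the setup when the new state is
 * byte-for-byte identical to the current one. For equal states memcmp
 * has to examine every byte, so its throughput on large equal buffers
 * is what this check costs. */
void set_fs_state(const struct fs_state *s)
{
   assert(s != NULL);
   if (have_state && memcmp(&current_state, s, sizeof current_state) == 0)
      return;
   memcpy(&current_state, s, sizeof current_state);
   have_state = 1;
   expensive_setup(s);
}
```

Calling `set_fs_state` twice with the same state runs the setup only once; only a changed state triggers it again.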
As Keith noted, though, the struct we compare is really large (over 4 KB),
so trimming its size might be a good idea anyway. (Of course, with a 4 KB
compare, any call overhead, and any suboptimal code from glibc not knowing
the alignment or how the return value is used beforehand, is completely
insignificant.)
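Trimming the compared data could look like the sketch below, which compares a small key instead of the whole struct. It assumes, hypothetically, that only a couple of fields decide whether the expensive setup has to rerun and that the rest (such as a large constant buffer) is handled separately. `struct fs_full_state`, `struct fs_key`, `struct fs_key_cache` and `fs_key_update` are illustrative names, not llvmpipe code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical full state: the large constant buffer is assumed not to
 * affect whether the expensive setup must rerun. */
struct fs_full_state {
   unsigned blend_mode;
   unsigned depth_func;
   float constants[1024];
};

/* Hypothetical compact key holding only the fields that matter. */
struct fs_key {
   unsigned blend_mode;
   unsigned depth_func;
};

struct fs_key_cache {
   int valid;          /* 0 until a key has been stored */
   struct fs_key key;  /* key of the last state that was set up */
};

static void make_key(struct fs_key *key, const struct fs_full_state *s)
{
   /* Zero the whole key first so padding bytes (none here, but real
    * keys may have some) start out zero before the memcmp. */
   memset(key, 0, sizeof *key);
   key->blend_mode = s->blend_mode;
   key->depth_func = s->depth_func;
}

/* Returns 1 when the key changed (setup needed), 0 otherwise. Only
 * sizeof(struct fs_key) bytes are compared instead of over 4 KB. */
int fs_key_update(struct fs_key_cache *cache, const struct fs_full_state *s)
{
   struct fs_key key;
   assert(cache != NULL && s != NULL);
   make_key(&key, s);
   if (cache->valid && memcmp(&cache->key, &key, sizeof key) == 0)
      return 0;
   memcpy(&cache->key, &key, sizeof key); /* copy bytes, incl. padding */
   cache->valid = 1;
   return 1;
}
```

Zeroing the key before filling it is a common way to keep padding bytes from making equal keys compare unequal under memcmp.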
A 50% improvement from disabling a compiler optimization, lol.

Roland