[Mesa-dev] [PATCH] faster util_next_power_of_two() function
sroland at vmware.com
Wed Jun 8 08:25:29 PDT 2011
Looks ok to me.
I think it might actually be ok to omit the version check; we use
__builtin_popcount without one too (and that builtin also seems to need gcc 3.4,
so presumably gcc 3.4 is already a requirement).
So what about a __builtin_clz-optimized logbase2? :-)
I think that should just be
31 - __builtin_clz(n | 1)
I wonder, though, whether these micro-optimizations are really worth it...
On 08.06.2011 13:38, Benjamin Bellec wrote:
> On 06/06/2011 23:54, Roland Scheidegger wrote:
>> On 06.06.2011 23:18, Tormod Volden wrote:
>>> On Sun, Jun 5, 2011 at 1:14 AM, Benjamin Bellec wrote:
>>>> So here is a v2 patch with a GCC builtin optimization, which is the
>>>> fastest (thanks to Matt for pointing me to this solution).
>>> From patch:
>>> + return (1 << (32 - __builtin_clz(x - 1)));
>>> I don't know if using gcc guarantees that int will always be 32
>>> bits; otherwise maybe use sizeof(int)*8 instead of 32? Or even
>>> sizeof(int)*CHAR_BIT for good measure. Although the robots will probably
>>> have taken over before this becomes necessary :)
>> Hmm, I think a lot more things will break if int isn't 32 bits.
>> There's another problem though, gcc docs say this:
>> — Built-in Function: int __builtin_clz (unsigned int x)
>> Returns the number of leading 0-bits in x, starting at the most
>> significant bit position. If x is 0, the result is undefined.
>> Since the patch calls __builtin_clz(x - 1), the result is now undefined
>> for x == 1 too. Not handling x == 0 correctly might not be much of a
>> problem in practice, but the same certainly cannot be said for x == 1.
>> So that should probably be
>> +#if defined(PIPE_CC_GCC)
>> + if (x <= 1)
>> + return 1;
>> + else
>> + return (1 << (32 - __builtin_clz(x - 1)));
>> Also, I believe this builtin requires gcc 3.4 or later; I'm not sure,
>> though, whether the rest of the code compiles on older gcc.
> Here is the v4 patch.