[PATCH xf86-video-ati] Replace loop with clz to calculate log base 2 on non-x86 platforms in radeon.h

Jochen Rollwagen joro-2013 at t-online.de
Wed Nov 30 17:52:54 UTC 2016


On 29.11.2016 at 08:32, Michel Dänzer wrote:
> On 29/11/16 03:18 AM, Jochen Rollwagen wrote:
>> This commit replaces the loop for calculating log base 2 for
>> non-x86-platforms in radeon.h with a clz (count leading zeroes)-based
>> version to simplify the code and, well, eliminate the loop.
>> Note: There’s no check for val=0 case, since x86-bsr is undefined for
>> that case too, that should be okay.
>> ---
>>   src/radeon.h |    7 +++----
>>   1 file changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/src/radeon.h b/src/radeon.h
>> index cbc7866..b1a1ce0 100644
>> --- a/src/radeon.h
>> +++ b/src/radeon.h
>> @@ -933,17 +933,16 @@ enum {
>>   static __inline__ int
>>   RADEONLog2(int val)
>>   {
>> -    int bits;
>>   #if (defined __i386__ || defined __x86_64__) && (defined __GNUC__)
>> +    int bits;
>> +
>>       __asm volatile("bsrl    %1, %0"
>>           : "=r" (bits)
>>           : "c" (val)
>>       );
>>       return bits;
>>   #else
>> -    for (bits = 0; val != 0; val >>= 1, ++bits)
>> -        ;
>> -    return bits - 1;
>> +    return (31 - __builtin_clz(val));
>>   #endif
>>   }
> Any reason for not using __builtin_clz on x86 as well? AFAICT both gcc
> and clang seem to generate more or less the same code with that as with
> the inline assembly.
>
>
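I guess not. According to
http://stackoverflow.com/questions/9353973/implementation-of-builtin-clz :

"bsr and clz are related but different.

On x86 for clz gcc (-O2) generates:

    bsrl %edi, %eax
    xorl $31, %eax
    ret"

bsr returns the index of the highest set bit, which for a nonzero 32-bit
value is 31 - clz. The xor with 31 only converts the bsr result into a
leading-zero count.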
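To illustrate why 31 - clz and bsr agree, here is a minimal sketch (not the patch itself) that compares the removed loop with two clz-based variants. The names log2_loop, log2_clz, log2_clz_xor and check_log2_variants are made up for this example. It assumes a 32-bit unsigned int and a compiler that provides __builtin_clz (gcc or clang), and, like the patch, it does not handle val == 0.

```c
#include <assert.h>

/* Hypothetical helper: the loop the patch removes, on an unsigned value. */
static int log2_loop(unsigned int val)
{
    int bits;

    for (bits = 0; val != 0; val >>= 1, ++bits)
        ;
    return bits - 1;
}

/* Hypothetical helper: the form used in the patch.
 * Undefined for val == 0, because __builtin_clz(0) is undefined. */
static int log2_clz(unsigned int val)
{
    return 31 - __builtin_clz(val);
}

/* Hypothetical helper: same result via xor. For a nonzero 32-bit value
 * clz is in 0..31, and for that range 31 - n == 31 ^ n. */
static int log2_clz_xor(unsigned int val)
{
    return 31 ^ __builtin_clz(val);
}

/* Compare all three variants over a range of nonzero inputs. */
static void check_log2_variants(void)
{
    unsigned int v;

    for (v = 1; v <= 1u << 20; ++v) {
        assert(log2_clz(v) == log2_loop(v));
        assert(log2_clz_xor(v) == log2_loop(v));
    }
}
```

For example, log2_clz(4096) gives 12, since 4096 is 2^12 and has 19 leading zero bits. Since the subtraction and the xor are interchangeable here, the compiler is presumably free to fold 31 - __builtin_clz(val) back into a plain bsr on x86.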
