[PATCH xf86-video-ati] Replace loop with clz to calculate log base 2 on non-x86 platforms in radeon.h
Jochen Rollwagen
joro-2013 at t-online.de
Fri Dec 2 09:31:20 UTC 2016
Am 01.12.2016 um 01:57 schrieb Michel Dänzer:
> On 01/12/16 02:52 AM, Jochen Rollwagen wrote:
>> Am 29.11.2016 um 08:32 schrieb Michel Dänzer:
>>> On 29/11/16 03:18 AM, Jochen Rollwagen wrote:
>>>> This commit replaces the loop for calculating log base 2 for
>>>> non-x86-platforms in radeon.h with a clz (count leading zeroes)-based
>>>> version to simplify the code and, well, eliminate the loop.
>>>> Note: There’s no check for val=0 case, since x86-bsr is undefined for
>>>> that case too, that should be okay.
>>>> ---
>>>> src/radeon.h | 7 +++----
>>>> 1 file changed, 3 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/src/radeon.h b/src/radeon.h
>>>> index cbc7866..b1a1ce0 100644
>>>> --- a/src/radeon.h
>>>> +++ b/src/radeon.h
>>>> @@ -933,17 +933,16 @@ enum {
>>>> static __inline__ int
>>>> RADEONLog2(int val)
>>>> {
>>>> - int bits;
>>>> #if (defined __i386__ || defined __x86_64__) && (defined __GNUC__)
>>>> + int bits;
>>>> +
>>>> __asm volatile("bsrl %1, %0"
>>>> : "=r" (bits)
>>>> : "c" (val)
>>>> );
>>>> return bits;
>>>> #else
>>>> - for (bits = 0; val != 0; val >>= 1, ++bits)
>>>> - ;
>>>> - return bits - 1;
>>>> + return (31 - __builtin_clz(val));
>>>> #endif
>>>> }
>>> Any reason for not using __builtin_clz on x86 as well? AFAICT both gcc
>>> and clang seem to generate more or less the same code with that as with
>>> the inline assembly.
>>>
>>>
>> I guess not. According to
>> http://stackoverflow.com/questions/9353973/implementation-of-builtin-clz
>> "bsr and clz are related but different.
>>
>> On x86 for clz gcc (-O2) generates:
>>
>> bsrl %edi, %eax
>> xorl $31, %eax
>> ret "
> That's not what I'm seeing. Have you compared the code generated by your
> compiler in both cases?
>
>
Well, I'm on a powerpc platform, the generated code looks like this:
RADEONLog2:
stwu %r1,-16(%r1)
cntlzw %r3,%r3
subfic %r3,%r3,31
addi %r1,%r1,16
blr
Straightforward enough.
But i guess you’re right, the generated code is similar enough to
justify a general approach without platform-specific inline assembler
and leaving the rest to the compiler.
I'll post V2 of the patch.
More information about the amd-gfx
mailing list