[Mesa-dev] [PATCH] llvmpipe: add cc clobber to inline asm
Roland Scheidegger
sroland at vmware.com
Mon Aug 20 23:11:48 UTC 2018
On 20.08.2018 at 23:31, Grazvydas Ignotas wrote:
> The bsr instruction modifies flags, so that needs to be indicated to the
> compiler. No effect on generated code, but still needed for correctness.
> ---
> src/gallium/drivers/llvmpipe/lp_setup_tri.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/llvmpipe/lp_setup_tri.c b/src/gallium/drivers/llvmpipe/lp_setup_tri.c
> index cec6198ec63..1852ec05d56 100644
> --- a/src/gallium/drivers/llvmpipe/lp_setup_tri.c
> +++ b/src/gallium/drivers/llvmpipe/lp_setup_tri.c
> @@ -732,11 +732,12 @@ floor_pot(uint32_t n)
> if (n == 0)
> return 0;
>
> __asm__("bsr %1,%0"
> : "=r" (n)
> - : "rm" (n));
> + : "rm" (n)
> + : "cc");
> return 1 << n;
> #else
> n |= (n >> 1);
> n |= (n >> 2);
> n |= (n >> 4);
>
Looks alright (although my inline asm is a bit rusty). I do wonder
whether floor_pot() should use util_logbase2 instead, though it's not
quite an exact fit.
Or we could use __builtin_clz directly there, based on HAVE___BUILTIN_CLZ.
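A minimal sketch of what that __builtin_clz variant might look like. The name
floor_pot_clz is hypothetical, and the __GNUC__ check only stands in for
HAVE___BUILTIN_CLZ so the snippet compiles on its own; the fallback finishes
the bit-smearing approach with one possible final step, which is an
assumption and not necessarily what lp_setup_tri.c does.

```c
#include <stdint.h>

/* Hypothetical helper, not the actual floor_pot() from lp_setup_tri.c.
 * Returns the largest power of two <= n, or 0 for n == 0. */
static inline uint32_t
floor_pot_clz(uint32_t n)
{
   if (n == 0)
      return 0;   /* __builtin_clz(0) is undefined, so handle it first */

#if defined(__GNUC__)   /* stand-in for HAVE___BUILTIN_CLZ (assumption) */
   /* 31 - clz(n) is the index of the highest set bit. */
   return UINT32_C(1) << (31 - __builtin_clz(n));
#else
   /* Portable fallback: smear the top bit downwards ... */
   n |= (n >> 1);
   n |= (n >> 2);
   n |= (n >> 4);
   n |= (n >> 8);
   n |= (n >> 16);
   /* ... then drop everything below it. */
   return n - (n >> 1);
#endif
}
```

For example, floor_pot_clz(5) yields 4 and floor_pot_clz(8) yields 8.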
As a side note, it actually seems tricky to get gcc to emit the
"correct" trivial sequence (tested with version 7.3.1).
If you do
int val = 1 << (31 - __builtin_clz(in));
it emits (-O3)
bsr %eax,%eax
mov $0x1f,%ecx
xor $0x1f,%eax
sub %eax,%ecx
mov $0x1,%eax
shl %cl,%eax
which isn't the end of the world, but it is quite an optimization failure.
With -O3 -march=haswell it will figure it out:
bsr %eax,%edx
mov $0x1,%eax
shlx %edx,%eax,%eax
If you think you're clever and instead do
int val = 1 << (__builtin_clz(in) ^ 31);
(which is really the same thing)
gcc is now happy with -O3:
bsr %eax,%ecx
mov $0x1,%eax
shl %cl,%eax
Naturally, the sub is gone, and gcc recognized that the xor 31 on top of
its own xor 31 for the lzcnt emulation cancel each other out.
But with -O3 -march=haswell it's a bit suboptimal now:
mov $0x1,%edx
lzcnt %eax,%eax
xor $0x1f,%eax
shlx %eax,%edx,%eax
So optimization is quite funny here, depending on whether the CPU can do
lzcnt or just bsr. Fun stuff...
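That the two source expressions really are the same thing can be checked
with a small sketch like the one below. For a count c in [0, 31],
31 - c equals c ^ 31, because 31 is all ones in the low five bits and the
subtraction never borrows. The helper names are hypothetical, and the code
assumes a compiler that provides __builtin_clz (gcc or clang).

```c
#include <stdint.h>

/* Hypothetical helpers: index of the highest set bit, written both ways.
 * The input must be non-zero, since __builtin_clz(0) is undefined. */
static inline unsigned
msb_index_sub(uint32_t in)
{
   return 31 - __builtin_clz(in);
}

static inline unsigned
msb_index_xor(uint32_t in)
{
   return __builtin_clz(in) ^ 31;
}

/* Returns 1 if both forms agree for every possible top-bit position. */
static int
msb_forms_agree(void)
{
   for (unsigned bit = 0; bit < 32; bit++) {
      uint32_t lo = UINT32_C(1) << bit;   /* only the top bit set */
      uint32_t hi = lo | (lo - 1);        /* top bit plus all lower bits */

      if (msb_index_sub(lo) != bit || msb_index_xor(lo) != bit)
         return 0;
      if (msb_index_sub(hi) != bit || msb_index_xor(hi) != bit)
         return 0;
   }
   return 1;
}
```

Since the result only depends on the position of the highest set bit,
checking the smallest and largest value for each position covers all cases.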
In any case,
Reviewed-by: Roland Scheidegger <sroland at vmware.com>