[Mesa-dev] IROUND, math errors, etc. (was: Re: proposed patch optimized F_TO_I for powerpc platforms)

Fri Jul 31 07:13:38 PDT 2015

CC mesa-dev.

This looks good to me. I am starting to wonder though why we don't just
use lrintf() and let the compiler sort it out (for x86 too).
Though actually some quick experiments show that:
- llvm's clang will always emit a call to libm's lrintf, which then does
(on x86_64) cvtss2si %xmm0,%rax as expected. So the cost is probably
twice as high as it could be due to the unnecessary library call.
- gcc will also emit the same library call, unless you specify
-fno-math-errno (or some more aggressive math optimization flag), in
which case it will emit the cvtss2si on its own. Which is fairly stupid,
because this function doesn't set errno in any case, so the inline
version could be used independent of -fno-math-errno.

Speaking of -fno-math-errno, why don't we use that in mesa? I know the
fast math stuff can be problematic, but no one is _ever_ interested in
math error numbers.

Speaking of which, I'm not really sure why IROUND isn't using lrintf too.
Yes it rounds away from zero, but I doubt that matters - would probably
be better to match whatever rounding is used in hw (GL doesn't seem to
specify tie-breaker rules for round to nearest afaict).

FWIW IROUND along with even the 64bit sibling IROUND64 (and IROUND_POS)
is not even really correct in any case. There exist floats where f +
0.5f will round up to the next integer incorrectly. e.g. something like
"largest float smaller than 0.5f", 0.49999997f: if you add +0.5f, the
exact sum lies right between the largest float smaller than 1.0f and
1.0f itself, and thus the hw will use the tie-breaker rule (round to
nearest even for your typical hw with typical rounding mode set) making
this 1.0, thus the rounded integer will be 1, which is just plain wrong
no matter the round-to-nearest tie-breaker rule.
There are ways to fix it (the obvious one is to add 0.5 as double), but
I don't think we should even try that, and assume lrintf can do a decent
job on hw we care about (compiler not doing its job right is a pity but
might not be too bad even if it uses lib call).
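
A small sketch of the failure and of the "add 0.5 as double" fix. The
function names are made up, iround_float is only modelled on the IROUND
idiom, and the test value is the largest float below 0.5f, where f + 0.5f
lands exactly halfway between two representable floats.

```c
/* Largest float smaller than 0.5f, i.e. 0.5 - 2^-25 (about 0.49999997). */
static const float just_below_half = 0x1.fffffep-2f;

/* IROUND-style idiom with the addition done in float. The volatile store
 * forces the sum to be rounded to float precision even on targets that
 * evaluate float expressions in a wider format. */
static int iround_float(float f)
{
    volatile float sum = (f >= 0.0f) ? (f + 0.5f) : (f - 0.5f);
    return (int) sum;
}

/* Same idiom with the addition done in double, where the sum of a float
 * of this magnitude and 0.5 is exactly representable, so no tie-breaker
 * kicks in before the truncation. */
static int iround_double(float f)
{
    double d = f;
    return (int) ((d >= 0.0) ? (d + 0.5) : (d - 0.5));
}
```

Here iround_float(just_below_half) gives 1 while iround_double gives the
expected 0. lrintf should also give 0 for this input, since it rounds the
value itself and never forms the intermediate sum.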

Roland

On 31.07.2015 at 11:39, Jochen Rollwagen wrote:
> Hi,
> 
> I've produced and tested the following mesa patch for powerpc platforms
> (based on/inspired by commit 989d2e370993c87d1bbda4950657bfcc5b0a58dd
> <http://cgit.freedesktop.org/mesa/mesa/commit/?id=989d2e370993c87d1bbda4950657bfcc5b0a58dd>
> "Add an accelerated version of F_TO_I for x86_64"):
> 
> diff --git a/src/mesa/main/imports.h b/src/mesa/main/imports.h
> index 09e55eb..e4feb83 100644
> --- a/src/mesa/main/imports.h
> +++ b/src/mesa/main/imports.h
> @@ -296,6 +296,14 @@ static inline int F_TO_I(float f)
>     return r;
>  #elif defined(__x86_64__)
>     return _mm_cvt_ss2si(_mm_load_ss(&f));
> +#elif defined(__GNUC__) && defined(__PPC__)  
> +   long res [2] ;
> +  
> +   __asm__( "fctiw %0,%0\n\t"
> +            "stfd %0,%1\n\t"
> +            : "=f" (f), "=o" (res): );
> +           
> +   return res [1] ;
>  #else
>     return IROUND(f);
>  #endif
> 
> 
> Any chance to get this into mesa for the few other powerpc holdouts
> still around? Performance is markedly improved (although I didn't
> really measure it :-) )
> 
>