[Mesa-dev] [PATCH] Add an accelerated version of F_TO_I for x86_64

Mon Jul 28 13:29:12 PDT 2014

Am 28.07.2014 20:38, schrieb Jason Ekstrand:
> 
> 
> 
> On Mon, Jul 28, 2014 at 8:34 AM, Roland Scheidegger <sroland at vmware.com
> <mailto:sroland at vmware.com>> wrote:
> 
>     Looking all good, though it got me thinking about these numerous
>     float-to-int rounding functions in mesa in various places a bit more.
>     There is, for instance, a _mesa_round_to_even() function, which claims
>     that the c99 lrintf() and friends functions can't be used because the
>     environment might use a different rounding mode. This is of course true
>     though I wonder why it would matter there and not here for instance,
>     since of course cvtss2si is affected by that as well.
> 
> 
> Yes, it does matter for cvtss2si.  I briefly considered replacing F_TO_I
> with lrint and just trusting in the standard library to do it.  However,
> that turns out to have really bad performance (on my system) if you
> don't enable -ffast-math (which we don't by default).  The reason why
> using cvtss2si is ok is that we only use F_TO_I for things where the GL
> spec allows us to be sloppy on the rounding.  In particular, we use it
> for texture format conversion and for things like swrast where we care
> more about fast than perfect.
>  
> 
>     Also, I'm wondering about the IROUND fallback. Richard Sandiford
>     discovered a problem in llvmpipe (using lp_build_iround()) which was
>     using the same method (that is, (int)(val + 0.5) for unsigned numbers,
>     similar but more complex for signed numbers). This works ok for nearly
>     all numbers (though it is definitely not round to nearest even) except
>     numbers between [-]2^23 and [-]2^24-1, in which case it will always
>     return the next higher even number for odd numbers. So
>     IROUND((float)val) != val even for numbers which can be represented as
>     floats exactly. I'd guess though mesa probably doesn't use IROUND for
>     cases where this would matter (most likely some conversion of z24
>     numbers), worst case there would probably not just be the inaccuracy but
>     if you'd have clamped to z24 max (as float, 2^24-1) then done the IROUND
>     you'd get back 2^24. And FWIW some gallium util math function seem to
>     have this problem as well, though again I don't know if it would matter
>     (the gallium util code will only use them if c99 isn't available). 
> 
>     But anyway I guess that's slightly off-topic...
> 
> 
> That's interesting.  I'll give that some thought, but at that point
> you're right at the boundary of floating-point precision and things get
> interesting.  I'll think about it and if I come up with a better way to
> do it, I'll send a patch.
Yes it's right at the boundary but does not exceed it (there's a reason
z24 unorm is very common and z32 unorm is paper spec only :-)).
FWIW the easiest solution for the gallivm code was to use
largest-float-smaller-than-0.5 for the add in iround though I'm not 100%
sure yet of the consequences this could have (at first sight, this looks
quite ok, the previous rounding when using +0.5f gives round up
semantics for positive tied values (0.5, 1.5, 2.5 would become 1, 2, 3
whereas usually you'd possibly expect round-nearest-even hence 0,2,2)
and round down semantics for negative tied values, whereas with this you
get round-nearest-trunc).
Oh and since you're working on it, it would be also nice if the rounding
macro / function stuff could be shared between gallium/util and mesa.

Roland


>  
> 
> 
>     Roland
> 
> 
>     Am 23.07.2014 05:15, schrieb Jason Ekstrand:
>     > According to a quick micro-benchmark, this new version is 20%
>     faster on my
>     > Haswell laptop.
>     >
>     > v2: Removed the XXX note about x86_64 from the comment
>     >
>     > Signed-off-by: Jason Ekstrand <jason.ekstrand at intel.com
>     <mailto:jason.ekstrand at intel.com>>
>     > ---
>     >  src/mesa/main/imports.h | 5 ++++-
>     >  1 file changed, 4 insertions(+), 1 deletion(-)
>     >
>     > diff --git a/src/mesa/main/imports.h b/src/mesa/main/imports.h
>     > index af780b2..c8ae7f2 100644
>     > --- a/src/mesa/main/imports.h
>     > +++ b/src/mesa/main/imports.h
>     > @@ -277,7 +277,6 @@ static inline int IROUND_POS(float f)
>     >
>     >  /**
>     >   * Convert float to int using a fast method.  The rounding mode
>     may vary.
>     > - * XXX We could use an x86-64/SSE2 version here.
>     >   */
>     >  static inline int F_TO_I(float f)
>     >  {
>     > @@ -285,6 +284,10 @@ static inline int F_TO_I(float f)
>     >     int r;
>     >     __asm__ ("fistpl %0" : "=m" (r) : "t" (f) : "st");
>     >     return r;
>     > +#elif defined(USE_X86_64_ASM) && defined(__GNUC__)
>     > +   int r;
>     > +   __asm__ ("cvtss2si %1, %0" : "=r" (r) : "xm" (f));
>     > +   return r;
>     >  #elif defined(USE_X86_ASM) && defined(_MSC_VER)
>     >     int r;
>     >     _asm {
>     >
> 
>