[Mesa-dev] [PATCH] Add an accelerated version of F_TO_I for x86_64

Mon Jul 28 11:38:22 PDT 2014

On Mon, Jul 28, 2014 at 8:34 AM, Roland Scheidegger <sroland at vmware.com>
wrote:

> Looking all good, though it got me thinking about these numerous
> float-to-int rounding functions in mesa in various places a bit more.
> There is, for instance, a _mesa_round_to_even() function, which claims
> that the c99 lrintf() and friends functions can't be used because the
> environment might use a different rounding mode. This is of course true
> though I wonder why it would matter there and not here for instance,
> since of course cvtss2si is affected by that as well.
>

Yes, it does matter for cvtss2si.  I briefly considered replacing F_TO_I
with lrint and just trusting in the standard library to do it.  However,
that turns out to have really bad performance (on my system) if you don't
enable -ffast-math (which we don't by default).  The reason why using
cvtss2si is ok is that we only use F_TO_I for things where the GL spec
allows us to be sloppy on the rounding.  In particular, we use it for
texture format conversion and for things like swrast where we care more
about fast than perfect.

> Also, I'm wondering about the IROUND fallback. Richard Sandiford
> discovered a problem in llvmpipe (using lp_build_iround()) which was
> using the same method (that is, (int)(val + 0.5) for unsigned numbers,
> similar but more complex for signed numbers). This works ok for nearly
> all numbers (though it is definitely not round to nearest even) except
> numbers between [-]2^23 and [-]2^24-1, in which case it will always
> return the next higher even number for odd numbers. So
> IROUND((float)val) != val even for numbers which can be represented as
> floats exactly. I'd guess though mesa probably doesn't use IROUND for
> cases where this would matter (most likely some conversion of z24
> numbers), worst case there would probably not just be the inaccuracy but
> if you'd have clamped to z24 max (as float, 2^24-1) then done the IROUND
> you'd get back 2^24. And FWIW some gallium util math function seem to
> have this problem as well, though again I don't know if it would matter
> (the gallium util code will only use them if c99 isn't available).

But anyway I guess that's slightly off-topic...
>

That's interesting.  I'll give that some thought, but at that point you're
right at the boundary of floating-point precision and things get
interesting.  I'll think about it and if I come up with a better way to do
it, I'll send a patch.

>
> Roland
>
>
> Am 23.07.2014 05:15, schrieb Jason Ekstrand:
> > According to a quick micro-benchmark, this new version is 20% faster on
> my
> > Haswell laptop.
> >
> > v2: Removed the XXX note about x86_64 from the comment
> >
> > Signed-off-by: Jason Ekstrand <jason.ekstrand at intel.com>
> > ---
> >  src/mesa/main/imports.h | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/mesa/main/imports.h b/src/mesa/main/imports.h
> > index af780b2..c8ae7f2 100644
> > --- a/src/mesa/main/imports.h
> > +++ b/src/mesa/main/imports.h
> > @@ -277,7 +277,6 @@ static inline int IROUND_POS(float f)
> >
> >  /**
> >   * Convert float to int using a fast method.  The rounding mode may
> vary.
> > - * XXX We could use an x86-64/SSE2 version here.
> >   */
> >  static inline int F_TO_I(float f)
> >  {
> > @@ -285,6 +284,10 @@ static inline int F_TO_I(float f)
> >     int r;
> >     __asm__ ("fistpl %0" : "=m" (r) : "t" (f) : "st");
> >     return r;
> > +#elif defined(USE_X86_64_ASM) && defined(__GNUC__)
> > +   int r;
> > +   __asm__ ("cvtss2si %1, %0" : "=r" (r) : "xm" (f));
> > +   return r;
> >  #elif defined(USE_X86_ASM) && defined(_MSC_VER)
> >     int r;
> >     _asm {
> >
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20140728/43138d71/attachment.html>