[Mesa-dev] IROUND() issue

Fri May 18 11:05:41 PDT 2012

On 18.05.2012 18:41, Patrick Baggett wrote:
> 
> 
> On Fri, May 18, 2012 at 11:28 AM, Brian Paul <brianp at vmware.com> wrote:
> 
>     On 05/18/2012 10:11 AM, Jose Fonseca wrote:
> 
> 
> 
>         ----- Original Message -----
> 
> 
>             A while back I noticed that the piglit roundmode-pixelstore and
>             roundmode-getinteger tests pass on my 64-bit Fedora system but
>             fail on a 32-bit Ubuntu system.  Both glGetIntegerv() and
>             glPixelStoref() use the IROUND() function to convert floats to
>             ints.
> 
>             The implementation of IROUND() that uses the x86 fistp
>             instruction is protected with:
> 
>             #if defined(USE_X86_ASM) && defined(__GNUC__) && defined(__i386__)
> 
> 
>             but that evaluates to 0 on x86-64 (neither USE_X86_ASM nor
>             __i386__ is defined) so we use the C fallback:
> 
>             #define IROUND(f)  ((int) (((f) >= 0.0F) ? ((f) + 0.5F) : ((f) - 0.5F)))
> 
>             The C version of IROUND() does what we want for the piglit
>             tests, but the x86 version does not.  I think the default x86
>             rounding mode is FE_UPWARD so that explains the failures.
> 
> 
>             So I think I'd like to do the following:
> 
>             1. Enable the x86 fistp-based functions in imports.h for x86-64.
> 
> 
>         It's illegal/inefficient to use x87 on x86-64. We should use the
>         appropriate SSE intrinsic instead.
> 
> 
> The instruction is "cvtss2si". Even if you use SSE here, you depend on
> the rounding mode in the MXCSR register, which means you'll have to set
> that, because some applications change this mode to use a faster or more
> precise rounding mode. It's the same problem that you have with "fistp".
>  
> 
> 
>             2. Rename IROUND() to IROUND_FAST() and define it as float->int
>             conversion by whatever method is fastest.
> 
>             3. Define IROUND() as round to nearest int.  For the x86 fistp
>             implementation this would involve setting/restoring the rounding
>             mode.
> 
> 
> If I recall, the FPU generally runs with a rounding mode other than
> "truncate" by default, so float -> int conversions that involve
> truncation (C cast) usually require changing the rounding mode /to
> truncation/. This was such a problem that SSE3 added "fisttp", which is
> "FP integer store with truncation". I guess, though, that if the default
> rounding mode causes problems, there isn't much that can be done but
> change it each time.

Only if you use the FPU stack. With SSE you generally just use cvttss2si
if you want truncation.
But this won't help here, since apparently we want round to nearest.
Round to nearest is the default rounding mode, so apparently someone
set it to something else.
My general perception is "if you want performance, never touch MXCSR (or
the FPU control word)", but if we have to, some CPUs seem to handle it
quite OK.
I think, though, that external code is much more likely to change the FPU
control word (with fldcw) than the MXCSR (with ldmxcsr) used for SIMD
operations, so relying on round to nearest may work with SSE where it
failed with the old-style FPU. (The biggest reason anyone wanted to change
the rounding mode was likely the truncation required for float->int
conversion in C, which thanks to cvttss2si is no longer an issue.)
I'll just note that we rely on the MXCSR rounding mode being nearest in
the gallivm code, though I don't know whether anyone ensures this is
really true.
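
To make the difference between the variants concrete, here is a minimal
sketch using the SSE intrinsics from <xmmintrin.h>. The function names
(iround_trunc_sse, iround_current_mode_sse, iround_nearest_sse) are made
up for illustration and are not what imports.h uses. The volatile
temporaries are an assumption on my part, meant to keep a compiler from
folding the conversion or moving it across the MXCSR writes.

```c
#include <xmmintrin.h>  /* _mm_set_ss, _mm_cvtss_si32, _mm_cvttss_si32, MXCSR macros */

/* The C fallback quoted above: rounds halfway cases away from zero. */
#define IROUND(f)  ((int) (((f) >= 0.0F) ? ((f) + 0.5F) : ((f) - 0.5F)))

/* cvttss2si: always truncates toward zero, regardless of MXCSR. */
int iround_trunc_sse(float f)
{
    return _mm_cvttss_si32(_mm_set_ss(f));
}

/* cvtss2si: rounds according to whatever mode MXCSR currently holds. */
int iround_current_mode_sse(float f)
{
    volatile float in = f;  /* volatile: discourage folding/reordering */
    volatile int out = _mm_cvtss_si32(_mm_set_ss(in));
    return out;
}

/* Hypothetical "safe" variant: force round to nearest, convert, then
 * restore the caller's rounding mode. */
int iround_nearest_sse(float f)
{
    unsigned int saved = _MM_GET_ROUNDING_MODE();
    int r;

    _MM_SET_ROUNDING_MODE(_MM_ROUND_NEAREST);
    r = iround_current_mode_sse(f);
    _MM_SET_ROUNDING_MODE(saved);
    return r;
}
```

Note that even with MXCSR set to round to nearest, cvtss2si rounds
halfway cases to even, so iround_current_mode_sse(2.5f) gives 2 while
IROUND(2.5f) gives 3. With MXCSR switched to _MM_ROUND_UP,
iround_current_mode_sse(2.1f) gives 3, whereas iround_nearest_sse(2.1f)
still gives 2 and leaves the caller's mode untouched, at the cost of two
MXCSR writes per call.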

Roland