[Mesa-dev] [PATCH] Add an accelerated version of F_TO_I for x86_64

Wed Jul 23 19:28:14 PDT 2014

On Wed, Jul 23, 2014 at 12:01 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
> According to a quick micro-benchmark, this new version is 20% faster on my
> Haswell laptop.
>
> v2: Removed the XXX note about x86_64 from the comment
> v3: Use an intrinsic instead of an __asm__ block.  This should give us MSVC
>     support for free.
>
> Signed-off-by: Jason Ekstrand <jason.ekstrand at intel.com>
> ---
>  src/mesa/main/imports.h | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/main/imports.h b/src/mesa/main/imports.h
> index af780b2..6eb84ca 100644
> --- a/src/mesa/main/imports.h
> +++ b/src/mesa/main/imports.h
> @@ -274,10 +274,12 @@ static inline int IROUND_POS(float f)
>     return (int) (f + 0.5F);
>  }
>
> +#if defined(USE_X86_64_ASM)
> +#  include <xmmintrin.h>
> +#endif
>
>  /**
>   * Convert float to int using a fast method.  The rounding mode may vary.
> - * XXX We could use an x86-64/SSE2 version here.
>   */
>  static inline int F_TO_I(float f)
>  {
> @@ -292,6 +294,8 @@ static inline int F_TO_I(float f)
>          fistp r
>         }
>     return r;
> +#elif defined(USE_X86_64_ASM)
> +   return _mm_cvt_ss2si(_mm_load_ss(&f));
>  #else
>     return IROUND(f);
>  #endif
> --
> 2.0.1

Reviewed-by: Matt Turner <mattst88 at gmail.com>

We could probably just use #ifdef __x86_64__ rather than depending on
USE_X86_64_ASM from the x86-64 assembly configure stuff, since the
intrinsic doesn't need any assembly support. Change it if you want;
otherwise I'm okay with letting people who build with assembly fix it
up if they care.
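
For illustration, here is a rough sketch of what the #ifdef __x86_64__
variant might look like. It is not the actual imports.h code: the
function name f_to_i_sketch is made up, the _M_X64 check is an assumption
added so that MSVC would also take the intrinsic path, and the fallback
is a simplified stand-in for IROUND.

```c
#if defined(__x86_64__) || defined(_M_X64)
#  include <xmmintrin.h>
#endif

/* Hypothetical stand-in for F_TO_I, keyed on the compiler's own
 * architecture macro instead of USE_X86_64_ASM. */
static inline int f_to_i_sketch(float f)
{
#if defined(__x86_64__) || defined(_M_X64)
   /* CVTSS2SI rounds according to the current MXCSR rounding mode,
    * which is round-to-nearest-even by default. */
   return _mm_cvt_ss2si(_mm_load_ss(&f));
#else
   /* Simplified stand-in for IROUND: round half away from zero. */
   return (int) ((f >= 0.0F) ? (f + 0.5F) : (f - 0.5F));
#endif
}
```

Note that the two paths can disagree on exact .5 ties, which is
consistent with the existing comment that the rounding mode may vary.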