[Liboil] scalar{min,max}imum_f32

Stephane Fillod f8cfe at free.fr
Tue Dec 13 12:56:08 PST 2005


On Mon, Dec 12, 2005 at 11:57:03PM -0800, Eric Anholt wrote:
> In libpcg for one function I wanted to clamp values to a minimum of 0.0.
> So I wrote a scalarminimum/maximum, including SSE intrinsics code.
> Patch attached for review to make sure I'm going about adding a new
> function right, am stylistically sane, and that it's something we want.
> 
> x t-scalarmax-before
> + t-scalarmax-after
> +--------------------------------------------------------------------------+
> |                       x                                     +    +       |
> |                       x      x                              +    +       |
> |x                 +    x    x x       x  *         x         + +  +  +    |
> |               |_____________AM__________||______________A_____M_________||
> +--------------------------------------------------------------------------+
>     N           Min           Max        Median           Avg        Stddev
> x  10          57.2          59.2          58.4         58.33    0.53343749
> +  10          57.9          59.9          59.7         59.45    0.62583278
> Difference at 95.0% confidence
>         1.12 +/- 0.54635
>         1.92011% +/- 0.936653%
>         (Student's t, pooled s = 0.581473)

Where did you get that nice ASCII graph? Does it come from an oil-test-like program?

[...]
> Index: liboil/sse/math_sse.c
> ===================================================================
> RCS file: /cvs/liboil/liboil/liboil/sse/math_sse.c,v
> retrieving revision 1.1
> diff -u -r1.1 math_sse.c
> --- liboil/sse/math_sse.c	13 Dec 2005 06:26:44 -0000	1.1
> +++ liboil/sse/math_sse.c	13 Dec 2005 07:17:46 -0000
> @@ -303,3 +303,68 @@
>  }
>  OIL_DEFINE_IMPL_FULL (scalarmultiply_f32_ns_sse, scalarmultiply_f32_ns, OIL_IMPL_FLAG_SSE);
>  
> +static void
> +scalarminimum_f32_sse (float *dest, float *src1, float *val, int n)

Declaring the inputs as "const float *src1, const float *val" may help some compilers.
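
For illustration, a const-qualified prototype could look like the sketch below. The plain C body and the name scalarminimum_f32_c are assumptions made up for this example; they are not part of the patch.

```c
/* Hypothetical reference version with const-qualified inputs.
 * The name scalarminimum_f32_c is invented for this sketch. */
void
scalarminimum_f32_c (float *dest, const float *src1, const float *val, int n)
{
  int i;

  for (i = 0; i < n; i++) {
    /* src1 and val are only read; dest is the only array written. */
    dest[i] = src1[i] < *val ? src1[i] : *val;
  }
}
```

The same qualifiers would apply unchanged to the SSE variant, since it never writes through src1 or val either.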

> +{
> +  __m128 xmm1;
> +  float valtmp[4];
> +  int i;
> +
> +  valtmp[0] = *val;
> +  valtmp[1] = *val;
> +  valtmp[2] = *val;
> +  valtmp[3] = *val;
> +  xmm1 = _mm_loadu_ps(valtmp);

What about "xmm1 = _mm_load_ps1(val);" instead of the valtmp thing?
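
A minimal sketch of that replacement, reduced to a single group of four floats. The helper name min4_with_broadcast is hypothetical, and unaligned loads and stores are used only to keep the example self-contained.

```c
#include <xmmintrin.h>

/* Hypothetical helper: clamps four floats to at most *val. */
void
min4_with_broadcast (float *dest, const float *src1, const float *val)
{
  /* _mm_load_ps1 reads one float and copies it into all four lanes,
   * so the valtmp[4] array and its _mm_loadu_ps are no longer needed. */
  __m128 xmm1 = _mm_load_ps1 (val);
  __m128 xmm0 = _mm_loadu_ps (src1);

  xmm0 = _mm_min_ps (xmm0, xmm1);
  _mm_storeu_ps (dest, xmm0);
}
```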

> +  /* Initial operations to align the destination pointer */
> +  for (i = 0; i < ((long)dest & 15) >> 2; i++) {
> +    *dest++ = *src1 < *val ? *src1 : *val;
> +    src1++;
> +  }
> +  for (; i < n - 3; i += 4) {
> +    __m128 xmm0;
> +    xmm0 = _mm_loadu_ps(src1);
> +    xmm0 = _mm_min_ps(xmm0, xmm1);
> +    _mm_store_ps(dest, xmm0);
> +    dest += 4;
> +    src1 += 4;
> +  }

What about unrolling the main loop one or two times?
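
As a rough sketch, unrolling once means handling eight floats per iteration, followed by a four-wide step and a scalar tail. The function name scalarminimum_f32_unroll2 is invented here, and the sketch uses unaligned loads and stores so that it works without the alignment prologue from the patch. Whether this is actually faster would need measuring.

```c
#include <xmmintrin.h>

/* Hypothetical sketch: main loop unrolled once (8 floats per iteration). */
void
scalarminimum_f32_unroll2 (float *dest, const float *src1, const float *val, int n)
{
  __m128 xmm1 = _mm_load_ps1 (val);
  int i = 0;

  /* Two independent min operations per iteration. */
  for (; i < n - 7; i += 8) {
    __m128 xmm0 = _mm_loadu_ps (src1 + i);
    __m128 xmm2 = _mm_loadu_ps (src1 + i + 4);
    xmm0 = _mm_min_ps (xmm0, xmm1);
    xmm2 = _mm_min_ps (xmm2, xmm1);
    _mm_storeu_ps (dest + i, xmm0);
    _mm_storeu_ps (dest + i + 4, xmm2);
  }
  /* At most one remaining group of four. */
  for (; i < n - 3; i += 4) {
    __m128 xmm0 = _mm_loadu_ps (src1 + i);
    _mm_storeu_ps (dest + i, _mm_min_ps (xmm0, xmm1));
  }
  /* Scalar tail for the last 0 to 3 elements. */
  for (; i < n; i++) {
    dest[i] = src1[i] < *val ? src1[i] : *val;
  }
}
```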

You could also check whether src1 is aligned and, with a dual path (an
outer if), use _mm_load_ps for src1 when it is. src1 and dest may well
have the same alignment. No idea whether it's worth it, though.
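
One way the dual path could be laid out is sketched below. The name scalarminimum_f32_dualpath is hypothetical, and the prologue is written as a pointer test on each step rather than the counted loop in the patch, which is an assumption of this sketch.

```c
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical sketch: pick aligned or unaligned loads once, outside the loop. */
void
scalarminimum_f32_dualpath (float *dest, const float *src1, const float *val, int n)
{
  __m128 xmm1 = _mm_load_ps1 (val);
  int i = 0;

  /* Scalar prologue until dest + i sits on a 16-byte boundary. */
  while (i < n && ((uintptr_t) (dest + i) & 15) != 0) {
    dest[i] = src1[i] < *val ? src1[i] : *val;
    i++;
  }

  if (((uintptr_t) (src1 + i) & 15) == 0) {
    /* src1 shares dest's alignment: aligned loads and stores. */
    for (; i < n - 3; i += 4)
      _mm_store_ps (dest + i, _mm_min_ps (_mm_load_ps (src1 + i), xmm1));
  } else {
    /* Different alignment: unaligned loads, aligned stores. */
    for (; i < n - 3; i += 4)
      _mm_store_ps (dest + i, _mm_min_ps (_mm_loadu_ps (src1 + i), xmm1));
  }

  /* Scalar tail for the last 0 to 3 elements. */
  for (; i < n; i++) {
    dest[i] = src1[i] < *val ? src1[i] : *val;
  }
}
```

The alignment test runs once per call, so its cost is small, but the gain from _mm_load_ps over _mm_loadu_ps depends on the CPU and would have to be benchmarked.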

> +  for(;i<n;i++){
> +    *dest++ = *src1 < *val ? *src1 : *val;
> +    src1++;
> +  }
> +}
> +OIL_DEFINE_IMPL_FULL (scalarminimum_f32_sse, scalarminimum_f32, OIL_IMPL_FLAG_SSE);
> +
> +static void
> +scalarmaximum_f32_sse (float *dest, float *src1, float *val, int n)
> +{
Idem: the same comments apply here.

-- 
Stephane

