[Mesa-dev] [PATCH][RFC] mesa/main: Clamp rgba with streamed sse
Matt Turner
mattst88 at gmail.com
Fri Oct 31 10:17:38 PDT 2014
On Fri, Oct 31, 2014 at 3:13 AM, Juha-Pekka Heikkila
<juhapekka.heikkila at gmail.com> wrote:
> Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila at gmail.com>
> ---
> src/mesa/main/colormac.h | 20 +++++++++++++++
> src/mesa/main/pixeltransfer.c | 59 ++++++++++++++++++++++++++++++++-----------
> 2 files changed, 64 insertions(+), 15 deletions(-)
>
> diff --git a/src/mesa/main/colormac.h b/src/mesa/main/colormac.h
> index c8adca6..da5e094 100644
> --- a/src/mesa/main/colormac.h
> +++ b/src/mesa/main/colormac.h
> @@ -51,6 +51,26 @@ _mesa_unclamped_float_rgba_to_ubyte(GLubyte dst[4], const GLfloat src[4])
>
>
> /**
> + * Clamp four float values to [min,max]
> + */
> +#if defined(__SSE2__) && defined(__GNUC__)
> +static inline void
> +_mesa_clamp_float_rgba(GLfloat src[4], GLfloat result[4], const float min,
> + const float max)
> +{
> + __m128 operand, minval, maxval;
> +
> + operand = _mm_loadu_ps(src);
Surely 128-bit pixels will be 128-bit aligned? I think we can use an
aligned load here.
> + minval = _mm_set1_ps(min);
> + maxval = _mm_set1_ps(max);
> + operand = _mm_max_ps(operand, minval);
> + operand = _mm_min_ps(operand, maxval);
> + _mm_storeu_ps(result, operand);
And an aligned store here.
> +}
> +#endif
> +
> +
> +/**
> * \name Generic color packing macros. All inputs should be GLubytes.
> *
> * \todo We may move these into texstore.h at some point.
> diff --git a/src/mesa/main/pixeltransfer.c b/src/mesa/main/pixeltransfer.c
> index 8bbeeb8..e16eb59 100644
> --- a/src/mesa/main/pixeltransfer.c
> +++ b/src/mesa/main/pixeltransfer.c
> @@ -35,7 +35,7 @@
> #include "pixeltransfer.h"
> #include "imports.h"
> #include "mtypes.h"
> -
> +#include "x86/common_x86_asm.h"
>
> /*
> * Apply scale and bias factors to an array of RGBA pixels.
> @@ -89,16 +89,34 @@ _mesa_map_rgba( const struct gl_context *ctx, GLuint n, GLfloat rgba[][4] )
> const GLfloat *bMap = ctx->PixelMaps.BtoB.Map;
> const GLfloat *aMap = ctx->PixelMaps.AtoA.Map;
> GLuint i;
> - for (i=0;i<n;i++) {
> - GLfloat r = CLAMP(rgba[i][RCOMP], 0.0F, 1.0F);
> - GLfloat g = CLAMP(rgba[i][GCOMP], 0.0F, 1.0F);
> - GLfloat b = CLAMP(rgba[i][BCOMP], 0.0F, 1.0F);
> - GLfloat a = CLAMP(rgba[i][ACOMP], 0.0F, 1.0F);
> - rgba[i][RCOMP] = rMap[F_TO_I(r * rscale)];
> - rgba[i][GCOMP] = gMap[F_TO_I(g * gscale)];
> - rgba[i][BCOMP] = bMap[F_TO_I(b * bscale)];
> - rgba[i][ACOMP] = aMap[F_TO_I(a * ascale)];
> +
> +#if defined(__SSE2__) && defined(__GNUC__)
> + if (cpu_has_xmm2) {
#ifdef __SSE2__ means the compiler is free to use SSE2 instructions
whenever it pleases. That's not what you want here, if you're also
doing runtime checking (cpu_has_xmm2).
The typical way to do this is to put the function containing SSE
instructions in a separate file that is compiled with -msse2. See
streaming-load-memcpy.c for example. gcc has a way to mark specific
functions, but since we have to compile with MSVC...
I think you just want to copy-and-paste the SSE4.1 testing code in
configure.ac for SSE2 and then wrap these uses in #ifdef USE_SSE2.
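
A rough single-file sketch of that arrangement, for illustration only. It assumes USE_SSE2 is defined by the build system (e.g. -DUSE_SSE2 together with -msse2 for the SSE file). The names clamp_rgba, clamp_rgba_sse2, clamp_rgba_scalar and cpu_has_sse2 are hypothetical and not existing Mesa functions; cpu_has_sse2 stands in for the cpu_has_xmm2 check and relies on the GCC/Clang builtin __builtin_cpu_supports. Unaligned loads and stores are used because the sketch cannot assume 16-byte alignment.

```c
#include <assert.h>
#include <stddef.h>

#ifdef USE_SSE2              /* assumed to be set by the build system */
#include <emmintrin.h>

/* In a real tree this would live in its own file, the only one
 * compiled with -msse2, so no SSE2 code leaks into generic paths. */
static void
clamp_rgba_sse2(float (*rgba)[4], size_t n, float min, float max)
{
   const __m128 minval = _mm_set1_ps(min);
   const __m128 maxval = _mm_set1_ps(max);
   size_t i;

   for (i = 0; i < n; i++) {
      __m128 v = _mm_loadu_ps(rgba[i]);
      v = _mm_min_ps(_mm_max_ps(v, minval), maxval);
      _mm_storeu_ps(rgba[i], v);
   }
}

/* Hypothetical stand-in for a runtime flag such as cpu_has_xmm2. */
static int
cpu_has_sse2(void)
{
   return __builtin_cpu_supports("sse2");
}
#endif

/* Portable fallback, built with the default compiler flags. */
static void
clamp_rgba_scalar(float (*rgba)[4], size_t n, float min, float max)
{
   size_t i, c;

   for (i = 0; i < n; i++) {
      for (c = 0; c < 4; c++) {
         float v = rgba[i][c];
         rgba[i][c] = v < min ? min : (v > max ? max : v);
      }
   }
}

/* Caller-side dispatch: compile-time guard plus runtime CPU check. */
void
clamp_rgba(float (*rgba)[4], size_t n, float min, float max)
{
   assert(min <= max);
#ifdef USE_SSE2
   if (cpu_has_sse2()) {
      clamp_rgba_sse2(rgba, n, min, max);
      return;
   }
#endif
   clamp_rgba_scalar(rgba, n, min, max);
}
```

Built without -DUSE_SSE2, the file contains no SSE2 code at all and every call takes the scalar path.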
> + for (i=0;i<n;i++) {
> + GLfloat rgba_temp[4];
> + _mesa_clamp_float_rgba(rgba[i], rgba_temp, 0.0F, 1.0F);
> + rgba[i][RCOMP] = rMap[F_TO_I(rgba_temp[RCOMP] * rscale)];
> + rgba[i][GCOMP] = gMap[F_TO_I(rgba_temp[GCOMP] * gscale)];
> + rgba[i][BCOMP] = bMap[F_TO_I(rgba_temp[BCOMP] * bscale)];
> + rgba[i][ACOMP] = aMap[F_TO_I(rgba_temp[ACOMP] * ascale)];
Oh, but we shouldn't be bothering to store the floats back to memory
anyway. We should just do this part with SSE as well.
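
One possible reading of that suggestion, as a minimal sketch rather than a proposed patch: clamp, scale and convert to integer indices in a single register, so only the four indices are written out before the table lookups. The function name map_rgba_sse2 and its parameters are hypothetical; scale[c] is assumed to be the size of table c minus one, as with rscale and friends in the quoted loop. The sketch assumes an SSE2-capable build and uses _mm_cvtps_epi32, which rounds according to the current MXCSR rounding mode (round-to-nearest by default), so results may differ from F_TO_I on exact ties.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>   /* SSE2 intrinsics */

/* Hypothetical helper, not Mesa API.
 * map[c]   : lookup table for component c (R, G, B, A)
 * scale[c] : table size for component c, minus one */
void
map_rgba_sse2(size_t n, float (*rgba)[4],
              const float *const map[4], const float scale[4])
{
   const __m128 zero = _mm_setzero_ps();
   const __m128 one = _mm_set1_ps(1.0f);
   const __m128 scales = _mm_loadu_ps(scale);
   size_t i, c;

   assert(rgba != NULL || n == 0);

   for (i = 0; i < n; i++) {
      int idx[4];
      __m128 v = _mm_loadu_ps(rgba[i]);

      /* Clamp to [0,1] and scale without leaving the register. */
      v = _mm_min_ps(_mm_max_ps(v, zero), one);
      v = _mm_mul_ps(v, scales);

      /* Convert all four components to table indices at once. */
      _mm_storeu_si128((__m128i *) idx, _mm_cvtps_epi32(v));

      for (c = 0; c < 4; c++)
         rgba[i][c] = map[c][idx[c]];
   }
}
```

If the pixel array were known to be 16-byte aligned, _mm_load_ps could replace _mm_loadu_ps here; the sketch does not assume that.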