[Pixman] [PATCH 1/3] Add AVX fetcher for x8r8g8b8

Wed May 25 06:04:14 PDT 2011

Matt Turner <mattst88 at gmail.com> writes:

> Signed-off-by: Matt Turner <mattst88 at gmail.com>

General comments on this patch:

- It doesn't compile for me. I believe this is because AVX_CFLAGS isn't
  AC_SUBSTed in configure.ac.

- Either SunCC and Solaris should be dealt with in full, and tested, or
  only GCC support should be added.

- The cpuid support is missing as you said.

- I think I'd prefer to get the implementation added first as a separate
  patch, before the first fetcher.

> +have_avx_intrinsics=no
> +AC_MSG_CHECKING(whether to use AVX intrinsics)
> +xserver_save_CFLAGS=$CFLAGS
> +CFLAGS="$AVX_CFLAGS $CFLAGS"
> +
> +AC_COMPILE_IFELSE([
> +#if defined(__GNUC__) && (__GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 5))
> +#   if !defined(__amd64__) && !defined(__x86_64__)

This is cutted and pasted from SSE2, but the check is not intended to
find out whether the compiler supports the intrinsics; it is intended to
find out whether the __force_align_arg_pointer__ attribute is supported,
which is the case for GCC 4.2 and later.

See the comment above pixman_image_composite32() in pixman.c for why we
need that attribute. According to Jakub Jelinek, with -mavx GCC still
assumes the stack is 16 byte aligned, and when it needs 32 byte
alignment, it will take care of that itself. 

But there could still be code out there that generates 4 byte aligned
stacks on 32 bit, so I believe the comment applies to AVX as well.

Alternatively, and this may be preferable at this point, for SSE2 and
AVX, we could simply require GCC 4.4 and then compile pixman-sse2.c and
pixman-avx.c with -mstackrealign to get rid of the
__force_align_arg_pointer__ stuff altogether.

> +/* load 8 pixels from a 32-byte boundary aligned address */
> +static force_inline __m256i
> +load_256_aligned (__m256i* src)
> +{
> +    return _mm256_load_si256 (src);
> +}

> +/* save 8 pixels using Write Combining memory on a 32-byte
> + * boundary aligned address
> + */
> +static force_inline void
> +save_256_write_combining (__m256i* dst,
> +                          __m256i  data)
> +{
> +    _mm256_stream_si256 (dst, data);
> +}

> +/* save 8 pixels on a unaligned address */
> +static force_inline void
> +save_256_unaligned (__m256i* dst,
> +                    __m256i  data)
> +{
> +    _mm256_storeu_si256 (dst, data);
> +}

These three functions look like they are not used.

> +/* Work around a code generation bug in Sun Studio 12. */
> +#if defined(__SUNPRO_C) && (__SUNPRO_C >= 0x590)
> +# define create_mask_2x32_256(mask0, mask1)				\
> +    (_mm256_set_epi32 ((mask0), (mask1), (mask0), (mask1), \
> +			(mask0), (mask1), (mask0), (mask1)))
> +#else
> +static force_inline __m256i
> +create_mask_2x32_256 (uint32_t mask0,
> +                      uint32_t mask1)
> +{
> +    return _mm256_set_epi32 (mask0, mask1, mask0, mask1,
> +				mask0, mask1, mask0, mask1);
> +}
> +#endif

Is there any reason to believe that Sun Studio 12 supports AVX, and if
it does, that it has this code generation bug?

> --- a/pixman/pixman-cpu.c
> +++ b/pixman/pixman-cpu.c
> @@ -563,6 +563,7 @@ pixman_have_sse2 (void)
>  
>  #endif
>  
> +
>  #else /* __amd64__ */
>  #ifdef USE_MMX
>  #define pixman_have_mmx() TRUE
> @@ -570,6 +571,24 @@ pixman_have_sse2 (void)
>  #ifdef USE_SSE2
>  #define pixman_have_sse2() TRUE
>  #endif
> +#ifdef USE_AVX
> +static pixman_bool_t
> +pixman_have_avx (void)
> +{
> +    static pixman_bool_t initialized = FALSE;
> +    static pixman_bool_t avx_present = TRUE;
> +/*
> +    if (!initialized)
> +    {
> +	unsigned int features = detect_cpu_features ();
> +	avx_present = (features & (MMX | MMX_EXTENSIONS | SSE | SSE2 | AVX)) == (MMX | MMX_EXTENSIONS | SSE | SSE2 | AVX);
> +	initialized = TRUE;
> +    }
> +*/
> +    return avx_present;
> +}
> +
> +#endif
>  #endif /* __amd64__ */
>  #endif

This obviously doesn't do anything yet, but it also looks like
pixman_have_avx() is only being defined in the __amd64__ branch. We'd
need it for 32 bit too.

Thanks,
Soren