[Mesa-stable] [Mesa-dev] [PATCH] configure.ac: fix test for SSE4.1 assembler support

Oded Gabbay oded.gabbay at gmail.com
Sun Dec 13 13:32:55 PST 2015


On Sun, Dec 13, 2015 at 10:34 PM, Matt Turner <mattst88 at gmail.com> wrote:
> On Sun, Dec 13, 2015 at 5:23 AM, Oded Gabbay <oded.gabbay at gmail.com> wrote:
>> On Sun, Dec 13, 2015 at 11:56 AM, Jonathan Gray <jsg at jsg.id.au> wrote:
>>> On Sat, Dec 12, 2015 at 06:41:56PM +0000, Emil Velikov wrote:
>>>> On 10 December 2015 at 08:42, Oded Gabbay <oded.gabbay at gmail.com> wrote:
>>>> > On Wed, Dec 9, 2015 at 8:30 PM, Matt Turner <mattst88 at gmail.com> wrote:
>>>> >> On Tue, Dec 8, 2015 at 9:37 PM, Jonathan Gray <jsg at jsg.id.au> wrote:
>>>> >>> Change the __m128i variables to be volatile so gcc 4.9 won't optimise
>>>> >>> all of them out with -O1 or greater.  The _mm_set1_epi32/pinsrd calls
>>>> >>> still get optimised out but now there is at least one SSE4.1 instruction
>>>> >>> generated via _mm_max_epu32/pmaxud.  When all of the sse4.1 instructions
>>>> >>> got optimised out the configure test would incorrectly pass when the
>>>> >>> compiler supported the intrinsics and the assembler didn't support the
>>>> >>> instructions.
>>>> >>>
>>>> >>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91806
>>>> >>> Signed-off-by: Jonathan Gray <jsg at jsg.id.au>
>>>> >>> Cc: "11.0 11.1" <mesa-stable at lists.freedesktop.org>
>>>> >>> ---
>>>> >>>  configure.ac | 2 +-
>>>> >>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>> >>>
>>>> >>> diff --git a/configure.ac b/configure.ac
>>>> >>> index 260934d..1d82e47 100644
>>>> >>> --- a/configure.ac
>>>> >>> +++ b/configure.ac
>>>> >>> @@ -384,7 +384,7 @@ CFLAGS="$SSE41_CFLAGS $CFLAGS"
>>>> >>>  AC_COMPILE_IFELSE([AC_LANG_SOURCE([[
>>>> >>>  #include <smmintrin.h>
>>>> >>>  int main () {
>>>> >>> -    __m128i a = _mm_set1_epi32 (0), b = _mm_set1_epi32 (0), c;
>>>> >>> +    volatile __m128i a = _mm_set1_epi32 (0), b = _mm_set1_epi32 (0), c;
>>>> >>>      c = _mm_max_epu32(a, b);
>>>> >>>      return 0;
>>>> >>
>>>> >> I would have extracted an int from the result of _mm_max_epu32 and
>>>> >> returned that instead of 0.
>>>> >
>>>> > Instead of the volatile, I assume?
>>>> >
>>>> Precisely. If anyone wants to follow on Matt's suggestion we can pick
>>>> that one as well. I'd like to get a patch for the next stable releases
>>>> (next Friday for 11.0.x and just after new year for 11.1.1) so I'll
>>>> take whatever's around :-)
>>>>
>>>> -Emil
>>>
>>> I avoided that as I wasn't sure if there was a case where autoconf
>>> cared about the return code.  If someone wants to create a new diff
>>> feel free, I have limited connectivity till the middle of next week.
>>
>> So I'm not a huge SSE expert, but I tried doing this (remove volatile
>> and return _mm_cvtsi128_si32 of c):
>>
>> ------------------------
>> #include <mmintrin.h>
>> #include <xmmintrin.h>
>> #include <emmintrin.h>
>>
>> int main () {
>>     __m128i a = _mm_set1_epi32 (0), b = _mm_set1_epi32 (0), c;
>>     c = _mm_xor_si128 (a, b);
>>     return _mm_cvtsi128_si32(c);
>> }
>> -------------------------
>>
>> When compiling with "gcc -O1 -msse2" (gcc 4.8.5 from RHEL 7.2), I got:
>>
>> ---------------------
>> main:
>> .LFB521:
>>     .cfi_startproc
>>     movl $0, %eax
>>     ret
>>     .cfi_endproc
>> -------------------
>>
>> So unless I misunderstood Matt's suggestion, I think we *have* to use
>> the volatile, as it forces the compiler to emit the pxor and movdqa
>> instructions.
>
> Since all the arguments to the intrinsics are constants, GCC is
> constant-evaluating them.
>
> I expect all you'd need to do is pass some global variables to the
> intrinsics or similar.

OK, so what helped was passing a global variable to the intrinsics, like this:

int param;

int main () {
    __m128i a = _mm_set1_epi32 (param), b = _mm_set1_epi32 (param+1), c;

Notice the (param+1): if both operands use just (param), the compiler
optimizes the operation away. That is quite understandable, as XORing a
value with itself gives 0.

        Oded

