[Mesa-dev] [PATCH][RFC] mesa/main: Clamp rgba with streamed sse

Fri Oct 31 10:24:20 PDT 2014

On 31/10/14 17:01, Matt Turner wrote:
> On Fri, Oct 31, 2014 at 4:12 AM, Jose Fonseca <jfonseca at vmware.com> wrote:
>> On 31/10/14 10:13, Juha-Pekka Heikkila wrote:
>>>
>>>    defined(__SSE2__) && defined(__GNUC__)
>>
>>
>> Instead of duplicating this expression everywhere, let's create a
>> "HAVE_SSE2_INTRIN" define.  Not only is this expression complex, it will
>> become even more so when we update it for MSVC.
>
> Isn't testing __SSE2__ sufficient? Does MSVC not do this?
>
> clang/icc/gcc all implement this and all of the _mm_* intrinsics.
>

No, __SSE2__ is a GCC-only macro.  It's not defined or needed by MSVC 
compilers.  And I strongly suspect that the Intel compiler only 
defines it for GCC compatibility.

This is because GCC is quite lame IMO: it can't distinguish between 
"enabling SSE intrinsics" (i.e., allowing you to include emmintrin.h and 
use the Intel _mm_* intrinsics) and emitting SSE2 opcodes of its own 
accord.  That is, when you pass -msse2 to GCC, you're also giving carte 
blanche for GCC to emit SSE2 opcodes for any C code!  Which makes it 
_very_ hard to have special code paths for SSE1/2/3/4/etc and no SSE, 
since you basically need to compile each path in a different C module, 
passing different -msse* flags to each.

Whereas on MSVC, you can #include <emmintrin.h> any time, anywhere, and 
only the code that uses the intrinsics will generate those opcodes.  So 
you can have an awesomeFunctionC(), awesomeFunctionSSE2(), 
awesomeFunctionAVX() all next to each other, and a switch table to jump 
into them.
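
To make the "all next to each other, plus a switch table" idea concrete, 
here is a rough sketch.  Every name in it (example_add_c, 
example_add_sse2, example_add_table, EXAMPLE_HAVE_SSE2) is made up for 
illustration, and the guard is written so that it also builds with GCC 
when SSE2 is already enabled (as on any x86-64 build).  A real program 
would pick the table index from a run-time CPUID check, which is left 
out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guard: MSVC on x86 can always use the intrinsics; GCC-style
 * compilers only when SSE2 code generation is already switched on. */
#if (defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))) || \
    defined(__SSE2__)
#include <emmintrin.h>
#define EXAMPLE_HAVE_SSE2 1
#else
#define EXAMPLE_HAVE_SSE2 0
#endif

typedef void (*example_add_fn)(int32_t *dst, const int32_t *a,
                               const int32_t *b, size_t n);

/* Plain C version: works everywhere. */
void example_add_c(int32_t *dst, const int32_t *a, const int32_t *b, size_t n)
{
   assert(dst && a && b);
   for (size_t i = 0; i < n; i++)
      dst[i] = a[i] + b[i];
}

#if EXAMPLE_HAVE_SSE2
/* SSE2 version: four 32-bit adds per iteration, scalar loop for the tail. */
void example_add_sse2(int32_t *dst, const int32_t *a, const int32_t *b,
                      size_t n)
{
   size_t i = 0;
   assert(dst && a && b);
   for (; i + 4 <= n; i += 4) {
      __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
      __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
      _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi32(va, vb));
   }
   for (; i < n; i++)
      dst[i] = a[i] + b[i];
}
#endif

enum example_impl { EXAMPLE_IMPL_C, EXAMPLE_IMPL_SSE2, EXAMPLE_IMPL_COUNT };

/* The "switch table": index it with whatever the CPU check selected. */
const example_add_fn example_add_table[EXAMPLE_IMPL_COUNT] = {
   example_add_c,
#if EXAMPLE_HAVE_SSE2
   example_add_sse2,
#else
   example_add_c,   /* no SSE2 in this build: alias the portable version */
#endif
};
```

A caller would then do something like 
example_add_table[EXAMPLE_IMPL_SSE2](dst, a, b, n) once it knows the CPU 
supports SSE2.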

In other words, on MSVC, instead of

   #if defined(__SSE2__) && defined(__GNUC__)

all you need is

   #if 1

or

   #if defined(_M_IX86) ||  defined(_M_X64)

if you want the code not to cause problems when targeting non-x86 
architectures.
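
Putting the two conditions together, the HAVE_SSE2_INTRIN define 
suggested earlier in the thread could look roughly like the sketch 
below.  The exact conditions are my assumption of what the combined 
check would be, and example_clamp_rgba() is a hypothetical helper (not 
the function from the patch) that just shows the define in use by 
clamping RGBA floats to [0, 1].

```c
#include <assert.h>
#include <stddef.h>

/* Assumed definition: GCC-style compilers need __SSE2__, MSVC only needs
 * to be targeting x86 or x64. */
#if defined(__GNUC__) && defined(__SSE2__)
#  define HAVE_SSE2_INTRIN 1
#elif defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#  define HAVE_SSE2_INTRIN 1
#else
#  define HAVE_SSE2_INTRIN 0
#endif

#if HAVE_SSE2_INTRIN
#include <emmintrin.h>
#endif

/* Hypothetical helper: clamp `pixels` RGBA quadruples in place to [0, 1]. */
void example_clamp_rgba(float *rgba, size_t pixels)
{
   assert(rgba != NULL || pixels == 0);
#if HAVE_SSE2_INTRIN
   const __m128 lo = _mm_setzero_ps();
   const __m128 hi = _mm_set1_ps(1.0f);
   for (size_t i = 0; i < pixels; i++) {
      /* One pixel (4 floats) per iteration; unaligned loads for safety. */
      __m128 v = _mm_loadu_ps(rgba + 4 * i);
      v = _mm_min_ps(_mm_max_ps(v, lo), hi);
      _mm_storeu_ps(rgba + 4 * i, v);
   }
#else
   for (size_t i = 0; i < 4 * pixels; i++) {
      float v = rgba[i];
      rgba[i] = v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
   }
#endif
}
```

The point is only that callers test one define instead of repeating the 
compiler-specific expression everywhere.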

Of course there's some merit in GCC emitting SSE instructions for plain C 
code, but let's face it: virtually all the code that can benefit from 
SIMD is too complex to be auto-vectorized by compilers, and needs humans 
writing code with SSE intrinsics.  So GCC is effectively tailored to make 
the rare thing easy, at the expense of making the common thing hard...

I believe recent GCC versions have better support for having specialized 
SSE code side-by-side.  But from what I remember of it, it is all pretty 
non-standard and GCC-specific, so still pretty useless for portable code.
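
For reference, one GCC mechanism along these lines is the per-function 
target attribute together with __builtin_cpu_supports().  I am assuming 
that is the kind of support meant here.  The sketch below uses made-up 
names (example_sat_add*, EXAMPLE_TARGET_SSE2) and assumes GCC 4.9 or 
later, or a compiler that already has SSE2 enabled.  It is exactly the 
GCC-specific style described above, so it would not build as-is with 
MSVC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed guard: x86 target, and either SSE2 is already enabled or the
 * compiler is GCC >= 4.9, whose headers allow intrinsics inside
 * target("sse2") functions without -msse2. */
#if (defined(__i386__) || defined(__x86_64__)) && \
    (defined(__SSE2__) || \
     (defined(__GNUC__) && !defined(__clang__) && \
      (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 9))))
#include <emmintrin.h>
#define EXAMPLE_TARGET_SSE2 1
#else
#define EXAMPLE_TARGET_SSE2 0
#endif

/* Portable saturating byte add. */
void example_sat_add_c(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                       size_t n)
{
   assert(dst && a && b);
   for (size_t i = 0; i < n; i++) {
      unsigned sum = (unsigned)a[i] + b[i];
      dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
   }
}

#if EXAMPLE_TARGET_SSE2
/* Only this function is compiled with SSE2 enabled. */
__attribute__((target("sse2")))
void example_sat_add_sse2(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                          size_t n)
{
   size_t i = 0;
   assert(dst && a && b);
   for (; i + 16 <= n; i += 16) {
      __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
      __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
      _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(va, vb));
   }
   for (; i < n; i++) {
      unsigned sum = (unsigned)a[i] + b[i];
      dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
   }
}
#endif

/* Run-time dispatch: use the SSE2 path only if the CPU reports it. */
void example_sat_add(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                     size_t n)
{
#if EXAMPLE_TARGET_SSE2
   if (__builtin_cpu_supports("sse2")) {
      example_sat_add_sse2(dst, a, b, n);
      return;
   }
#endif
   example_sat_add_c(dst, a, b, n);
}
```

Both the attribute and the builtin are GCC extensions, which is why this 
does not help much for code that also has to build elsewhere.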

Jose