[Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

Siavash Eliasi siavashserver at gmail.com
Fri Nov 7 09:38:05 PST 2014

On 11/07/2014 07:31 PM, Ian Romanick wrote:
> On 11/07/2014 06:09 AM, Siavash Eliasi wrote:
>> On 11/07/2014 03:14 PM, Steven Newbury wrote:
>>> On Thu, 2014-11-06 at 21:00 -0800, Matt Turner wrote:
>>>> On Thu, Nov 6, 2014 at 8:56 PM, Siavash Eliasi <
>>>> siavashserver at gmail.com> wrote:
>>>>> Then I do recommend removing the "if (cpu_has_sse4_1)" from this
>>>>> patch and similar places, because there is no runtime CPU
>>>>> dispatching happening for SSE optimized code paths in action and
>>>>> just adds extra overhead (unnecessary branches) to the generated
>>>>> code.
>>>> No. Sorry, I realize I misread your previous question:
>>>>>> I guess checking for "cpu_has_sse4_1" is unnecessary if it isn't
>>>>>> controllable by user at runtime; because "USE_SSE41" is a
>>>>>> compile time check and requires the target machine to be SSE 4.1
>>>>>> capable already.
>>>> USE_SSE41 is set if the *compiler* supports SSE 4.1. This allows you
>>>> to build the code and then use it only on systems that actually
>>>> support it.
>>>> All of this could have been pretty easily answered by a few greps
>>>> though...
>>> I wonder what difference it would make to have an option to compile
>>> out the run-time check code to avoid the additional overhead in cases
>>> where the builder *knows* at compile time what the run-time system is?
>>> (ie Gentoo)
>> I think that's possible. Since "cpu_has_sse4_1" and friends are simply
>> macros, one can set them to "true" or "1" during compile time if it's
>> going to be built for an SSE 4.1 capable target so your smart compiler
>> will totally get rid of the unnecessary runtime check.
>> I guess "common_x86_features.h" should be modified to something like this:
>> #ifdef __SSE4_1__
>> #define cpu_has_sse4_1 1
>> #else
>> #define cpu_has_sse4_1        (_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)
>> #endif
> I was thinking about doing something similar for cpu_has_xmm and
> cpu_has_xmm2 for x64.  SSE and SSE2 are required parts of that
> instruction set, so they're always there.

I can come up with a patch implementing the same for SSE, SSE2, SSE3 and
SSSE3 if the current approach is fine by you.
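
As a rough sketch only, the same pattern extended to cpu_has_xmm and
cpu_has_xmm2 might look like the block below. The feature-bit values, the
stand-in definition of _mesa_x86_cpu_features and the helper
pick_sse41_path() are assumptions made so the snippet compiles on its own;
they are not the actual contents of "common_x86_features.h". __SSE__,
__SSE2__ and __SSE4_1__ are the macros GCC and Clang predefine when the
corresponding instruction set is enabled for the build target. SSE3 and
SSSE3 would follow the same shape with __SSE3__ and __SSSE3__.

```c
#include <stdint.h>

/* Hypothetical bit values, for illustration only. */
#define X86_FEATURE_XMM     (1u << 0)   /* SSE    */
#define X86_FEATURE_XMM2    (1u << 1)   /* SSE2   */
#define X86_FEATURE_SSE4_1  (1u << 2)   /* SSE4.1 */

/* Stand-in for the mask that is normally filled in by runtime CPU detection. */
uint32_t _mesa_x86_cpu_features = 0;

/*
 * If the compiler already targets the instruction set, the check folds to a
 * constant and the branch can be optimised away; otherwise it falls back to
 * the runtime feature mask.
 */
#ifdef __SSE__
#define cpu_has_xmm     1
#else
#define cpu_has_xmm     ((_mesa_x86_cpu_features & X86_FEATURE_XMM) != 0)
#endif

#ifdef __SSE2__
#define cpu_has_xmm2    1
#else
#define cpu_has_xmm2    ((_mesa_x86_cpu_features & X86_FEATURE_XMM2) != 0)
#endif

#ifdef __SSE4_1__
#define cpu_has_sse4_1  1
#else
#define cpu_has_sse4_1  ((_mesa_x86_cpu_features & X86_FEATURE_SSE4_1) != 0)
#endif

/* Hypothetical caller: returns 1 when the SSE4.1 code path should be taken. */
int pick_sse41_path(void)
{
    if (cpu_has_sse4_1)
        return 1;   /* optimised path */
    return 0;       /* generic fallback */
}
```

On x86-64 builds, cpu_has_xmm and cpu_has_xmm2 would then always be the
constant 1, since SSE and SSE2 are a required part of that instruction set.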
