[Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

steve at snewbury.org.uk steve at snewbury.org.uk
Fri Nov 7 07:20:02 PST 2014


On Fri Nov 7 14:09:09 2014 GMT, Siavash Eliasi wrote:
> 
> On 11/07/2014 03:14 PM, Steven Newbury wrote:
> > On Thu, 2014-11-06 at 21:00 -0800, Matt Turner wrote:
> >> On Thu, Nov 6, 2014 at 8:56 PM, Siavash Eliasi <
> >> siavashserver at gmail.com> wrote:
> >>> Then I do recommend removing the "if (cpu_has_sse4_1)" from this
> >>> patch and similar places, because there is no runtime CPU
> >>> dispatching happening for SSE optimized code paths in action and
> >>> just adds extra overhead (unnecessary branches) to the generated
> >>> code.
> >> No. Sorry, I realize I misread your previous question:
> >>
> >>>> I guess checking for "cpu_has_sse4_1" is unnecessary if it isn't
> >>>> controllable by user at runtime; because "USE_SSE41" is a
> >>>> compile time check and requires the target machine to be SSE 4.1
> >>>> capable already.
> >> USE_SSE41 is set if the *compiler* supports SSE 4.1. This allows you
> >> to build the code and then use it only on systems that actually
> >> support it.
> >>
> >> All of this could have been pretty easily answered by a few greps
> >> though...
> > I wonder what difference it would make to have an option to compile
> > out the run-time check code to avoid the additional overhead in cases
> > where the builder *knows* at compile time what the run-time system is?
> > (ie Gentoo)
> I think that's possible. Since "cpu_has_sse4_1" and friends are simply 
> macros, one can set them to "true" or "1" during compile time if it's 
> going to be built for an SSE 4.1 capable target so your smart compiler 
> will totally get rid of the unnecessary runtime check.
> 
> I guess "common_x86_features.h" should be modified to something like this:
> 
> #ifdef __SSE4_1__
> #define cpu_has_sse4_1 1
> #else
> #define cpu_has_sse4_1        (_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)
> #endif
>
Yes, this is what I was thinking.  Then perhaps there could be an option
to disable run-time detection, with the available CPU features
determined during configuration and the appropriate defines set
accordingly.

Whether it's worth it, I don't know.  I can imagine the compiler having
an easier job optimizing the code once the check is a compile-time
constant.
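
Something along these lines is roughly what I have in mind. It is only a
sketch: DISABLE_RUNTIME_CPU_DETECTION is a made-up define that a
configure option could set, and cpu_features, the flag value, and
pick_path() are stand-ins for illustration, not the real Mesa
definitions.

```c
/* Illustrative flag value; the real one lives in Mesa's x86 headers. */
#define X86_FEATURE_SSE4_1 (1 << 0)

/* Stand-in for _mesa_x86_cpu_features, normally filled in at start-up. */
static int cpu_features = 0;

/*
 * DISABLE_RUNTIME_CPU_DETECTION is a hypothetical configure-time define.
 * When it is set and the compiler is already targeting SSE 4.1, the
 * feature test becomes a constant and the branch can be folded away.
 */
#if defined(DISABLE_RUNTIME_CPU_DETECTION) && defined(__SSE4_1__)
#define cpu_has_sse4_1 1
#else
#define cpu_has_sse4_1 (cpu_features & X86_FEATURE_SSE4_1)
#endif

enum code_path { PATH_GENERIC, PATH_SSE41 };

/* Hypothetical dispatch point, shaped like the check in the patch. */
static enum code_path pick_path(void)
{
    if (cpu_has_sse4_1)
        return PATH_SSE41;
    return PATH_GENERIC;
}
```

Without the define, pick_path() still tests the feature word at run
time, so a generic build keeps working on CPUs without SSE 4.1.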