[Mesa-dev] [PATCH RFC] mesa: add SSE optimisation for glDrawElements

Matt Turner mattst88 at gmail.com
Thu Oct 23 09:20:24 PDT 2014

On Thu, Oct 23, 2014 at 2:13 AM, Timothy Arceri <t_arceri at yahoo.com.au> wrote:
> On Wed, 2014-10-22 at 22:49 -0700, Matt Turner wrote:
>> On Wed, Oct 22, 2014 at 10:30 PM, Matt Turner <mattst88 at gmail.com> wrote:
>> > On Wed, Oct 22, 2014 at 9:02 PM, Timothy Arceri <t_arceri at yahoo.com.au> wrote:
>> >> I almost wasn't going to bother sending this out since it uses SSE4.1
>> >> and it's recommended to use glDrawRangeElements anyway. But since these games
>> >> are still often used for benchmarking I thought I'd see if anyone is
>> >> interested in this. I only optimised GL_UNSIGNED_INT as that was the
>> >> only place these games were hitting but I guess it wouldn't hurt
>> >> to optimise the other cases too.
>> >
>> > I think it's kind of neat!
>> >
>> > It might also be fun to try to do this with OpenMP. OpenMP 3.1
>> > (supported since gcc-4.7) supports min/max reduction operators.
> I've never really looked into OpenMP before, but very cool :)
> It seems simd support wasn't added until 4.0 (gcc-4.9) so using 3.1
> would require threading. Probably best just to go with 4.0.

Oh, that's unfortunate. I didn't notice because I'm using 4.9.1 and
was too preoccupied with finding out when min/max reductions had been added.

>> I think all you'd need to do for that is to add this pragma
>> immediately before the for loop in vbo_exec_array.c:
>> #if _OPENMP > ... (have to figure out the date for OMP 3.1)
>> #pragma omp simd reduction(max:max_ui) reduction(min:min_ui)
>> #endif
>> and then change the inner loop to use ternary for min/max:
>> max_ui = ui_indices[i] > max_ui ? ui_indices[i] : max_ui;
>> min_ui = ui_indices[i] < min_ui ? ui_indices[i] : min_ui;
>> I tested it with a little function and confirmed that it generates
>> SSE4.1/AVX2 instructions (and even a bunch of SSE2 instructions when
>> 4.1 isn't available!) depending on the -march= value I pass.
> I assume this means there isn't a way to tell OpenMP to build multiple
> versions and select the best one at runtime, so distros would always
> just ship SSE2? Anyway I'm going to give the SSE2 code a run on my (6
> year old) desktop and see how it performs. I will also compare it to my
> SSE4.1 code on my laptop; maybe it won't be too big of a difference.

I couldn't find a way. :(
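
For reference, the loop discussed above can be sketched as a standalone
function. This is only an illustration, not the code in vbo_exec_array.c:
the name minmax_u32_omp and its signature are made up here, and the
201307 guard is my assumption for "OpenMP 4.0 or later", since that is
where the simd construct appeared. Without -fopenmp the pragma is simply
skipped and the loop stays scalar.

```c
#include <stddef.h>
#include <limits.h>

/* Hypothetical helper for illustration; not an existing Mesa function. */
static void
minmax_u32_omp(const unsigned *indices, size_t count,
               unsigned *min_out, unsigned *max_out)
{
   unsigned min_ui = UINT_MAX;   /* neutral element for min */
   unsigned max_ui = 0;          /* neutral element for max */
   size_t i;

   /* Assumption: 201307 is the _OPENMP value for OpenMP 4.0. */
#if defined(_OPENMP) && _OPENMP >= 201307
#pragma omp simd reduction(max:max_ui) reduction(min:min_ui)
#endif
   for (i = 0; i < count; i++) {
      /* Ternaries, as suggested above, for the min/max reductions. */
      max_ui = indices[i] > max_ui ? indices[i] : max_ui;
      min_ui = indices[i] < min_ui ? indices[i] : min_ui;
   }

   *min_out = min_ui;
   *max_out = max_ui;
}
```

Which instructions come out still depends entirely on the -march= (or
-msse4.1 etc.) flags given at build time, which is the limitation
mentioned above.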

I suspect the SSE 4.1 path you proposed will be the best solution
since we can use it with runtime detection. We might also simply try
using OpenMP in the sse_minmax.c file, since it'll be built with
-msse4.1, and see how the generated code compares.
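
To make the runtime-detection idea concrete, here is a rough sketch of
what a dispatching min/max could look like. It is not the proposed patch
and not how Mesa actually detects CPU features: all function names are
hypothetical, and it leans on the GCC/clang __builtin_cpu_supports()
builtin and the target("sse4.1") function attribute instead of building
a separate file with -msse4.1.

```c
#include <stddef.h>
#include <limits.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <smmintrin.h>
#define HAVE_SSE41_PATH 1

/* Hypothetical SSE4.1 path: four unsigned ints per iteration. */
__attribute__((target("sse4.1")))
static void
minmax_u32_sse41(const unsigned *idx, size_t count,
                 unsigned *min_out, unsigned *max_out)
{
   __m128i vmin = _mm_set1_epi32(-1);     /* every lane UINT_MAX */
   __m128i vmax = _mm_setzero_si128();    /* every lane 0 */
   unsigned lanes_min[4], lanes_max[4];
   unsigned lo = UINT_MAX, hi = 0;
   size_t i = 0;
   int k;

   for (; i + 4 <= count; i += 4) {
      __m128i v = _mm_loadu_si128((const __m128i *)(idx + i));
      vmin = _mm_min_epu32(vmin, v);      /* SSE4.1 unsigned min */
      vmax = _mm_max_epu32(vmax, v);      /* SSE4.1 unsigned max */
   }

   /* Horizontal reduction of the four lanes. */
   _mm_storeu_si128((__m128i *)lanes_min, vmin);
   _mm_storeu_si128((__m128i *)lanes_max, vmax);
   for (k = 0; k < 4; k++) {
      lo = lanes_min[k] < lo ? lanes_min[k] : lo;
      hi = lanes_max[k] > hi ? lanes_max[k] : hi;
   }

   /* Scalar tail for the remaining 0..3 indices. */
   for (; i < count; i++) {
      lo = idx[i] < lo ? idx[i] : lo;
      hi = idx[i] > hi ? idx[i] : hi;
   }

   *min_out = lo;
   *max_out = hi;
}
#endif

/* Plain C fallback, safe on any CPU. */
static void
minmax_u32_scalar(const unsigned *idx, size_t count,
                  unsigned *min_out, unsigned *max_out)
{
   unsigned lo = UINT_MAX, hi = 0;
   size_t i;

   for (i = 0; i < count; i++) {
      lo = idx[i] < lo ? idx[i] : lo;
      hi = idx[i] > hi ? idx[i] : hi;
   }
   *min_out = lo;
   *max_out = hi;
}

/* Hypothetical entry point: pick the path at runtime. */
static void
minmax_u32(const unsigned *idx, size_t count,
           unsigned *min_out, unsigned *max_out)
{
#ifdef HAVE_SSE41_PATH
   if (__builtin_cpu_supports("sse4.1")) {
      minmax_u32_sse41(idx, count, min_out, max_out);
      return;
   }
#endif
   minmax_u32_scalar(idx, count, min_out, max_out);
}
```

The point of the split is that only the one function is compiled with
SSE4.1 enabled, so the rest of the build can stay at whatever baseline
the distro picks.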

While on x86-64 we can at least assume SSE 2, we can't make any
assumptions on 32-bit, which most games still are.
