[Mesa-dev] [PATCH RFC] mesa: add SSE optimisation for glDrawElements
steve at snewbury.org.uk
Thu Oct 23 10:28:51 PDT 2014
On Thu, 2014-10-23 at 09:20 -0700, Matt Turner wrote:
> On Thu, Oct 23, 2014 at 2:13 AM, Timothy Arceri <
> t_arceri at yahoo.com.au> wrote:
> > On Wed, 2014-10-22 at 22:49 -0700, Matt Turner wrote:
> > > On Wed, Oct 22, 2014 at 10:30 PM, Matt Turner <mattst88 at gmail.com
> > > > wrote:
> > > > On Wed, Oct 22, 2014 at 9:02 PM, Timothy Arceri <
> > > > t_arceri at yahoo.com.au> wrote:
> > > > > I almost wasn't going to bother sending this out since it
> > > > > uses SSE4.1
> > > > > and it's recommended to use glDrawRangeElements anyway. But
> > > > > since these games
> > > > > are still often used for benchmarking I thought I'd see if
> > > > > anyone is
> > > > > interested in this. I only optimised GL_UNSIGNED_INT as that
> > > > > was the
> > > > > only place these games were hitting but I guess it wouldn't
> > > > > hurt
> > > > > to optimise the other cases too.
> > > >
> > > > I think it's kind of neat!
> > > >
> > > > It might also be fun to try to do this with OpenMP. OpenMP 3.1
> > > > (supported since gcc-4.7) supports min/max reduction operators.
> > I've never really looked into OpenMP before, but very cool :)
> > It seems simd support wasn't added until 4.0 (gcc-4.9) so using 3.1
> > would require threading. Probably best just to go with 4.0.
> Oh, that's unfortunate. I didn't notice because I'm using 4.9.1 and
> was too preoccupied with finding out when min/max reductions had been
> > > I think all you'd need to do for that is to add this pragma
> > > immediately before the for loop in vbo_exec_array.c:
> > >
> > > #if _OPENMP > ... (have to figure out the date for OMP 3.1)
> > > #pragma omp simd reduction(max:max_ui) reduction(min:min_ui)
> > > #endif
> > >
> > > and then change the inner loop to use ternary for min/max:
> > >
> > > max_ui = ui_indices[i] > max_ui ? ui_indices[i] : max_ui;
> > > min_ui = ui_indices[i] < min_ui ? ui_indices[i] : min_ui;
> > >
> > > I tested it with a little function and confirmed that it
> > > generates
> > > SSE4.1/AVX2 instructions (and even a bunch of SSE2 instructions
> > > when
> > > 4.1 isn't available!) depending on the -march= value I pass.
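
For readers following along, the pragma and ternary rewrite described above could look roughly like the sketch below. This is a minimal standalone sketch, not the code in vbo_exec_array.c: the function name minmax_uint and its signature are made up for illustration, and the guard value 201307 is the _OPENMP date for OpenMP 4.0 (the version that introduced the simd construct), not the still-undetermined 3.1 check from the quoted snippet. Built without OpenMP enabled, the pragma is skipped and the loop stays a plain scalar loop.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper for illustration only; not Mesa's actual function. */
void
minmax_uint(const unsigned *ui_indices, size_t count,
            unsigned *min_out, unsigned *max_out)
{
   unsigned min_ui = UINT_MAX;
   unsigned max_ui = 0;
   size_t i;

   assert(min_out != NULL && max_out != NULL);

   /* 201307 is the _OPENMP value for OpenMP 4.0, which added "omp simd". */
#if defined(_OPENMP) && _OPENMP >= 201307
#pragma omp simd reduction(max:max_ui) reduction(min:min_ui)
#endif
   for (i = 0; i < count; i++) {
      /* Ternary form, as suggested in the thread, so the compiler can
       * map each statement onto a min/max reduction. */
      max_ui = ui_indices[i] > max_ui ? ui_indices[i] : max_ui;
      min_ui = ui_indices[i] < min_ui ? ui_indices[i] : min_ui;
   }

   *min_out = min_ui;
   *max_out = max_ui;
}
```

Which instructions come out of this depends entirely on the compiler version and the -march= value, so the generated assembly is worth checking rather than assuming.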
> > I assume this means there isn't a way to tell OpenMP to build
> > multiple
> > versions and select the best one at runtime, so distros would
> > always
> > just ship SSE2? Anyway I'm going to give the SSE2 code a run on my
> > (6
> > year old) desktop and see how it performs. I will also compare it
> > to my
> > SSE4.1 code on my laptop; maybe it won't be too big of a difference.
> I couldn't find a way. :(
> I suspect the SSE 4.1 path you proposed will be the best solution
> since we can use it with runtime detection. We might also simply try
> using OpenMP in the sse_minmax.c file, since it'll be built with
> -msse4.1 and seeing how the generated code compares.
> While on x86-64 we can at least assume SSE 2, we can't make any
> assumptions on 32-bit, which most games still are.
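
A rough sketch of what runtime detection could look like follows. It is not the proposed patch: all function names here are hypothetical, and instead of a separate sse_minmax.c built with -msse4.1 it uses GCC's per-function target attribute and __builtin_cpu_supports() so that it fits in one file. That assumes a reasonably recent GCC or clang on x86; anywhere else it falls back to the scalar loop.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <smmintrin.h>
#define HAVE_SSE41_PATH 1
#endif

typedef void (*minmax_fn)(const unsigned *, size_t, unsigned *, unsigned *);

/* Portable fallback; all names in this sketch are hypothetical. */
void
minmax_scalar(const unsigned *idx, size_t count,
              unsigned *min_out, unsigned *max_out)
{
   unsigned min_ui = UINT_MAX, max_ui = 0;
   size_t i;

   assert(min_out != NULL && max_out != NULL);
   for (i = 0; i < count; i++) {
      if (idx[i] > max_ui) max_ui = idx[i];
      if (idx[i] < min_ui) min_ui = idx[i];
   }
   *min_out = min_ui;
   *max_out = max_ui;
}

#ifdef HAVE_SSE41_PATH
/* Only this function is compiled for SSE4.1. */
__attribute__((target("sse4.1")))
void
minmax_sse41(const unsigned *idx, size_t count,
             unsigned *min_out, unsigned *max_out)
{
   __m128i vmin = _mm_set1_epi32(-1);      /* every lane = UINT_MAX */
   __m128i vmax = _mm_setzero_si128();     /* every lane = 0 */
   unsigned lanes_min[4], lanes_max[4];
   unsigned min_ui = UINT_MAX, max_ui = 0;
   size_t i = 0;
   int k;

   assert(min_out != NULL && max_out != NULL);

   /* Four indices per iteration using unsigned 32-bit min/max. */
   for (; i + 4 <= count; i += 4) {
      __m128i v = _mm_loadu_si128((const __m128i *)(idx + i));
      vmin = _mm_min_epu32(vmin, v);
      vmax = _mm_max_epu32(vmax, v);
   }

   /* Reduce the four lanes, then handle the leftover tail elements. */
   _mm_storeu_si128((__m128i *)lanes_min, vmin);
   _mm_storeu_si128((__m128i *)lanes_max, vmax);
   for (k = 0; k < 4; k++) {
      if (lanes_min[k] < min_ui) min_ui = lanes_min[k];
      if (lanes_max[k] > max_ui) max_ui = lanes_max[k];
   }
   for (; i < count; i++) {
      if (idx[i] > max_ui) max_ui = idx[i];
      if (idx[i] < min_ui) min_ui = idx[i];
   }
   *min_out = min_ui;
   *max_out = max_ui;
}
#endif

/* Pick an implementation based on what the running CPU reports. */
minmax_fn
select_minmax(void)
{
#ifdef HAVE_SSE41_PATH
   if (__builtin_cpu_supports("sse4.1"))
      return minmax_sse41;
#endif
   return minmax_scalar;
}
```

The selection would normally be done once and the function pointer cached, so a 32-bit build compiled for a generic target still reaches the SSE4.1 path on hardware that has it.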
It doesn't hurt to have a compile-time option/detection though, not
everybody uses generic code on random computers, that is pre-compiled