[Mesa-dev] [PATCH V4] mesa: add SSE optimisation for glDrawElements

Timothy Arceri t_arceri at yahoo.com.au
Tue Oct 28 13:49:36 PDT 2014


On Mon, 2014-10-27 at 20:04 +0000, Bruno Jimenez wrote:
> [snip]
> > >> +
> > >> +   if (aligned_count >= 4) {
> > >                           ^^
> > > 
> > > Hi,
> > > 
> > > I have been thinking and I think that you can change that 4 for an 8. In
> > > the case aligned_count == 4 there's no gain in using SSE, as you will
> > > have to do a final reduction from 4 elements to 1.
> > 
> > Either that or the initial check for whether or not to call the SSE
> > version could be predicated on size.  Something like
> > 
> >         if (cpu_has_sse4_1 && count >= 8) {
> > 
> > The actual threshold may be higher than 8 (and is certainly higher than
> > 4, as you point out).  It would take some careful microbenchmarks and
> > measurement to find the actual tipping point.  Maybe just use 8 and add
> > a FINISHME comment for now?
> 
> In fact, with enough 'bad' luck the threshold could be 10: 3 unaligned
> at the beginning, 4 ready for SSE and 3 at the end. Just as a side
> thought... why don't we force the alignment in cases when we potentially
> may use SSE and avoid the problem from the beginning?
> 
> Also, I have been toying a bit and I have written a couple of functions
> for finding max values in arrays. They can be easily extended to add a
> find_min at the same time. The good part is that I have also found a way
> to only use SSE2, parallelizing a branchless algorithm for finding
> maximums. Maybe we could ship it for x86_64 at least.
> 
> If you want to play with them the code is attached. If you need anything
> else, just ask.
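
To make the "3 + 4 + 3" arithmetic above concrete, here is a minimal
sketch of how a run of 32-bit indices splits into a scalar head, an
aligned SSE body and a scalar tail. The struct and function names
(index_split, split_for_sse) are hypothetical and not taken from the
patch; the sketch assumes 16-byte aligned loads and 4-byte indices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not from the patch: describes how `count`
 * 32-bit indices starting at address `addr` would be processed. */
struct index_split {
   size_t head;   /* scalar elements before the first 16-byte boundary */
   size_t body;   /* elements handled 4 at a time with aligned loads */
   size_t tail;   /* scalar leftovers after the last full vector */
};

static struct index_split
split_for_sse(uintptr_t addr, size_t count)
{
   struct index_split s;
   size_t misalign;

   /* The indices themselves must be at least 4-byte aligned. */
   assert(addr % sizeof(uint32_t) == 0);

   /* Number of elements (0..3) past the previous 16-byte boundary. */
   misalign = (addr % 16) / sizeof(uint32_t);
   s.head = misalign ? 4 - misalign : 0;
   if (s.head > count)
      s.head = count;

   /* Round the remainder down to a multiple of 4 elements. */
   s.body = (count - s.head) & ~(size_t)3;
   s.tail = count - s.head - s.body;
   return s;
}
```

With an array that starts 4 bytes past a 16-byte boundary and
count == 10, this yields head = 3, body = 4, tail = 3, so only a
single vector's worth of indices would go through the SSE path.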

I haven't had a chance to play with this yet, but were you able to test
your SSE2 code against what OpenMP produces, as per Matt's earlier
suggestion?
Maybe it would be possible to always enable OpenMP on x86_64? Then the
code wouldn't have to change much at all to add SSE2 for this. I don't
know enough about OpenMP, though, to be sure whether enabling it would
have any other side effects.
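
As a baseline for that kind of comparison, an SSE2-only unsigned
maximum could look roughly like the sketch below. This is not the
attached code; it is one common way to work around SSE2 lacking an
unsigned 32-bit max (that instruction arrived with SSE4.1). The names
max_epu32_sse2 and find_max_sse2 are made up for illustration, and
unaligned loads are used so the alignment question is left aside.

```c
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <limits.h>
#include <stddef.h>

/* Per-lane unsigned 32-bit max using only SSE2 (hypothetical helper). */
static inline __m128i
max_epu32_sse2(__m128i a, __m128i b)
{
   /* Flipping the sign bit maps unsigned order onto signed order,
    * so the signed SSE2 compare can be used. */
   const __m128i sign = _mm_set1_epi32(INT_MIN);
   __m128i gt = _mm_cmpgt_epi32(_mm_xor_si128(a, sign),
                                _mm_xor_si128(b, sign));

   /* Branchless select: take a where a > b, otherwise b. */
   return _mm_or_si128(_mm_and_si128(gt, a), _mm_andnot_si128(gt, b));
}

/* Returns the largest element of v[0..count), or 0 if count == 0. */
static unsigned
find_max_sse2(const unsigned *v, size_t count)
{
   unsigned lanes[4];
   unsigned max = 0;
   __m128i vmax = _mm_setzero_si128();
   size_t i = 0;
   int k;

   /* Vector body: four indices per iteration. */
   for (; i + 4 <= count; i += 4) {
      __m128i x = _mm_loadu_si128((const __m128i *)(v + i));
      vmax = max_epu32_sse2(vmax, x);
   }

   /* Final reduction from 4 lanes to 1, done in scalar code. */
   _mm_storeu_si128((__m128i *)lanes, vmax);
   for (k = 0; k < 4; k++) {
      if (lanes[k] > max)
         max = lanes[k];
   }

   /* Scalar tail for the remaining 0..3 elements. */
   for (; i < count; i++) {
      if (v[i] > max)
         max = v[i];
   }
   return max;
}
```

A find_min variant would follow the same pattern with the operands of
the select swapped and the accumulator initialised to ~0U.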
 
[1] http://locklessinc.com/articles/vectorize/


> 
> - Bruno
> 
> > 
> > > - Bruno
> > > 
> > >> +      unsigned max_arr[4] __attribute__ ((aligned (16)));
> > >> +      unsigned min_arr[4] __attribute__ ((aligned (16)));
> > >> +      unsigned vec_count;
> > >> +      __m128i max_ui4 = _mm_setzero_si128();
> > >> +      __m128i min_ui4 = _mm_set1_epi32(~0U);
> > >> +      __m128i ui_indices4;
> > >> +      __m128i *ui_indices_ptr;
> > >> +
> [snip]



