[Mesa-dev] [PATCH V4] mesa: add SSE optimisation for glDrawElements

Bruno Jimenez brunojimen at gmail.com
Mon Oct 27 13:04:13 PDT 2014


[snip]
> >> +
> >> +   if (aligned_count >= 4) {
> >                           ^^
> > 
> > Hi,
> > 
> > I have been thinking and I think that you can change that 4 for an 8. In
> > the case aligned_count == 4 there's no gain in using SSE, as you will
> > have to do a final reduction from 4 elements to 1.
> 
> Either that or the initial check for whether or not to call the SSE
> version could be predicated on size.  Something like
> 
>         if (cpu_has_sse4_1 && count >= 8) {
> 
> The actual threshold may be higher than 8 (and is certainly higher than
> 4, as you point out).  It would take some careful microbenchmarks and
> measurement to find the actual tipping point.  Maybe just use 8 and add
> a FINISHME comment for now?

In fact, with enough 'bad' luck the threshold could be 10: 3 unaligned
at the beginning, 4 ready for SSE and 3 at the end. Just as a side
thought... why don't we force the alignment in the cases where we might
use SSE and avoid the problem from the beginning?
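
To make the arithmetic concrete, here is a rough sketch of how an index
array gets split into an unaligned head, an aligned SSE body and a scalar
tail. The names (struct split, split_for_sse) are made up for this mail
and are not in the patch; it assumes 4-byte indices and 16-byte vectors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: how many elements fall
 * before the first 16-byte boundary (head), how many can be processed
 * four at a time (body), and how many are left over (tail). */
struct split {
   size_t head;
   size_t body;
   size_t tail;
};

static struct split
split_for_sse(uintptr_t addr, size_t count)
{
   struct split s;

   /* Assumption: the array is at least 4-byte aligned. */
   assert(addr % sizeof(uint32_t) == 0);

   /* Position of the first element inside its 16-byte block (0..3). */
   size_t misalign = (addr % 16) / sizeof(uint32_t);

   /* Elements needed to reach the next 16-byte boundary. */
   s.head = (4 - misalign) % 4;
   if (s.head > count)
      s.head = count;

   /* Round the remainder down to a multiple of 4 for the SSE loop. */
   s.body = (count - s.head) & ~(size_t)3;
   s.tail = count - s.head - s.body;
   return s;
}
```

With an array starting 4 bytes past a 16-byte boundary, count == 10
splits into 3 + 4 + 3, so only a single vector goes through SSE. The
same count starting on a 16-byte boundary splits into 0 + 8 + 2.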

Also, I have been toying a bit and I have written a couple of functions
for finding max values in arrays. They can be easily extended to add a
find_min at the same time. The good part is that I have also found a way
to only use SSE2, parallelizing a branchless algorithm for finding
maximums. Maybe we could ship it for x86_64 at least.
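
For readers without the attachment, the general idea can be sketched as
below. This is not the attached max_array.c, only one common way to do
it, and the function names are made up. SSE2 has no unsigned 32-bit max
(_mm_max_epu32 is SSE4.1), so the sketch flips the sign bit, uses the
signed compare to build a mask, and selects with and/andnot/or. It uses
unaligned loads to keep the example short.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Lane-wise unsigned 32-bit max using SSE2 only (hypothetical name). */
static inline __m128i
max_epu32_sse2(__m128i a, __m128i b)
{
   /* XOR with 0x80000000 maps unsigned order onto signed order. */
   const __m128i sign = _mm_set1_epi32(INT32_MIN);
   __m128i gt = _mm_cmpgt_epi32(_mm_xor_si128(a, sign),
                                _mm_xor_si128(b, sign));

   /* Branchless select: (a & gt) | (b & ~gt). */
   return _mm_or_si128(_mm_and_si128(gt, a), _mm_andnot_si128(gt, b));
}

/* Maximum of an array of unsigned indices; returns 0 when count == 0. */
static uint32_t
max_u32_sse2(const uint32_t *v, size_t count)
{
   assert(v != NULL || count == 0);

   __m128i vmax = _mm_setzero_si128();
   size_t i = 0;

   /* Vector body: four indices per iteration. */
   for (; i + 4 <= count; i += 4)
      vmax = max_epu32_sse2(vmax,
                            _mm_loadu_si128((const __m128i *)(v + i)));

   /* Final reduction from 4 lanes to 1, done in scalar code. */
   uint32_t lanes[4];
   _mm_storeu_si128((__m128i *)lanes, vmax);
   uint32_t m = 0;
   for (int k = 0; k < 4; k++)
      if (lanes[k] > m)
         m = lanes[k];

   /* Scalar tail for the last count % 4 elements. */
   for (; i < count; i++)
      if (v[i] > m)
         m = v[i];

   return m;
}
```

A min version would start from ~0U in every lane and swap the two
operands of the select.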

If you want to play with them the code is attached. If you need anything
else, just ask.

- Bruno

> 
> > - Bruno
> > 
> >> +      unsigned max_arr[4] __attribute__ ((aligned (16)));
> >> +      unsigned min_arr[4] __attribute__ ((aligned (16)));
> >> +      unsigned vec_count;
> >> +      __m128i max_ui4 = _mm_setzero_si128();
> >> +      __m128i min_ui4 = _mm_set1_epi32(~0U);
> >> +      __m128i ui_indices4;
> >> +      __m128i *ui_indices_ptr;
> >> +
[snip]
-------------- next part --------------
A non-text attachment was scrubbed...
Name: max_array.c
Type: text/x-csrc
Size: 8580 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20141027/3b70f536/attachment.c>

