[Mesa-dev] [PATCH V4] mesa: add SSE optimisation for glDrawElements
Bruno Jimenez
brunojimen at gmail.com
Mon Oct 27 13:04:13 PDT 2014
[snip]
> >> +
> >> + if (aligned_count >= 4) {
> > ^^
> >
> > Hi,
> >
> > I have been thinking and I think that you can change that 4 for an 8. In
> > the case aligned_count == 4 there's no gain in using SSE, as you will
> > have to do a final reduction from 4 elements to 1.
>
> Either that or the initial check for whether or not to call the SSE
> version could be predicated on size. Something like
>
> if (cpu_has_sse4_1 && count >= 8) {
>
> The actual threshold may be higher than 8 (and is certainly higher than
> 4, as you point out). It would take some careful microbenchmarks and
> measurement to find the actual tipping point. Maybe just use 8 and add
> a FINISHME comment for now?
In fact, with enough 'bad' luck the threshold could be 10: 3 unaligned
at the beginning, 4 ready for SSE and 3 at the end. Just as a side
thought... why don't we force the alignment in cases when we potentially
may use SSE and avoid the problem from the beginning?
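
To make the arithmetic above concrete, here is a minimal sketch of how a
count splits into an unaligned head, full 16-byte vectors and a scalar
tail. The names (struct split, split_for_sse) are hypothetical and not
part of the patch, and it assumes 4-byte unsigned values and a pointer
that is at least 4-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type, for illustration only. */
struct split {
   size_t head; /* scalar elements before the first 16-byte boundary */
   size_t vec;  /* number of full 4-element SSE vectors */
   size_t tail; /* scalar elements left after the last full vector */
};

/* Assumes sizeof(unsigned) == 4 and that p is at least 4-byte aligned. */
static struct split
split_for_sse(const unsigned *p, size_t count)
{
   struct split s;
   /* How many elements past the previous 16-byte boundary p sits (0..3). */
   size_t misalign = ((uintptr_t)p & 15) / sizeof(unsigned);

   s.head = misalign ? 4 - misalign : 0;
   if (s.head > count)
      s.head = count;
   s.vec = (count - s.head) / 4;
   s.tail = count - s.head - s.vec * 4;
   return s;
}
```

With a pointer one element past a 16-byte boundary and count == 10, this
gives head = 3, vec = 1 and tail = 3, so only 4 of the 10 elements would
go through SSE.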
Also, I have been toying a bit and I have written a couple of functions
for finding max values in arrays. They can be easily extended to add a
find_min at the same time. The good part is that I have also found a way
to only use SSE2, parallelizing a branchless algorithm for finding
maximums. Maybe we could ship it for x86_64 at least.
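
Since the attachment is scrubbed in the archive, here is a rough sketch
of one way an SSE2-only unsigned max can be written. It is not
necessarily the code in max_array.c, and the function names are
hypothetical. SSE2 has no unsigned 32-bit max (_mm_max_epu32 is SSE4.1),
so this version flips the sign bit of both operands, does a signed
compare and selects lanes with and/andnot/or instead of branching. It
uses unaligned loads to stay short.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <limits.h>
#include <stddef.h>

/* Per-lane unsigned 32-bit max using only SSE2 (hypothetical name). */
static inline __m128i
max_epu32_sse2(__m128i a, __m128i b)
{
   /* XOR with 0x80000000 maps unsigned order onto signed order. */
   const __m128i bias = _mm_set1_epi32(INT_MIN);
   __m128i gt = _mm_cmpgt_epi32(_mm_xor_si128(a, bias),
                                _mm_xor_si128(b, bias));

   /* Lanes where a > b take a, all other lanes take b. */
   return _mm_or_si128(_mm_and_si128(gt, a), _mm_andnot_si128(gt, b));
}

/* Returns the largest value in v[0..count), or 0 when count == 0. */
static unsigned
max_index_sse2(const unsigned *v, size_t count)
{
   unsigned lanes[4];
   unsigned max = 0;
   size_t i = 0;
   int k;
   __m128i vmax = _mm_setzero_si128();

   /* Vector part: 4 indices per iteration. */
   for (; i + 4 <= count; i += 4)
      vmax = max_epu32_sse2(vmax,
                            _mm_loadu_si128((const __m128i *)(v + i)));

   /* Final reduction from 4 lanes to 1. */
   _mm_storeu_si128((__m128i *)lanes, vmax);
   for (k = 0; k < 4; k++)
      max = lanes[k] > max ? lanes[k] : max;

   /* Scalar tail for the remaining 0..3 elements. */
   for (; i < count; i++)
      max = v[i] > max ? v[i] : max;

   return max;
}
```

A find_min could follow the same pattern by swapping the operands of the
compare and starting from all ones instead of zero.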
If you want to play with them the code is attached. If you need anything
else, just ask.
- Bruno
>
> > - Bruno
> >
> >> + unsigned max_arr[4] __attribute__ ((aligned (16)));
> >> + unsigned min_arr[4] __attribute__ ((aligned (16)));
> >> + unsigned vec_count;
> >> + __m128i max_ui4 = _mm_setzero_si128();
> >> + __m128i min_ui4 = _mm_set1_epi32(~0U);
> >> + __m128i ui_indices4;
> >> + __m128i *ui_indices_ptr;
> >> +
[snip]
-------------- next part --------------
A non-text attachment was scrubbed...
Name: max_array.c
Type: text/x-csrc
Size: 8580 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20141027/3b70f536/attachment.c>