[Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping
Siavash Eliasi
siavashserver at gmail.com
Tue Nov 4 09:35:34 PST 2014
Hello. I'd get rid of the "_mm_set1_ps" calls inside
"_mesa_clamp_float_rgba" by passing __m128 versions of min/max directly,
so that "_mm_set1_ps" is hoisted out of the for loop.
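
A minimal sketch of that hoisting, not the code from the patch: the function names, the flat float-array layout, and the use of unaligned loads/stores below are all assumptions made for illustration.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (also pulls in the SSE ones) */
#include <stddef.h>

/* Hypothetical helper: clamps one RGBA quad (4 floats). The bounds arrive
 * already broadcast into __m128 registers, so no _mm_set1_ps is needed here. */
static inline void
clamp_float_rgba_sse2(const float *src, float *dst, __m128 vmin, __m128 vmax)
{
   __m128 v = _mm_loadu_ps(src);               /* unaligned load: assumption */
   v = _mm_min_ps(_mm_max_ps(v, vmin), vmax);  /* clamp to [min, max] */
   _mm_storeu_ps(dst, v);
}

/* Hypothetical caller: n is the number of RGBA pixels (4 floats each). */
static void
clamp_float_rgba_array(size_t n, const float *src, float *dst,
                       float min, float max)
{
   /* Broadcast the scalar bounds once, outside the loop. */
   const __m128 vmin = _mm_set1_ps(min);
   const __m128 vmax = _mm_set1_ps(max);

   for (size_t i = 0; i < n; i++)
      clamp_float_rgba_sse2(src + 4 * i, dst + 4 * i, vmin, vmax);
}
```

A compiler may already hoist the broadcast once the helper is inlined, but passing the vectors explicitly does not rely on that.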
I'd also unroll the "_mesa_streaming_clamp_float_rgba" loop to minimize
loop overhead (and, as a bonus, give out-of-order execution more
independent work), because nothing compute-intensive is happening there.
You can also use prefetching (_mm_prefetch) there to improve performance
by fetching data from memory ahead of its use.
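
A rough sketch of what unrolling plus prefetching could look like, again not the patch's actual code: the function name, the 4x unroll factor, the prefetch distance, and the _MM_HINT_T0 hint are assumptions that would need benchmarking on real hardware.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; _mm_prefetch comes from SSE */
#include <stddef.h>

/* Illustrative value only: how many pixels ahead to prefetch. Must be tuned. */
#define PREFETCH_AHEAD_PIXELS 16

/* Hypothetical routine: n is the number of RGBA pixels (4 floats each). */
static void
clamp_float_rgba_unrolled(size_t n, const float *src, float *dst,
                          float min, float max)
{
   const __m128 vmin = _mm_set1_ps(min);
   const __m128 vmax = _mm_set1_ps(max);
   size_t i = 0;

   /* Main loop: four pixels (16 floats, 64 bytes) per iteration. */
   for (; i + 4 <= n; i += 4) {
      const float *s = src + 4 * i;
      float *d = dst + 4 * i;

      /* Hint the CPU to start fetching data needed a few iterations later.
       * The bounds check keeps the pointer arithmetic inside the buffer. */
      if (i + PREFETCH_AHEAD_PIXELS < n)
         _mm_prefetch((const char *)(s + 4 * PREFETCH_AHEAD_PIXELS),
                      _MM_HINT_T0);

      /* Four independent load/clamp/store chains per iteration. */
      __m128 a = _mm_loadu_ps(s);
      __m128 b = _mm_loadu_ps(s + 4);
      __m128 c = _mm_loadu_ps(s + 8);
      __m128 e = _mm_loadu_ps(s + 12);

      a = _mm_min_ps(_mm_max_ps(a, vmin), vmax);
      b = _mm_min_ps(_mm_max_ps(b, vmin), vmax);
      c = _mm_min_ps(_mm_max_ps(c, vmin), vmax);
      e = _mm_min_ps(_mm_max_ps(e, vmin), vmax);

      _mm_storeu_ps(d, a);
      _mm_storeu_ps(d + 4, b);
      _mm_storeu_ps(d + 8, c);
      _mm_storeu_ps(d + 12, e);
   }

   /* Tail: remaining 0..3 pixels, one at a time. */
   for (; i < n; i++) {
      __m128 v = _mm_loadu_ps(src + 4 * i);
      v = _mm_min_ps(_mm_max_ps(v, vmin), vmax);
      _mm_storeu_ps(dst + 4 * i, v);
   }
}
```

Whether the explicit prefetch helps depends on the hardware prefetcher and the access pattern, so it is worth measuring with and without it.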
Best regards,
Siavash Eliasi.