[Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping

Siavash Eliasi siavashserver at gmail.com
Tue Nov 4 09:35:34 PST 2014

Hello. I'd get rid of the "_mm_set1_ps" calls inside
"_mesa_clamp_float_rgba" by passing the min/max bounds in directly as
__m128 values, so that "_mm_set1_ps" is moved out of the for loop and
executed only once per call.
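
A minimal sketch of what I mean, not the code from the patch. The names
clamp_rgba_ps and clamp_rgba_array are hypothetical stand-ins, and I am
assuming unaligned loads/stores and a pixel count rather than whatever
signature the real functions have:

```c
#include <stddef.h>
#include <emmintrin.h>  /* SSE2; also pulls in the SSE __m128 intrinsics */

/* Hypothetical helper: clamp one RGBA pixel (4 floats) held in a register.
 * The bounds arrive already broadcast, so no _mm_set1_ps is needed here. */
static inline __m128
clamp_rgba_ps(__m128 px, __m128 vmin, __m128 vmax)
{
   return _mm_min_ps(_mm_max_ps(px, vmin), vmax);
}

/* Hypothetical caller: clamp n RGBA pixels from src into dst. */
static void
clamp_rgba_array(const float *src, float *dst, size_t n,
                 float min, float max)
{
   /* Broadcast the scalar bounds once, outside the loop. */
   const __m128 vmin = _mm_set1_ps(min);
   const __m128 vmax = _mm_set1_ps(max);

   for (size_t i = 0; i < n; i++) {
      __m128 px = _mm_loadu_ps(src + 4 * i);
      _mm_storeu_ps(dst + 4 * i, clamp_rgba_ps(px, vmin, vmax));
   }
}
```

A compiler may already hoist the broadcast after inlining, but taking
__m128 parameters makes that independent of the optimizer.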

I'd also unroll the "_mesa_streaming_clamp_float_rgba" loop to minimize
the loop overhead (and, as a bonus, give out-of-order execution more
independent work), because nothing compute-intensive is happening there.
You can also use prefetching (_mm_prefetch) there to improve performance
by requesting data from memory ahead of its use.
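
Roughly along these lines, again only as a sketch under assumptions. The
function name clamp_rgba_unrolled, the unroll factor of 4, the look-ahead
distance and the _MM_HINT_NTA hint are all my guesses and would need
benchmarking; I also use plain unaligned loads and stores here instead
of whatever streaming accesses the patch uses:

```c
#include <stddef.h>
#include <emmintrin.h>  /* SSE2; also pulls in _mm_prefetch and __m128 */

/* Assumed look-ahead of 16 pixels (256 bytes); tune by measuring. */
#define PREFETCH_PIXELS 16

/* Hypothetical unrolled variant: clamp n RGBA pixels from src into dst. */
static void
clamp_rgba_unrolled(const float *src, float *dst, size_t n,
                    float min, float max)
{
   const __m128 vmin = _mm_set1_ps(min);
   const __m128 vmax = _mm_set1_ps(max);
   size_t i = 0;

   /* Main loop: 4 pixels (64 bytes) per iteration. */
   for (; i + 4 <= n; i += 4) {
      const float *s = src + 4 * i;
      float *d = dst + 4 * i;

      /* Hint only; guarded so the address stays inside the source array. */
      if (i + PREFETCH_PIXELS < n)
         _mm_prefetch((const char *)(s + 4 * PREFETCH_PIXELS), _MM_HINT_NTA);

      /* Four independent load/clamp chains the CPU can overlap. */
      __m128 p0 = _mm_loadu_ps(s + 0);
      __m128 p1 = _mm_loadu_ps(s + 4);
      __m128 p2 = _mm_loadu_ps(s + 8);
      __m128 p3 = _mm_loadu_ps(s + 12);

      p0 = _mm_min_ps(_mm_max_ps(p0, vmin), vmax);
      p1 = _mm_min_ps(_mm_max_ps(p1, vmin), vmax);
      p2 = _mm_min_ps(_mm_max_ps(p2, vmin), vmax);
      p3 = _mm_min_ps(_mm_max_ps(p3, vmin), vmax);

      _mm_storeu_ps(d + 0, p0);
      _mm_storeu_ps(d + 4, p1);
      _mm_storeu_ps(d + 8, p2);
      _mm_storeu_ps(d + 12, p3);
   }

   /* Tail: remaining 0 to 3 pixels, one at a time. */
   for (; i < n; i++) {
      __m128 px = _mm_loadu_ps(src + 4 * i);
      _mm_storeu_ps(dst + 4 * i, _mm_min_ps(_mm_max_ps(px, vmin), vmax));
   }
}
```

Whether the explicit prefetch helps at all on top of the hardware
prefetcher for a simple linear walk like this is something only a
benchmark can answer.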

Best regards,
Siavash Eliasi.
