[Pixman] [RFC] AVX codepaths

Matt Turner mattst88 at gmail.com
Sun May 15 20:43:53 PDT 2011


I took the SSE2 code paths and modified them slightly to generate AVX
code. AVX doesn't provide 256-bit versions of SSE2's integer operations,
so the useful 256-bit operations are mostly limited to copying pixels.
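What "mostly limited to copying pixels" means in practice: AVX can load and store 256 bits of integer data (_mm256_loadu_si256 / _mm256_storeu_si256) even though it has no 256-bit integer arithmetic. The following is a minimal sketch, not code from the patches. It assumes GCC or Clang on x86, because the target("avx") attribute and __builtin_cpu_supports are compiler extensions. The names copy_pixels and copy_pixels_avx are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_AVX_TARGET 1

/* Copy n 32-bit pixels, 8 at a time, with 256-bit unaligned loads and
 * stores.  Only AVX moves are used; no integer arithmetic is needed. */
__attribute__((target("avx")))
static void copy_pixels_avx(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256i v = _mm256_loadu_si256((const __m256i *)(src + i));
        _mm256_storeu_si256((__m256i *)(dst + i), v);
    }
    for (; i < n; i++)          /* scalar tail for the last 0..7 pixels */
        dst[i] = src[i];
}
#endif

/* Hypothetical entry point: take the AVX path when the CPU and OS support
 * it, otherwise fall back to memcpy.  dst and src must not overlap. */
void copy_pixels(uint32_t *dst, const uint32_t *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
#ifdef HAVE_AVX_TARGET
    if (__builtin_cpu_supports("avx")) {
        copy_pixels_avx(dst, src, n);
        return;
    }
#endif
    if (n)
        memcpy(dst, src, n * sizeof *dst);
}
```

The runtime check matters because the AVX function would fault on a CPU or OS without AVX support. The memcpy fallback keeps the sketch portable.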

Following this email are three patches:
  1) AVX fetcher for x8r8g8b8 (a8 and r5g6b5 fetchers would require integer unpack operations)
  2) AVX blt, composite_copy_area and associated fast paths
  3) AVX fill
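A fill needs even less: broadcast one 32-bit pixel and store it repeatedly. The following is a sketch of a rectangle fill under the same assumptions (GCC/Clang on x86). The names fill32 and fill32_avx are hypothetical. The stride is measured in pixels here; that is a choice made for this sketch and says nothing about pixman's internal conventions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_AVX_TARGET 1

/* Fill a width x height rectangle of 32-bit pixels.  _mm256_set1_epi32
 * broadcasts the pixel into all 8 lanes; the stores are plain 256-bit
 * moves, so no 256-bit integer arithmetic is involved. */
__attribute__((target("avx")))
static void fill32_avx(uint32_t *bits, size_t stride, size_t width,
                       size_t height, uint32_t pixel)
{
    const __m256i v = _mm256_set1_epi32((int)pixel);
    for (size_t y = 0; y < height; y++) {
        uint32_t *row = bits + y * stride;
        size_t x = 0;
        for (; x + 8 <= width; x += 8)
            _mm256_storeu_si256((__m256i *)(row + x), v);
        for (; x < width; x++)      /* scalar tail */
            row[x] = pixel;
    }
}
#endif

/* Hypothetical entry point with a scalar fallback; stride is in pixels. */
void fill32(uint32_t *bits, size_t stride, size_t width, size_t height,
            uint32_t pixel)
{
    assert(stride >= width);
#ifdef HAVE_AVX_TARGET
    if (__builtin_cpu_supports("avx")) {
        fill32_avx(bits, stride, width, height, pixel);
        return;
    }
#endif
    for (size_t y = 0; y < height; y++)
        for (size_t x = 0; x < width; x++)
            bits[y * stride + x] = pixel;
}
```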

`make check` still passes all 18 tests.

The only other bit of low-hanging fruit I see is
composite_src_x888_8888, but perhaps there are more fast paths
possible with AVX.
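composite_src_x888_8888 amounts to forcing the alpha byte of every pixel to 0xff (dst = src | 0xff000000). An x8r8g8b8 fetcher has to do the same thing. With AVX alone, one way to do this 8 pixels at a time is to treat the pixels as packed floats and use the floating-point bitwise OR (_mm256_or_ps). That instruction does not interpret the bits, so the result is exact. The following is a sketch under the same assumptions as above. The names x888_to_8888 and x888_to_8888_avx are hypothetical, and whether this beats the SSE2 path has not been measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_AVX_TARGET 1

/* dst[i] = src[i] | 0xff000000 for n pixels.  AVX has no 256-bit integer
 * OR, so the pixels are loaded as packed floats and combined with the
 * float bitwise OR, which operates on raw bits. */
__attribute__((target("avx")))
static void x888_to_8888_avx(uint32_t *dst, const uint32_t *src, size_t n)
{
    const __m256 alpha =
        _mm256_castsi256_ps(_mm256_set1_epi32((int)0xff000000u));
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 v = _mm256_loadu_ps((const float *)(src + i));
        _mm256_storeu_ps((float *)(dst + i), _mm256_or_ps(v, alpha));
    }
    for (; i < n; i++)          /* scalar tail */
        dst[i] = src[i] | 0xff000000u;
}
#endif

/* Hypothetical entry point with a scalar fallback. */
void x888_to_8888(uint32_t *dst, const uint32_t *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
#ifdef HAVE_AVX_TARGET
    if (__builtin_cpu_supports("avx")) {
        x888_to_8888_avx(dst, src, n);
        return;
    }
#endif
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] | 0xff000000u;
}
```

By contrast, a8 and r5g6b5 sources first have to be widened to 32 bits per pixel. That is the integer unpack work that AVX can't do in 256-bit registers.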

I played around with cairo-perf-trace on my Sandy Bridge machine but
wasn't able to see any improvements. That may be for any number of
reasons, including but not limited to: a) I don't know how to use
cairo-perf-trace, b) AVX doesn't yield any noticeable improvements, or
c) the code needs to be better optimized.

Still TODO: AVX CPUID detection, and figuring out SunCC support.
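For the CPUID item, the usual check has three parts. CPUID leaf 1 must report AVX (ECX bit 28) and OSXSAVE (ECX bit 27). XGETBV must then show that the OS saves both XMM and YMM state (XCR0 bits 1 and 2). The following is a minimal sketch for GCC/Clang on x86 using <cpuid.h>. The names cpu_has_avx and xcr0_allows_avx are hypothetical. A SunCC build may need a different mechanism, since <cpuid.h> and this inline-assembly form are GCC/Clang conventions.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#endif

/* XCR0 bits 1 (XMM state) and 2 (upper YMM state) must both be enabled
 * by the OS before 256-bit registers can be used safely. */
int xcr0_allows_avx(uint64_t xcr0)
{
    return (xcr0 & 0x6) == 0x6;
}

int cpu_has_avx(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                       /* CPUID leaf 1 not available */
    if (!(ecx & bit_OSXSAVE) || !(ecx & bit_AVX))
        return 0;                       /* no XGETBV, or no AVX */

    /* XGETBV with ECX = 0 reads XCR0.  Emitted as raw bytes so that
     * assemblers lacking the mnemonic still accept it. */
    unsigned int lo, hi;
    __asm__ volatile (".byte 0x0f, 0x01, 0xd0"
                      : "=a"(lo), "=d"(hi) : "c"(0));
    uint64_t xcr0 = ((uint64_t)hi << 32) | lo;
    assert(xcr0 & 1);                   /* x87 state is always enabled */
    return xcr0_allows_avx(xcr0);
#else
    return 0;                           /* not x86, or unknown compiler */
#endif
}
```

Checking the CPUID bit alone is not enough. An OS that doesn't save YMM state on context switches would corrupt the upper halves of the registers, which is why the XCR0 test is included.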

Thoughts? Improvements?


