[Pixman] [RFC] AVX codepaths
mattst88 at gmail.com
Sun May 15 20:43:53 PDT 2011
I took the SSE2 code paths and modified them slightly to generate AVX
code. AVX has no 256-bit integer operations (its integer instructions
are still 128 bits wide, as in SSE2), so the range of 256-bit
operations is mostly limited to copying pixels.
Following this email are three patches:
1) AVX fetcher for x8r8g8b8 (a8 and r5g6b5 require integer unpack operations)
2) AVX blt, composite_copy_area and associated fast paths
3) AVX fill
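To illustrate why fill and copy fit within these limits, here is a minimal sketch of a 256-bit solid fill for one row of 32-bit pixels. It is not the code from the patches: the names fill_row, fill_row_avx and fill_row_scalar are hypothetical, and it assumes GCC or Clang on x86 (for the target attribute and __builtin_cpu_supports). Only loads, stores and a broadcast are needed, none of which require 256-bit integer arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define SKETCH_HAVE_AVX 1

/* Hypothetical helper: store eight 32-bit pixels per iteration. */
__attribute__((target("avx")))
static void fill_row_avx(uint32_t *dst, size_t count, uint32_t pixel)
{
    /* Broadcast the pixel to all eight 32-bit lanes of a YMM register. */
    __m256i v = _mm256_set1_epi32((int)pixel);
    size_t i = 0;

    /* Unaligned 256-bit stores; no integer arithmetic is involved. */
    for (; i + 8 <= count; i += 8)
        _mm256_storeu_si256((__m256i *)(dst + i), v);

    /* Scalar tail for the remaining 0..7 pixels. */
    for (; i < count; i++)
        dst[i] = pixel;
}
#endif

/* Portable fallback used when AVX is unavailable. */
static void fill_row_scalar(uint32_t *dst, size_t count, uint32_t pixel)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = pixel;
}

/* Hypothetical entry point: pick the AVX path at run time if possible. */
void fill_row(uint32_t *dst, size_t count, uint32_t pixel)
{
#ifdef SKETCH_HAVE_AVX
    if (__builtin_cpu_supports("avx")) {
        fill_row_avx(dst, count, pixel);
        return;
    }
#endif
    fill_row_scalar(dst, count, pixel);
}
```

A real fast path would presumably also align the destination before the main loop so that aligned stores can be used; the sketch skips that for brevity.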
`make check` still passes all 18 tests.
The only other bit of low-hanging fruit I see is
composite_src_x888_8888, but perhaps there are more fast paths
possible with AVX.
I played around with cairo-perf-trace on my Sandy Bridge, but wasn't
able to see any improvements, which may be for any number of reasons,
including but not limited to a) I don't know how to use
cairo-perf-trace, b) AVX doesn't yield any noticeable improvements, c)
the code needs to be better optimized.
Still TODO: AVX detection via cpuid, and figuring out SunCC support.
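For reference, a rough sketch of what the cpuid check could look like. This is an assumption about one possible approach, not code from the patches: have_avx and read_xcr0 are hypothetical names, and it relies on <cpuid.h> as shipped with GCC and Clang, so it does not address SunCC. Checking the AVX feature bit alone is not enough; the OS must also have enabled saving of the YMM registers, which is reported through OSXSAVE and XCR0.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Read XCR0 with XGETBV (ECX = 0). The instruction is emitted as raw
 * bytes so that assemblers without AVX support still accept it. */
static uint64_t read_xcr0(void)
{
    uint32_t eax, edx;
    __asm__ volatile (".byte 0x0f, 0x01, 0xd0"
                      : "=a" (eax), "=d" (edx)
                      : "c" (0));
    return ((uint64_t)edx << 32) | eax;
}

/* Hypothetical helper: returns 1 if AVX is usable, 0 otherwise. */
int have_avx(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* CPUID leaf 1 carries the feature flags in ECX. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;

    /* ECX bit 27: OSXSAVE (XGETBV is usable); bit 28: AVX. */
    if (!(ecx & (1u << 27)) || !(ecx & (1u << 28)))
        return 0;

    /* XCR0 bits 1 and 2: the OS saves XMM and YMM state. */
    return (read_xcr0() & 0x6) == 0x6;
}
#else
/* Non-x86 targets or other compilers: report AVX as unavailable. */
int have_avx(void)
{
    return 0;
}
#endif
```

The result would typically be computed once at startup and used to decide whether to install the AVX fast paths ahead of the SSE2 ones.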