[Mesa-dev] [PATCH] gallivm: work around slow code generated for interleaving 128bit vectors
sroland at vmware.com
Tue Jun 4 15:46:30 PDT 2013
From: Roland Scheidegger <sroland at vmware.com>
We use 128-bit vector interleave for untwiddling in the blend code (with
256-bit vectors). llvm generates terrible code for this for some reason,
so instead of generating a shuffle on vectors of two 128-bit elements, use
an extract/insert shuffle (the only thing that seems to matter is that the
shuffle does not use 128-bit wide elements). This decreases the instruction
count of the blend code generated for an rgba8 render target without
blending from 169 to 113 with llvm 3.1 and from 136 to 114 with llvm
3.2/3.3, and I got a ~8% (llvm 3.1) and ~5% (llvm 3.2/3.3) performance
improvement in gears.
(The generated code is still not terribly good, as we could actually avoid
the interleaving completely, but llvm can't know this.)
---
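For illustration only, here is a minimal scalar sketch of why the two forms
produce the same result. It is not gallivm code: vec256, extract_range_64 and
the other names are hypothetical stand-ins, and it assumes the generic unpack
shuffle on a 2x128-bit vector simply selects 128-bit element lo_hi from each
source. The workaround path mirrors the patch: view the data as 4x64, take two
elements starting at lo_hi * 2 from each source, and concatenate them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 256-bit register; not a gallivm type. */
typedef struct { uint64_t q[4]; } vec256;

/* Reference path: interleave with 128-bit elements (two per vector).
 * Assumed semantics: result = { a[lo_hi], b[lo_hi] } in 128-bit units. */
static vec256 interleave_2x128(vec256 a, vec256 b, unsigned lo_hi)
{
   vec256 r;
   assert(lo_hi <= 1);
   memcpy(&r.q[0], &a.q[lo_hi * 2], 16);
   memcpy(&r.q[2], &b.q[lo_hi * 2], 16);
   return r;
}

/* Copy `count` 64-bit elements starting at `start` out of a 4x64 view. */
static void extract_range_64(const vec256 *src, unsigned start,
                             unsigned count, uint64_t *dst)
{
   unsigned i;
   assert(start + count <= 4);
   for (i = 0; i < count; i++)
      dst[i] = src->q[start + i];
}

/* Workaround path: extract one 2x64 half from each source, then concat. */
static vec256 interleave_via_halves(vec256 a, vec256 b, unsigned lo_hi)
{
   uint64_t half[2][2];
   vec256 r;
   assert(lo_hi <= 1);
   extract_range_64(&a, lo_hi * 2, 2, half[0]);
   extract_range_64(&b, lo_hi * 2, 2, half[1]);
   r.q[0] = half[0][0];
   r.q[1] = half[0][1];
   r.q[2] = half[1][0];
   r.q[3] = half[1][1];
   return r;
}

/* Returns 1 if both paths agree for lo_hi = 0 and lo_hi = 1. */
static int interleave_paths_agree(void)
{
   const vec256 a = {{ 0xa0, 0xa1, 0xa2, 0xa3 }};
   const vec256 b = {{ 0xb0, 0xb1, 0xb2, 0xb3 }};
   unsigned lo_hi;
   for (lo_hi = 0; lo_hi < 2; lo_hi++) {
      vec256 x = interleave_2x128(a, b, lo_hi);
      vec256 y = interleave_via_halves(a, b, lo_hi);
      if (memcmp(&x, &y, sizeof x) != 0)
         return 0;
   }
   return 1;
}
```

Both paths are the same data movement; in the patch the only difference is the
element width llvm sees in the IR, which is what seems to change the generated
code.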
src/gallium/auxiliary/gallivm/lp_bld_pack.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_pack.c b/src/gallium/auxiliary/gallivm/lp_bld_pack.c
index 14fcd38..f660165 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_pack.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_pack.c
@@ -271,6 +271,28 @@ lp_build_interleave2(struct gallivm_state *gallivm,
 {
    LLVMValueRef shuffle;
+   if (type.length == 2 && type.width == 128 && util_cpu_caps.has_avx) {
+      /*
+       * This is a workaround for llvm code generation deficiency. Strangely
+       * enough, while this needs vinsertf128/vextractf128 instructions (hence
+       * a natural match when using 2x128bit vectors) the "normal" unpack shuffle
+       * generates code ranging from atrocious (llvm 3.1) to terrible (llvm 3.2, 3.3).
+       * So use some different shuffles instead (the exact shuffles don't seem to
+       * matter, as long as not using 128bit wide vectors, works with 8x32 or 4x64).
+       */
+      struct lp_type tmp_type = type;
+      LLVMValueRef srchalf[2], tmpdst;
+      tmp_type.length = 4;
+      tmp_type.width = 64;
+      a = LLVMBuildBitCast(gallivm->builder, a, lp_build_vec_type(gallivm, tmp_type), "");
+      b = LLVMBuildBitCast(gallivm->builder, b, lp_build_vec_type(gallivm, tmp_type), "");
+      srchalf[0] = lp_build_extract_range(gallivm, a, lo_hi * 2, 2);
+      srchalf[1] = lp_build_extract_range(gallivm, b, lo_hi * 2, 2);
+      tmp_type.length = 2;
+      tmpdst = lp_build_concat(gallivm, srchalf, tmp_type, 2);
+      return LLVMBuildBitCast(gallivm->builder, tmpdst, lp_build_vec_type(gallivm, type), "");
+
+   }
    shuffle = lp_build_const_unpack_shuffle(gallivm, type.length, lo_hi);
    return LLVMBuildShuffleVector(gallivm->builder, a, b, shuffle, "");
--
1.7.9.5