[Mesa-dev] [PATCH 6/6] draw: use SoA fetch, not AoS one

sroland at vmware.com sroland at vmware.com
Mon Dec 12 00:12:02 UTC 2016


From: Roland Scheidegger <sroland at vmware.com>

Now that there's an SoA fetch which never falls back, we should usually get
results which are better or at least not worse (something like rgba32f will
stay the same). It might be worse, though, in some cases where the format
doesn't require conversion (e.g. rg32f) and the result goes straight to
output. If llvm was able to see through all the shuffles, it might have been
able to do away with the aos->soa->aos transpose entirely. That can no longer
work, except possibly for 4-channel formats, because the undef channels are
now replaced with 0/1 before the second transpose and not the first (llvm
will definitely not be able to figure that out). This case might actually be
quite common, but I'm not sure llvm really could optimize it in the first
place, and if it's a problem we should just special-case such inputs (though
note that if conversion is needed, it isn't obvious whether it's better to
skip the transpose or do the conversion AoS-style).

For cases which get way better, think of something like R16_UNORM with 8-wide
vectors: this was 8 sign-extend fetches, 8 cvts, 8 muls, followed by
a couple of shuffles to stitch things together (6 unpacks, if llvm is smart
enough) and then an (8-wide) transpose. (I'm not sure llvm could even
optimize the shuffles + transpose, since the 16bit values were actually
sign-extended to 128bit before being cast to a float vec, so that would be
another 8 unpacks.) Now that is just 8 fetches (directly inserted into the
vector, albeit there's one 128bit insert needed), 1 cvt, 1 mul.
---
 src/gallium/auxiliary/draw/draw_llvm.c | 54 +++++++++++++++++++++++++---------
 1 file changed, 40 insertions(+), 14 deletions(-)
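
Note (illustration only, not part of the patch): the R16_UNORM case from the
commit message can be sketched in plain C roughly as below. This is a scalar
stand-in under stated assumptions, not what gallivm actually emits: the
function names, VEC_LEN and the array layout are all hypothetical, and the
loops marked "vector op" stand in for what would be a single SIMD instruction
on an 8-wide float vector. The first function mimics the old path (per-vertex
fetch and convert to a vec4, then a transpose), the second mimics the new one
(gather the raw values, then convert the whole vector at once).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VEC_LEN 8   /* hypothetical: 8-wide vectors, as in the example above */

/* Old style (assumed shape): fetch + convert each vertex to an rgba vec4
 * (AoS), filling missing channels with 0/1, then transpose to SoA. */
static void
fetch_r16_unorm_aos(const uint8_t *map, const uint32_t offsets[VEC_LEN],
                    float out[4][VEC_LEN])
{
   float aos[VEC_LEN][4];
   size_t i, c;

   for (i = 0; i < VEC_LEN; i++) {
      uint16_t raw;
      memcpy(&raw, map + offsets[i], sizeof raw);      /* 1 fetch per vertex */
      aos[i][0] = (float)raw * (1.0f / 65535.0f);      /* 1 cvt + 1 mul each */
      aos[i][1] = 0.0f;
      aos[i][2] = 0.0f;
      aos[i][3] = 1.0f;
   }

   /* AoS -> SoA transpose: out[channel][vertex] */
   for (c = 0; c < 4; c++)
      for (i = 0; i < VEC_LEN; i++)
         out[c][i] = aos[i][c];
}

/* New style (assumed shape): gather the raw 16-bit values for all vertices,
 * then do the conversion once on the whole vector. No transpose needed. */
static void
fetch_r16_unorm_soa(const uint8_t *map, const uint32_t offsets[VEC_LEN],
                    float out[4][VEC_LEN])
{
   uint16_t raw[VEC_LEN];
   size_t i;

   for (i = 0; i < VEC_LEN; i++)                       /* 8 fetches */
      memcpy(&raw[i], map + offsets[i], sizeof raw[i]);

   for (i = 0; i < VEC_LEN; i++) {                     /* vector op: 1 cvt, 1 mul */
      out[0][i] = (float)raw[i] * (1.0f / 65535.0f);
      out[1][i] = 0.0f;                                /* missing channels: 0/1 */
      out[2][i] = 0.0f;
      out[3][i] = 1.0f;
   }
}
```

Both functions take a byte pointer to the vertex buffer plus one byte offset
per vertex and should produce identical out[channel][vertex] arrays; only the
amount of per-vertex work differs.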

diff --git a/src/gallium/auxiliary/draw/draw_llvm.c b/src/gallium/auxiliary/draw/draw_llvm.c
index 19b75a5..f895b76 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -755,11 +755,9 @@ fetch_vector(struct gallivm_state *gallivm,
              LLVMValueRef *inputs,
              LLVMValueRef indices)
 {
-   LLVMValueRef zero = LLVMConstNull(LLVMInt32TypeInContext(gallivm->context));
    LLVMBuilderRef builder = gallivm->builder;
    struct lp_build_context blduivec;
    LLVMValueRef offset, valid_mask;
-   LLVMValueRef aos_fetch[LP_MAX_VECTOR_WIDTH / 32];
    unsigned i;
 
    lp_build_context_init(&blduivec, gallivm, lp_uint_type(vs_type));
@@ -783,21 +781,49 @@ fetch_vector(struct gallivm_state *gallivm,
    }
 
    /*
-    * Note: we probably really want to use SoA fetch, not AoS one (albeit
-    * for most formats it will amount to the same as this isn't very
-    * optimized). But looks dangerous since it assumes alignment.
+    * Use SoA fetch. This should usually produce better code,
+    * although there may be exceptions (in particular if the fetched
+    * value goes directly to output, e.g. for something like RG32F).
     */
-   for (i = 0; i < vs_type.length; i++) {
-      LLVMValueRef offset1, elem;
-      elem = lp_build_const_int32(gallivm, i);
-      offset1 = LLVMBuildExtractElement(builder, offset, elem, "");
+   if (1) {
+      struct lp_type res_type = vs_type;
+      /* The type handling is annoying here... */
+      if (format_desc->colorspace == UTIL_FORMAT_COLORSPACE_RGB &&
+          format_desc->channel[0].pure_integer) {
+         if (format_desc->channel[0].type == UTIL_FORMAT_TYPE_SIGNED) {
+            res_type = lp_type_int_vec(vs_type.width, vs_type.width * vs_type.length);
+         }
+         else if (format_desc->channel[0].type == UTIL_FORMAT_TYPE_UNSIGNED) {
+            res_type = lp_type_uint_vec(vs_type.width, vs_type.width * vs_type.length);
+         }
+      }
 
-      aos_fetch[i] = lp_build_fetch_rgba_aos(gallivm, format_desc,
-                                             lp_float32_vec4_type(),
-                                             FALSE, map_ptr, offset1,
-                                             zero, zero, NULL);
+      lp_build_fetch_rgba_soa(gallivm, format_desc,
+                              res_type, FALSE, map_ptr, offset,
+                              blduivec.zero, blduivec.zero,
+                              NULL, inputs);
+
+      for (i = 0; i < TGSI_NUM_CHANNELS; i++) {
+         inputs[i] = LLVMBuildBitCast(builder, inputs[i],
+                                      lp_build_vec_type(gallivm, vs_type), "");
+      }
+
+   }
+   else {
+      LLVMValueRef zero = LLVMConstNull(LLVMInt32TypeInContext(gallivm->context));
+      LLVMValueRef aos_fetch[LP_MAX_VECTOR_WIDTH / 32];
+      for (i = 0; i < vs_type.length; i++) {
+         LLVMValueRef offset1, elem;
+         elem = lp_build_const_int32(gallivm, i);
+         offset1 = LLVMBuildExtractElement(builder, offset, elem, "");
+
+         aos_fetch[i] = lp_build_fetch_rgba_aos(gallivm, format_desc,
+                                                lp_float32_vec4_type(),
+                                                FALSE, map_ptr, offset1,
+                                                zero, zero, NULL);
+      }
+      convert_to_soa(gallivm, aos_fetch, inputs, vs_type);
    }
-   convert_to_soa(gallivm, aos_fetch, inputs, vs_type);
 
    for (i = 0; i < TGSI_NUM_CHANNELS; i++) {
       inputs[i] = LLVMBuildBitCast(builder, inputs[i], blduivec.vec_type, "");
-- 
2.7.4


