[Mesa-dev] [PATCH] mesa/st: provide native integers implementation of ir_unop_any

Thu May 8 06:18:26 PDT 2014

Previously, ir_unop_any was implemented via a dot-product call, which
uses floating-point multiplication and addition. The multiplication was
completely pointless, and the addition can just as well be done with an
OR. Since we know that the inputs are booleans, they must already be in
canonical 0/~0 format, so the final SNE can also be avoided.

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---

I still need to take this through a full piglit run, but the basic tests
seem to work out as expected. This is the result of compiling
fs-op-eq-mat4-mat4:

FRAG
PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
DCL OUT[0], COLOR
DCL CONST[0..7]
DCL TEMP[0..4], LOCAL
IMM[0] FLT32 {    0.0000,     1.0000,     0.0000,     0.0000}
  0: MOV TEMP[0].yzw, IMM[0].xxxx
  1: FSNE TEMP[1], CONST[4], CONST[0]
  2: OR TEMP[1].x, TEMP[1].xxxx, TEMP[1].yyyy
  3: OR TEMP[1].y, TEMP[1].zzzz, TEMP[1].wwww
  4: OR TEMP[1].x, TEMP[1].xxxx, TEMP[1].yyyy
  5: FSNE TEMP[2], CONST[5], CONST[1]
  6: OR TEMP[2].x, TEMP[2].xxxx, TEMP[2].yyyy
  7: OR TEMP[2].y, TEMP[2].zzzz, TEMP[2].wwww
  8: OR TEMP[2].x, TEMP[2].xxxx, TEMP[2].yyyy
  9: FSNE TEMP[3], CONST[6], CONST[2]
 10: OR TEMP[3].x, TEMP[3].xxxx, TEMP[3].yyyy
 11: OR TEMP[3].y, TEMP[3].zzzz, TEMP[3].wwww
 12: OR TEMP[3].x, TEMP[3].xxxx, TEMP[3].yyyy
 13: FSNE TEMP[4], CONST[7], CONST[3]
 14: OR TEMP[4].x, TEMP[4].xxxx, TEMP[4].yyyy
 15: OR TEMP[4].y, TEMP[4].zzzz, TEMP[4].wwww
 16: OR TEMP[4].x, TEMP[4].xxxx, TEMP[4].yyyy
 17: OR TEMP[1].x, TEMP[1].xxxx, TEMP[4].xxxx   <---
 18: OR TEMP[1].x, TEMP[1], TEMP[3].xxxx        <---
 19: OR TEMP[1].x, TEMP[1], TEMP[2].xxxx        <---
 20: NOT TEMP[1].x, TEMP[1].xxxx
 21: AND TEMP[0].x, TEMP[1].xxxx, IMM[0].yyyy
 22: MOV OUT[0], TEMP[0]
 23: END

The three instructions marked with arrows are the result of my new logic. I
wonder whether it's cause for concern that I'm not setting a swizzle on the
accumulated src (instructions 18 and 19 read TEMP[1] unswizzled)... probably
a bit, but it works out here. Is there a "writemask -> swizzle" converter
somewhere? The old instructions would have been

DP4 TEMP[1], TEMP[1], TEMP[1]
SNE TEMP[1], TEMP[1], IMM[0] ( == 0.0)

Or something along those lines. While that is one instruction fewer in TGSI,
at least nv50/nvc0 are scalar and would have had to implement DP4 as

mul
mul-add
mul-add
mul-add

versus the much more scalar-friendly ORs (in addition to the final SNE being
gone).
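
For illustration, here is a minimal standalone sketch of why the OR chain
gives the same answer as the dot-product approach. It is not Mesa code: the
names any_via_or, any_via_dot, kFalse and kTrue are made up for the example,
and it assumes native-integer booleans are exactly 0 or ~0 while the float
path sees booleans as 0.0 or 1.0.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the two boolean encodings; not Mesa identifiers.
constexpr std::uint32_t kFalse = 0u;
constexpr std::uint32_t kTrue = ~0u;

// Native-integer path: start from .x and OR in the remaining components,
// highest first, mirroring the fall-through order of the switch (w, z, y).
// Each extra component costs one scalar OR, and the result is already a
// canonical 0/~0 boolean, so no trailing compare is needed.
std::uint32_t any_via_or(const std::array<std::uint32_t, 4> &v, std::size_t n)
{
   assert(n >= 2 && n <= 4);
   for (std::size_t i = 0; i < n; ++i)
      assert(v[i] == kFalse || v[i] == kTrue);

   std::uint32_t accum = v[0];
   for (std::size_t i = n - 1; i >= 1; --i)
      accum |= v[i];
   return accum;
}

// Float path: dot(v, v) lands in [0, n] for 0.0/1.0 inputs, so a final
// "not equal to zero" step is required to get back to a 0.0/1.0 boolean.
float any_via_dot(const std::array<float, 4> &v, std::size_t n)
{
   assert(n >= 2 && n <= 4);
   float dot = 0.0f;
   for (std::size_t i = 0; i < n; ++i)
      dot += v[i] * v[i];
   return dot != 0.0f ? 1.0f : 0.0f;
}
```

For a bvec4 the first function performs three ORs and nothing else, while
the second needs four multiplies, three adds and a compare when written out
as scalar operations. Only the first n components are read, so anything
past the vector size is ignored in both versions.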

 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 75 ++++++++++++++++++++----------
 1 file changed, 51 insertions(+), 24 deletions(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index bdee1f4..2afd8fb 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -1671,30 +1671,57 @@ glsl_to_tgsi_visitor::visit(ir_expression *ir)
    case ir_unop_any: {
       assert(ir->operands[0]->type->is_vector());
 
-      /* After the dot-product, the value will be an integer on the
-       * range [0,4].  Zero stays zero, and positive values become 1.0.
-       */
-      glsl_to_tgsi_instruction *const dp =
-         emit_dp(ir, result_dst, op[0], op[0],
-                 ir->operands[0]->type->vector_elements);
-      if (this->prog->Target == GL_FRAGMENT_PROGRAM_ARB &&
-          result_dst.type == GLSL_TYPE_FLOAT) {
-	      /* The clamping to [0,1] can be done for free in the fragment
-	       * shader with a saturate.
-	       */
-	      dp->saturate = true;
-      } else if (result_dst.type == GLSL_TYPE_FLOAT) {
-	      /* Negating the result of the dot-product gives values on the range
-	       * [-4, 0].  Zero stays zero, and negative values become 1.0.  This
-	       * is achieved using SLT.
-	       */
-	      st_src_reg slt_src = result_src;
-	      slt_src.negate = ~slt_src.negate;
-	      emit(ir, TGSI_OPCODE_SLT, result_dst, slt_src, st_src_reg_for_float(0.0));
-      }
-      else {
-         /* Use SNE 0 if integers are being used as boolean values. */
-         emit(ir, TGSI_OPCODE_SNE, result_dst, result_src, st_src_reg_for_int(0));
+      if (native_integers) {
+         st_src_reg accum = op[0];
+         accum.swizzle = SWIZZLE_XXXX;
+         /* OR all the components together, since they should be either 0 or ~0
+          */
+         assert(ir->operands[0]->type->is_boolean());
+         switch (ir->operands[0]->type->vector_elements) {
+         case 4:
+            op[0].swizzle = SWIZZLE_WWWW;
+            emit(ir, TGSI_OPCODE_OR, result_dst, accum, op[0]);
+            accum = st_src_reg(result_dst);
+            /* fallthrough */
+         case 3:
+            op[0].swizzle = SWIZZLE_ZZZZ;
+            emit(ir, TGSI_OPCODE_OR, result_dst, accum, op[0]);
+            accum = st_src_reg(result_dst);
+            /* fallthrough */
+         case 2:
+            op[0].swizzle = SWIZZLE_YYYY;
+            emit(ir, TGSI_OPCODE_OR, result_dst, accum, op[0]);
+            break;
+         default:
+            assert(!"Unexpected vector size");
+            break;
+         }
+      } else {
+         /* After the dot-product, the value will be an integer on the
+          * range [0,4].  Zero stays zero, and positive values become 1.0.
+          */
+         glsl_to_tgsi_instruction *const dp =
+            emit_dp(ir, result_dst, op[0], op[0],
+                    ir->operands[0]->type->vector_elements);
+         if (this->prog->Target == GL_FRAGMENT_PROGRAM_ARB &&
+             result_dst.type == GLSL_TYPE_FLOAT) {
+            /* The clamping to [0,1] can be done for free in the fragment
+             * shader with a saturate.
+             */
+            dp->saturate = true;
+         } else if (result_dst.type == GLSL_TYPE_FLOAT) {
+            /* Negating the result of the dot-product gives values on the range
+             * [-4, 0].  Zero stays zero, and negative values become 1.0.  This
+             * is achieved using SLT.
+             */
+            st_src_reg slt_src = result_src;
+            slt_src.negate = ~slt_src.negate;
+            emit(ir, TGSI_OPCODE_SLT, result_dst, slt_src, st_src_reg_for_float(0.0));
+         }
+         else {
+            /* Use SNE 0 if integers are being used as boolean values. */
+            emit(ir, TGSI_OPCODE_SNE, result_dst, result_src, st_src_reg_for_int(0));
+         }
       }
       break;
    }
-- 
1.8.3.2