[Mesa-dev] [PATCH] mesa/st: provide native integers implementation of ir_unop_any

Roland Scheidegger sroland at vmware.com
Thu May 8 13:25:05 PDT 2014


On 08.05.2014 15:18, Ilia Mirkin wrote:
> Previously, ir_unop_any was implemented via a dot-product call, which
> uses floating point multiplication and addition. The multiplication was
> completely pointless, and the addition can just as well be done with an
> or. Since we know that the inputs are booleans, they must already be in
> canonical 0/~0 format, and the final SNE can also be avoided.
> 
> Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
> ---
> 
> I need to take this through a full piglit run, but the basic tests seem to
> work out as expected. This is the result of a compilation of
> fs-op-eq-mat4-mat4:
> 
> FRAG
> PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
> DCL OUT[0], COLOR
> DCL CONST[0..7]
> DCL TEMP[0..4], LOCAL
> IMM[0] FLT32 {    0.0000,     1.0000,     0.0000,     0.0000}
>   0: MOV TEMP[0].yzw, IMM[0].xxxx
>   1: FSNE TEMP[1], CONST[4], CONST[0]
>   2: OR TEMP[1].x, TEMP[1].xxxx, TEMP[1].yyyy
>   3: OR TEMP[1].y, TEMP[1].zzzz, TEMP[1].wwww
>   4: OR TEMP[1].x, TEMP[1].xxxx, TEMP[1].yyyy
>   5: FSNE TEMP[2], CONST[5], CONST[1]
>   6: OR TEMP[2].x, TEMP[2].xxxx, TEMP[2].yyyy
>   7: OR TEMP[2].y, TEMP[2].zzzz, TEMP[2].wwww
>   8: OR TEMP[2].x, TEMP[2].xxxx, TEMP[2].yyyy
>   9: FSNE TEMP[3], CONST[6], CONST[2]
>  10: OR TEMP[3].x, TEMP[3].xxxx, TEMP[3].yyyy
>  11: OR TEMP[3].y, TEMP[3].zzzz, TEMP[3].wwww
>  12: OR TEMP[3].x, TEMP[3].xxxx, TEMP[3].yyyy
>  13: FSNE TEMP[4], CONST[7], CONST[3]
>  14: OR TEMP[4].x, TEMP[4].xxxx, TEMP[4].yyyy
>  15: OR TEMP[4].y, TEMP[4].zzzz, TEMP[4].wwww
>  16: OR TEMP[4].x, TEMP[4].xxxx, TEMP[4].yyyy
>  17: OR TEMP[1].x, TEMP[1].xxxx, TEMP[4].xxxx   <---
>  18: OR TEMP[1].x, TEMP[1], TEMP[3].xxxx        <---
>  19: OR TEMP[1].x, TEMP[1], TEMP[2].xxxx        <---
>  20: NOT TEMP[1].x, TEMP[1].xxxx
>  21: AND TEMP[0].x, TEMP[1].xxxx, IMM[0].yyyy
>  22: MOV OUT[0], TEMP[0]
>  23: END
> 
> The three instructions with arrows are the result of my new logic. I wonder if
> it's cause for concern that I'm not setting a swizzle mask on the
> src... probably a bit, but it works out here. Is there a "writemask ->
> swizzle" converter somewhere? The old instructions would have been
> 
> DP4 TEMP[1], TEMP[1], TEMP[1]
> SNE TEMP[1], TEMP[1], IMM[0] ( == 0.0)
> 
> Or something along those lines. While 1 instruction less in TGSI, at least
> nv50/nvc0 are scalar and would have had to implement DP4 as
> 
> mul
> mul-add
> mul-add
> mul-add
> 
> versus the much more scalar-friendly OR's (in addition to the final SNE being
> gone).
> 
>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 75 ++++++++++++++++++++----------
>  1 file changed, 51 insertions(+), 24 deletions(-)
> 
> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> index bdee1f4..2afd8fb 100644
> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> @@ -1671,30 +1671,57 @@ glsl_to_tgsi_visitor::visit(ir_expression *ir)
>     case ir_unop_any: {
>        assert(ir->operands[0]->type->is_vector());
>  
> -      /* After the dot-product, the value will be an integer on the
> -       * range [0,4].  Zero stays zero, and positive values become 1.0.
> -       */
> -      glsl_to_tgsi_instruction *const dp =
> -         emit_dp(ir, result_dst, op[0], op[0],
> -                 ir->operands[0]->type->vector_elements);
> -      if (this->prog->Target == GL_FRAGMENT_PROGRAM_ARB &&
> -          result_dst.type == GLSL_TYPE_FLOAT) {
> -	      /* The clamping to [0,1] can be done for free in the fragment
> -	       * shader with a saturate.
> -	       */
> -	      dp->saturate = true;
> -      } else if (result_dst.type == GLSL_TYPE_FLOAT) {
> -	      /* Negating the result of the dot-product gives values on the range
> -	       * [-4, 0].  Zero stays zero, and negative values become 1.0.  This
> -	       * is achieved using SLT.
> -	       */
> -	      st_src_reg slt_src = result_src;
> -	      slt_src.negate = ~slt_src.negate;
> -	      emit(ir, TGSI_OPCODE_SLT, result_dst, slt_src, st_src_reg_for_float(0.0));
> -      }
> -      else {
> -         /* Use SNE 0 if integers are being used as boolean values. */
> -         emit(ir, TGSI_OPCODE_SNE, result_dst, result_src, st_src_reg_for_int(0));
> +      if (native_integers) {
> +         st_src_reg accum = op[0];
> +         accum.swizzle = SWIZZLE_XXXX;
> +         /* OR all the components together, since they should be either 0 or ~0
> +          */
> +         assert(ir->operands[0]->type->is_boolean());
> +         switch (ir->operands[0]->type->vector_elements) {
> +         case 4:
> +            op[0].swizzle = SWIZZLE_WWWW;
> +            emit(ir, TGSI_OPCODE_OR, result_dst, accum, op[0]);
> +            accum = st_src_reg(result_dst);
> +            /* fallthrough */
> +         case 3:
> +            op[0].swizzle = SWIZZLE_ZZZZ;
> +            emit(ir, TGSI_OPCODE_OR, result_dst, accum, op[0]);
> +            accum = st_src_reg(result_dst);
> +            /* fallthrough */
> +         case 2:
> +            op[0].swizzle = SWIZZLE_YYYY;
> +            emit(ir, TGSI_OPCODE_OR, result_dst, accum, op[0]);
> +            break;
> +         default:
> +            assert(!"Unexpected vector size");
> +            break;
> +         }
> +      } else {
> +         /* After the dot-product, the value will be an integer on the
> +          * range [0,4].  Zero stays zero, and positive values become 1.0.
> +          */
> +         glsl_to_tgsi_instruction *const dp =
> +            emit_dp(ir, result_dst, op[0], op[0],
> +                    ir->operands[0]->type->vector_elements);
> +         if (this->prog->Target == GL_FRAGMENT_PROGRAM_ARB &&
> +             result_dst.type == GLSL_TYPE_FLOAT) {
> +            /* The clamping to [0,1] can be done for free in the fragment
> +             * shader with a saturate.
> +             */
> +            dp->saturate = true;
> +         } else if (result_dst.type == GLSL_TYPE_FLOAT) {
> +            /* Negating the result of the dot-product gives values on the range
> +             * [-4, 0].  Zero stays zero, and negative values become 1.0.  This
> +             * is achieved using SLT.
> +             */
> +            st_src_reg slt_src = result_src;
> +            slt_src.negate = ~slt_src.negate;
> +            emit(ir, TGSI_OPCODE_SLT, result_dst, slt_src, st_src_reg_for_float(0.0));
> +         }
> +         else {
> +            /* Use SNE 0 if integers are being used as boolean values. */
> +            emit(ir, TGSI_OPCODE_SNE, result_dst, result_src, st_src_reg_for_int(0));
> +         }
>        }
>        break;
>     }
> 

Reviewed-by: Roland Scheidegger <sroland at vmware.com>

IIRC I even saw this weirdness when I introduced the
float-compare-return-int boolean opcodes, but was too lazy to do
anything about it. I guess most hardware that can do DOT2/3/4 better
than ORs doesn't support native integers in the first place, though
AMD's VLIW4/VLIW5 designs can probably handle DOT easily. If you wanted
to help such architectures (though the backend might be able to do this
on its own anyway), you could translate this shader with 5 (non-scalar)
ORs in total instead of 15 :-).
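
To make the 5-versus-15 count concrete, here is a rough C++ model of the two reduction orders for the mat4 comparison in the dump. It is only a sketch for counting instructions, not Mesa code: Vec4, Mat4Cmp, AnyResult and both functions are names made up for illustration, and it assumes a vector OR over all four lanes (or over two lanes in the horizontal step) costs one instruction.

```cpp
#include <array>
#include <cstdint>

using Vec4 = std::array<std::uint32_t, 4>;  // one FSNE result, each lane 0 or ~0
using Mat4Cmp = std::array<Vec4, 4>;        // four FSNE results, one per column

struct AnyResult {
   std::uint32_t value;  // 0 or ~0
   int or_count;         // OR instructions issued, scalar or vector alike
};

// Order seen in the dump: reduce each column to a scalar (3 ORs each),
// then OR the four scalars together (3 more).
AnyResult any_scalar_first(const Mat4Cmp &m)
{
   AnyResult r{0, 0};
   std::array<std::uint32_t, 4> col{};
   for (int i = 0; i < 4; i++) {
      const std::uint32_t xy = m[i][0] | m[i][1];
      const std::uint32_t zw = m[i][2] | m[i][3];
      col[i] = xy | zw;
      r.or_count += 3;
   }
   r.value = col[0] | col[3];
   r.value |= col[2];
   r.value |= col[1];
   r.or_count += 3;
   return r;
}

// One vector OR: all four lanes handled by a single instruction.
static Vec4 or4(const Vec4 &a, const Vec4 &b)
{
   return Vec4{{a[0] | b[0], a[1] | b[1], a[2] | b[2], a[3] | b[3]}};
}

// Alternative order: OR the four columns as vectors first (3 ORs), then
// reduce the single remaining vector horizontally (2 ORs).
AnyResult any_vector_first(const Mat4Cmp &m)
{
   AnyResult r{0, 0};
   Vec4 acc = or4(m[0], m[1]);
   acc = or4(acc, m[2]);
   acc = or4(acc, m[3]);
   r.or_count += 3;

   // .xy = .xy | .zw, modelled as one two-lane OR.
   const std::uint32_t x = acc[0] | acc[2];
   const std::uint32_t y = acc[1] | acc[3];
   r.or_count += 1;

   // .x = .x | .y
   r.value = x | y;
   r.or_count += 1;
   return r;
}
```

Both orders give the same 0/~0 result; only the instruction count differs, and on a scalar backend the vector ORs would be split up again anyway.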
In any case, I agree it is cleaner not to use dot products. You really
need to be careful there: it might work, but a true boolean int (~0) is
a NaN when read as a float, which can really throw things off.
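
The NaN point can be checked with a small standalone C++ sketch. It assumes IEEE 754 binary32 floats, and the helper names (as_float, dot4_self, sne_zero, slt_neg_zero) are hypothetical stand-ins for what DP4, SNE and the negate-plus-SLT sequence would compute; this is not code from the state tracker.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as a float, the way a float ALU would
// see a register holding an integer boolean.
static float as_float(std::uint32_t bits)
{
   static_assert(sizeof(float) == sizeof(std::uint32_t),
                 "sketch assumes a 32-bit float");
   float f;
   std::memcpy(&f, &bits, sizeof f);
   return f;
}

// DP4-style dot product of a boolean vector with itself, lanes read
// as floats.
float dot4_self(const std::uint32_t v[4])
{
   float sum = 0.0f;
   for (int i = 0; i < 4; i++)
      sum += as_float(v[i]) * as_float(v[i]);
   return sum;
}

// SNE-style test against zero: true for NaN, since NaN != 0.0f holds.
bool sne_zero(float x)
{
   return x != 0.0f;
}

// Negate + SLT-style test: false for NaN, since ordered comparisons
// with NaN never hold.
bool slt_neg_zero(float x)
{
   return -x < 0.0f;
}
```

With one lane set to ~0, the dot product is NaN: the SNE-style test still happens to report true, while the negate-and-SLT-style test reports false, so whether the float path gives the right answer depends entirely on which comparison follows it.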

