[Mesa-dev] [PATCH] mesa/st: provide native integers implementation of ir_unop_any

Thu May 8 13:42:19 PDT 2014

This looks good to me.

Marek

On Thu, May 8, 2014 at 3:18 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> Previously, ir_unop_any was implemented via a dot-product call, which
> uses floating point multiplication and addition. The multiplication was
> completely pointless, and the addition can just as well be done with an
> OR. Since we know that the inputs are booleans, with native integers they
> must already be in canonical 0/~0 format, so the final SNE can also be
> avoided.
>
> Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
> ---
>
> I need to take this through a full piglit run, but the basic tests seem to
> work out as expected. This is the result of a compilation of
> fs-op-eq-mat4-mat4:
>
> FRAG
> PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
> DCL OUT[0], COLOR
> DCL CONST[0..7]
> DCL TEMP[0..4], LOCAL
> IMM[0] FLT32 {    0.0000,     1.0000,     0.0000,     0.0000}
>   0: MOV TEMP[0].yzw, IMM[0].xxxx
>   1: FSNE TEMP[1], CONST[4], CONST[0]
>   2: OR TEMP[1].x, TEMP[1].xxxx, TEMP[1].yyyy
>   3: OR TEMP[1].y, TEMP[1].zzzz, TEMP[1].wwww
>   4: OR TEMP[1].x, TEMP[1].xxxx, TEMP[1].yyyy
>   5: FSNE TEMP[2], CONST[5], CONST[1]
>   6: OR TEMP[2].x, TEMP[2].xxxx, TEMP[2].yyyy
>   7: OR TEMP[2].y, TEMP[2].zzzz, TEMP[2].wwww
>   8: OR TEMP[2].x, TEMP[2].xxxx, TEMP[2].yyyy
>   9: FSNE TEMP[3], CONST[6], CONST[2]
>  10: OR TEMP[3].x, TEMP[3].xxxx, TEMP[3].yyyy
>  11: OR TEMP[3].y, TEMP[3].zzzz, TEMP[3].wwww
>  12: OR TEMP[3].x, TEMP[3].xxxx, TEMP[3].yyyy
>  13: FSNE TEMP[4], CONST[7], CONST[3]
>  14: OR TEMP[4].x, TEMP[4].xxxx, TEMP[4].yyyy
>  15: OR TEMP[4].y, TEMP[4].zzzz, TEMP[4].wwww
>  16: OR TEMP[4].x, TEMP[4].xxxx, TEMP[4].yyyy
>  17: OR TEMP[1].x, TEMP[1].xxxx, TEMP[4].xxxx   <---
>  18: OR TEMP[1].x, TEMP[1], TEMP[3].xxxx        <---
>  19: OR TEMP[1].x, TEMP[1], TEMP[2].xxxx        <---
>  20: NOT TEMP[1].x, TEMP[1].xxxx
>  21: AND TEMP[0].x, TEMP[1].xxxx, IMM[0].yyyy
>  22: MOV OUT[0], TEMP[0]
>  23: END
>
> The three instructions with arrows are the result of my new logic. I wonder
> whether it's a cause for concern that I'm not setting a swizzle on the src
> (note the unswizzled TEMP[1] in instructions 18 and 19)... probably a bit,
> but it works out here. Is there a "writemask -> swizzle" converter
> somewhere? The old instructions would have been
>
> DP4 TEMP[1], TEMP[1], TEMP[1]
> SNE TEMP[1], TEMP[1], IMM[0] ( == 0.0)
>
> Or something along those lines. While that is one instruction fewer in
> TGSI, at least nv50/nvc0 are scalar and would have had to implement DP4 as
> mul
> mul-add
> mul-add
> mul-add
>
> versus the much more scalar-friendly ORs (in addition to the final SNE being
> gone).
>
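To illustrate the trade-off described above, here is a minimal C++ model of the two lowerings. It is not Mesa code: the names BVec, any_via_or, and any_via_dp are made up for this sketch. It assumes boolean components are canonical 0 or ~0 integers, as the commit message states, and the float reference models booleans as 0.0/1.0, which simplifies the old DP4 + SNE path.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model, not Mesa code: a boolean vector whose components
// are assumed to be canonical 0 / ~0 integers (native integers path).
using BVec = std::array<uint32_t, 4>;

// Mirrors the fallthrough switch in the patch: start from .x and OR in
// .w, .z, .y as far as the vector size requires.
uint32_t any_via_or(const BVec &v, int size)
{
    assert(size >= 2 && size <= 4);
    uint32_t accum = v[0];
    switch (size) {
    case 4:
        accum |= v[3];
        /* fallthrough */
    case 3:
        accum |= v[2];
        /* fallthrough */
    case 2:
        accum |= v[1];
        break;
    }
    // The result is already 0 or ~0, so no trailing SNE is needed.
    return accum;
}

// Reference model of the float path: a dot product of the vector with
// itself, followed by a compare against zero.
bool any_via_dp(const BVec &v, int size)
{
    float dp = 0.0f;
    for (int i = 0; i < size; i++) {
        float f = v[i] ? 1.0f : 0.0f;
        dp += f * f; // one mul or mul-add per component on scalar hardware
    }
    return dp != 0.0f;
}
```

For a vec4 the OR version costs three scalar ORs, while the dot-product version costs one mul, three mul-adds, and a compare, which matches the instruction counts discussed above.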
>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 75 ++++++++++++++++++++----------
>  1 file changed, 51 insertions(+), 24 deletions(-)
>
> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> index bdee1f4..2afd8fb 100644
> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> @@ -1671,30 +1671,57 @@ glsl_to_tgsi_visitor::visit(ir_expression *ir)
>     case ir_unop_any: {
>        assert(ir->operands[0]->type->is_vector());
>
> -      /* After the dot-product, the value will be an integer on the
> -       * range [0,4].  Zero stays zero, and positive values become 1.0.
> -       */
> -      glsl_to_tgsi_instruction *const dp =
> -         emit_dp(ir, result_dst, op[0], op[0],
> -                 ir->operands[0]->type->vector_elements);
> -      if (this->prog->Target == GL_FRAGMENT_PROGRAM_ARB &&
> -          result_dst.type == GLSL_TYPE_FLOAT) {
> -             /* The clamping to [0,1] can be done for free in the fragment
> -              * shader with a saturate.
> -              */
> -             dp->saturate = true;
> -      } else if (result_dst.type == GLSL_TYPE_FLOAT) {
> -             /* Negating the result of the dot-product gives values on the range
> -              * [-4, 0].  Zero stays zero, and negative values become 1.0.  This
> -              * is achieved using SLT.
> -              */
> -             st_src_reg slt_src = result_src;
> -             slt_src.negate = ~slt_src.negate;
> -             emit(ir, TGSI_OPCODE_SLT, result_dst, slt_src, st_src_reg_for_float(0.0));
> -      }
> -      else {
> -         /* Use SNE 0 if integers are being used as boolean values. */
> -         emit(ir, TGSI_OPCODE_SNE, result_dst, result_src, st_src_reg_for_int(0));
> +      if (native_integers) {
> +         st_src_reg accum = op[0];
> +         accum.swizzle = SWIZZLE_XXXX;
> +         /* OR all the components together, since they should be either 0 or ~0
> +          */
> +         assert(ir->operands[0]->type->is_boolean());
> +         switch (ir->operands[0]->type->vector_elements) {
> +         case 4:
> +            op[0].swizzle = SWIZZLE_WWWW;
> +            emit(ir, TGSI_OPCODE_OR, result_dst, accum, op[0]);
> +            accum = st_src_reg(result_dst);
> +            /* fallthrough */
> +         case 3:
> +            op[0].swizzle = SWIZZLE_ZZZZ;
> +            emit(ir, TGSI_OPCODE_OR, result_dst, accum, op[0]);
> +            accum = st_src_reg(result_dst);
> +            /* fallthrough */
> +         case 2:
> +            op[0].swizzle = SWIZZLE_YYYY;
> +            emit(ir, TGSI_OPCODE_OR, result_dst, accum, op[0]);
> +            break;
> +         default:
> +            assert(!"Unexpected vector size");
> +            break;
> +         }
> +      } else {
> +         /* After the dot-product, the value will be an integer on the
> +          * range [0,4].  Zero stays zero, and positive values become 1.0.
> +          */
> +         glsl_to_tgsi_instruction *const dp =
> +            emit_dp(ir, result_dst, op[0], op[0],
> +                    ir->operands[0]->type->vector_elements);
> +         if (this->prog->Target == GL_FRAGMENT_PROGRAM_ARB &&
> +             result_dst.type == GLSL_TYPE_FLOAT) {
> +            /* The clamping to [0,1] can be done for free in the fragment
> +             * shader with a saturate.
> +             */
> +            dp->saturate = true;
> +         } else if (result_dst.type == GLSL_TYPE_FLOAT) {
> +            /* Negating the result of the dot-product gives values on the range
> +             * [-4, 0].  Zero stays zero, and negative values become 1.0.  This
> +             * is achieved using SLT.
> +             */
> +            st_src_reg slt_src = result_src;
> +            slt_src.negate = ~slt_src.negate;
> +            emit(ir, TGSI_OPCODE_SLT, result_dst, slt_src, st_src_reg_for_float(0.0));
> +         }
> +         else {
> +            /* Use SNE 0 if integers are being used as boolean values. */
> +            emit(ir, TGSI_OPCODE_SNE, result_dst, result_src, st_src_reg_for_int(0));
> +         }
>        }
>        break;
>     }
> --
> 1.8.3.2
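
On the "writemask -> swizzle" question raised in the notes: the following is only a sketch of what such a converter could look like, not a pointer to an existing Mesa helper. The names SWZ_*, make_swizzle4, and swizzle_for_writemask are hypothetical, and the packing of 3 bits per channel is an assumption modeled on Mesa's swizzle encoding. Enabled channels read the same-named source component; disabled channels replicate the first enabled one, so a destination of .x yields a source swizzle of .xxxx.

```cpp
#include <cstdint>

// Local stand-ins (assumed encoding: 3 bits per channel, X=0 .. W=3).
enum { SWZ_X = 0, SWZ_Y = 1, SWZ_Z = 2, SWZ_W = 3 };

constexpr unsigned make_swizzle4(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return a | (b << 3) | (c << 6) | (d << 9);
}

// Hypothetical helper: derive a source swizzle from a 4-bit writemask
// (bit 0 = x, bit 1 = y, bit 2 = z, bit 3 = w).
unsigned swizzle_for_writemask(unsigned writemask)
{
    // Find the first enabled channel; fall back to X for an empty mask.
    unsigned first = SWZ_X;
    for (unsigned i = 0; i < 4; i++) {
        if (writemask & (1u << i)) {
            first = i;
            break;
        }
    }

    // Enabled channels map to themselves, disabled ones to 'first'.
    unsigned chan[4];
    for (unsigned i = 0; i < 4; i++)
        chan[i] = (writemask & (1u << i)) ? i : first;

    return make_swizzle4(chan[0], chan[1], chan[2], chan[3]);
}
```

With something along these lines, the accumulator built from result_dst could be given an explicit swizzle instead of being read with all four components.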