[Mesa-dev] [PATCH 2/5] i965/fs: Emit better b2f of an expression on GEN4 and GEN5

Matt Turner mattst88 at gmail.com
Mon Mar 16 10:06:08 PDT 2015


On Wed, Mar 11, 2015 at 1:44 PM, Ian Romanick <idr at freedesktop.org> wrote:
> From: Ian Romanick <ian.d.romanick at intel.com>
>
> On platforms that do not natively generate 0u and ~0u for Boolean
> results, b2f expressions that look like
>
>    f = b2f(expr cmp 0)
>
> will generate better code by pretending the expression is
>
>     f = ir_triop_sel(0.0, 1.0, expr cmp 0)
>
> This is because the last instruction of "expr" can generate the
> condition code for the "cmp 0".  This avoids having to do the "-(b & 1)"
> trick to generate 0u or ~0u for the Boolean result.  This means code like
>
>     mov(16)         g16<1>F         1F
>     mul.ge.f0(16)   null            g6<8,8,1>F      g14<8,8,1>F
>     (+f0) sel(16)   m6<1>F          g16<8,8,1>F     0F
>
> will be generated instead of
>
>     mul(16)         g2<1>F          g12<8,8,1>F     g4<8,8,1>F
>     cmp.ge.f0(16)   g2<1>D          g4<8,8,1>F      0F

Presumably this g4 should be g2?

>     and(16)         g4<1>D          g2<8,8,1>D      1D
>     and(16)         m6<1>D          -g4<8,8,1>D     0x3f800000UD
>
> v2: When the comparison is either == 0.0 or != 0.0 use the knowledge
> that the true (or false) case already results in zero would allow better
> code generation by possibly avoiding a load-immediate instruction.
>
> v3: Apply the optimization even when neither comparitor is zero.
>
> Shader-db results:
>
> GM45 (0x2A42):
> total instructions in shared programs: 3551002 -> 3550829 (-0.00%)
> instructions in affected programs:     33269 -> 33096 (-0.52%)
> helped:                                121
>
> Iron Lake (0x0046):
> total instructions in shared programs: 4993327 -> 4993146 (-0.00%)
> instructions in affected programs:     34199 -> 34018 (-0.53%)
> helped:                                129
>
> No change on other platforms.
>
> Signed-off-by: Ian Romanick <ian.d.romanick at intel.com>
> Cc: Tapani Palli <tapani.palli at intel.com>
> ---
>  src/mesa/drivers/dri/i965/brw_fs.h           |   2 +
>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 101 +++++++++++++++++++++++++--
>  2 files changed, 99 insertions(+), 4 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
> index d9d5858..075e90c 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.h
> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
> @@ -307,6 +307,7 @@ public:
>                   const fs_reg &a);
>     void emit_minmax(enum brw_conditional_mod conditionalmod, const fs_reg &dst,
>                      const fs_reg &src0, const fs_reg &src1);
> +   bool try_emit_b2f_of_comparison(ir_expression *ir);
>     bool try_emit_saturate(ir_expression *ir);
>     bool try_emit_line(ir_expression *ir);
>     bool try_emit_mad(ir_expression *ir);
> @@ -317,6 +318,7 @@ public:
>     bool opt_saturate_propagation();
>     bool opt_cmod_propagation();
>     void emit_bool_to_cond_code(ir_rvalue *condition);
> +   void emit_bool_to_cond_code_of_reg(ir_expression *expr, fs_reg op[3]);
>     void emit_if_gen6(ir_if *ir);
>     void emit_unspill(bblock_t *block, fs_inst *inst, fs_reg reg,
>                       uint32_t spill_offset, int count);
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> index 3025a9d..3d79796 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> @@ -475,6 +475,87 @@ fs_visitor::try_emit_mad(ir_expression *ir)
>     return true;
>  }
>
> +bool
> +fs_visitor::try_emit_b2f_of_comparison(ir_expression *ir)
> +{
> +   /* On platforms that do not natively generate 0u and ~0u for Boolean
> +    * results, b2f expressions that look like
> +    *
> +    *     f = b2f(expr cmp 0)
> +    *
> +    * will generate better code by pretending the expression is
> +    *
> +    *     f = ir_triop_csel(0.0, 1.0, expr cmp 0)
> +    *
> +    * This is because the last instruction of "expr" can generate the
> +    * condition code for the "cmp 0".  This avoids having to do the "-(b & 1)"
> +    * trick to generate 0u or ~0u for the Boolean result.  This means code like
> +    *
> +    *     mov(16)         g16<1>F         1F
> +    *     mul.ge.f0(16)   null            g6<8,8,1>F      g14<8,8,1>F
> +    *     (+f0) sel(16)   m6<1>F          g16<8,8,1>F     0F
> +    *
> +    * will be generated instead of
> +    *
> +    *     mul(16)         g2<1>F          g12<8,8,1>F     g4<8,8,1>F
> +    *     cmp.ge.f0(16)   g2<1>D          g4<8,8,1>F      0F
> +    *     and(16)         g4<1>D          g2<8,8,1>D      1D
> +    *     and(16)         m6<1>D          -g4<8,8,1>D     0x3f800000UD
> +    *
> +    * When the comparison is either == 0.0 or != 0.0 using the knowledge that
> +    * the true (or false) case already results in zero would allow better code
> +    * generation by possibly avoiding a load-immediate instruction.
> +    */
> +   ir_expression *cmp = ir->operands[0]->as_expression();
> +   if (cmp == NULL)
> +      return false;
> +
> +   if (cmp->operation == ir_binop_equal || cmp->operation == ir_binop_nequal) {
> +      for (unsigned i = 0; i < 2; i++) {
> +         ir_constant *c = cmp->operands[i]->as_constant();
> +         if (c == NULL || !c->is_zero())
> +            continue;
> +
> +         ir_expression *expr = cmp->operands[i ^ 1]->as_expression();
> +         if (expr != NULL) {
> +            fs_reg op[2];
> +
> +            for (unsigned j = 0; j < 2; j++) {
> +               cmp->operands[j]->accept(this);
> +               op[j] = this->result;
> +
> +               resolve_ud_negate(&op[j]);
> +            }
> +
> +            emit_bool_to_cond_code_of_reg(cmp, op);
> +
> +            /* In this case we know when the condition is true, op[i ^ 1]
> +             * contains zero.  Invert the predicate, use op[i ^ 1] as src0,
> +             * and immediate 1.0f as src1.
> +             */
> +            this->result = vgrf(ir->type);
> +            op[i ^ 1].type = BRW_REGISTER_TYPE_F;

We just do op[1 - i] in tons of other places. No comment needed to explain 1-i.


More information about the mesa-dev mailing list