[Mesa-dev] [PATCH 3/5] i965/fs: Handle CMP.nz ... 0 and AND.nz ... 1 similarly in cmod propagation

Matt Turner mattst88 at gmail.com
Wed Mar 11 16:29:53 PDT 2015


On Wed, Mar 11, 2015 at 1:44 PM, Ian Romanick <idr at freedesktop.org> wrote:
> From: Ian Romanick <ian.d.romanick at intel.com>
>
> Espically on platforms that do not natively generate 0u and ~0u for
> Boolean results, we generate a lot of sequences where a CMP is
> followed by an AND with 1.  emit_bool_to_cond_code does this, for
> example.  On ILK, this results in a sequence like:
>
>     add(8)          g3<1>F          g8<8,8,1>F      -g4<0,1,0>F
>     cmp.l.f0(8)     g3<1>D          g3<8,8,1>F      0F
>     and.nz.f0(8)    null            g3<8,8,1>D      1D
>     (+f0) iff(8)    Jump: 6
>
> The AND.nz is obviously redundant.  By propagating the cmod, we can
> instead generate
>
>     add.l.f0(8)     null            g8<8,8,1>F      -g4<0,1,0>F
>     (+f0) iff(8)    Jump: 6
>
> Existing code already handles the propagation from the CMP to the ADD.
>
> Shader-db results:
>
> GM45 (0x2A42):
> total instructions in shared programs: 3550829 -> 3550788 (-0.00%)
> instructions in affected programs:     10028 -> 9987 (-0.41%)
> helped:                                24
>
> Iron Lake (0x0046):
> total instructions in shared programs: 4993146 -> 4993105 (-0.00%)
> instructions in affected programs:     9675 -> 9634 (-0.42%)
> helped:                                24
>
> Ivy Bridge (0x0166):
> total instructions in shared programs: 6291870 -> 6291794 (-0.00%)
> instructions in affected programs:     17914 -> 17838 (-0.42%)
> helped:                                48
>
> Haswell (0x0426):
> total instructions in shared programs: 5779256 -> 5779180 (-0.00%)
> instructions in affected programs:     16694 -> 16618 (-0.46%)
> helped:                                48
>
> Broadwell (0x162E):
> total instructions in shared programs: 6823088 -> 6823014 (-0.00%)
> instructions in affected programs:     15824 -> 15750 (-0.47%)
> helped:                                46
>
> No chage on Sandy Bridge or on any platform when NIR is used.

typo: change

>
> Signed-off-by: Ian Romanick <ian.d.romanick at intel.com>
> Cc: Matt Turner <mattst88 at gmail.com>
> ---
>  .../drivers/dri/i965/brw_fs_cmod_propagation.cpp   | 32 +++++++++++++++++++++-
>  1 file changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp
> index d0ca2f9..6df3d45 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp
> @@ -57,7 +57,8 @@ opt_cmod_propagation_local(bblock_t *block)
>     foreach_inst_in_block_reverse_safe(fs_inst, inst, block) {
>        ip--;
>
> -      if ((inst->opcode != BRW_OPCODE_CMP &&
> +      if ((inst->opcode != BRW_OPCODE_AND &&
> +           inst->opcode != BRW_OPCODE_CMP &&
>             inst->opcode != BRW_OPCODE_MOV) ||
>            inst->predicate != BRW_PREDICATE_NONE ||
>            !inst->dst.is_null() ||
> @@ -65,6 +66,19 @@ opt_cmod_propagation_local(bblock_t *block)
>            inst->src[0].abs)
>           continue;
>
> +      /* Only an AND.NZ can be propagated.  Many AND.Z instructions are
> +       * generated (for ir_unop_not in fs_visitor::emit_bool_to_cond_code).
> +       * Propagating those would require inverting the condition on the CMP.
> +       * This changes both the flag value and the register destination of the
> +       * CMP.  That result may be used elsewhere, so we can't change its value
> +       * on a whim.
> +       */
> +      if (inst->opcode == BRW_OPCODE_AND &&
> +          !(inst->src[1].is_one() &&
> +            inst->conditional_mod == BRW_CONDITIONAL_NZ &&
> +            !inst->src[0].negate))

I had a hard time reading this condition, but I think it is right.

> +         continue;
> +
>        if (inst->opcode == BRW_OPCODE_CMP && !inst->src[1].is_zero())
>           continue;
>
> @@ -80,6 +94,22 @@ opt_cmod_propagation_local(bblock_t *block)
>                  scan_inst->dst.reg_offset != inst->src[0].reg_offset)
>                 break;
>
> +            /* This must be done before the dst.type check because the result
> +             * type of the AND will always be D, but the result of the CMP
> +             * could be anything.  The assumption is that the AND is just
> +             * figuring out what the result of the previous comparison was
> +             * instead of doing a new comparison with a different type.
> +             */
> +            if (inst->opcode == BRW_OPCODE_AND) {
> +               if (scan_inst->opcode == BRW_OPCODE_CMP &&
> +                   scan_inst->writes_flag()) {

CMPs will always writes the flag, so no need to check that. I guess
you could assert it or something if you wanted.

This looks good... we just need some additional unit tests :)

  - Test that you can remove the AND.NZ after a CMP null
  - Test that AND.Z isn't removed

I don't think there's any other interesting case to test?


More information about the mesa-dev mailing list