[Mesa-dev] [PATCH 3/4] i965: Add a pass to predicate short blocks.

Matt Turner mattst88 at gmail.com
Mon Sep 28 17:25:31 PDT 2015


On Mon, Sep 28, 2015 at 3:26 PM, Matt Turner <mattst88 at gmail.com> wrote:
> total instructions in shared programs: 6496326 -> 6492315 (-0.06%)
> instructions in affected programs:     159282 -> 155271 (-2.52%)
> helped:                                411
> ---
>  src/mesa/drivers/dri/i965/Makefile.sources        |   1 +
>  src/mesa/drivers/dri/i965/brw_fs.cpp              |   1 +
>  src/mesa/drivers/dri/i965/brw_predicate_block.cpp | 104 ++++++++++++++++++++++
>  src/mesa/drivers/dri/i965/brw_shader.h            |   6 +-
>  src/mesa/drivers/dri/i965/brw_vec4.cpp            |   1 +
>  5 files changed, 112 insertions(+), 1 deletion(-)
>  create mode 100644 src/mesa/drivers/dri/i965/brw_predicate_block.cpp
>
> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources b/src/mesa/drivers/dri/i965/Makefile.sources
> index cc3ecaf..9b1a039 100644
> --- a/src/mesa/drivers/dri/i965/Makefile.sources
> +++ b/src/mesa/drivers/dri/i965/Makefile.sources
> @@ -90,6 +90,7 @@ i965_FILES = \
>         brw_packed_float.c \
>         brw_performance_monitor.c \
>         brw_pipe_control.c \
> +       brw_predicate_block.cpp \
>         brw_primitive_restart.c \
>         brw_program.c \
>         brw_program.h \
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 5ca5c26..7c7cb0d 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -4844,6 +4844,7 @@ fs_visitor::optimize()
>        OPT(opt_cmod_propagation);
>        OPT(dead_code_eliminate);
>        OPT(opt_peephole_sel);
> +      OPT(opt_predicate_block, this);
>        OPT(dead_control_flow_eliminate, this);
>        OPT(opt_register_renaming);
>        OPT(opt_redundant_discard_jumps);
> diff --git a/src/mesa/drivers/dri/i965/brw_predicate_block.cpp b/src/mesa/drivers/dri/i965/brw_predicate_block.cpp
> new file mode 100644
> index 0000000..4973172
> --- /dev/null
> +++ b/src/mesa/drivers/dri/i965/brw_predicate_block.cpp
> @@ -0,0 +1,104 @@
> +/*
> + * Copyright © 2015 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +#include "brw_cfg.h"
> +
> +/** @file brw_predicate_block.cpp
> + *
> + * This file contains the opt_predicate_block() optimization pass that moves a
> + * small block of instructions from inside an IF/ENDIF block to before the IF
> + * instruction by predicating them. For example,
> + *
> + * Before:
> + *
> + *    CMP.f0
> + *    (+f0) IF
> + *    MUL ...
> + *    ADD ...
> + *    ENDIF
> + *
> + * After:
> + *
> + *    CMP.f0
> + *    (+f0) MUL ...
> + *    (+f0) ADD ...
> + *    (+f0) IF
> + *    ENDIF
> + *
> + * dead_control_flow_eliminate() is then able to remove the IF/ENDIF pair and
> + * combine basic blocks.
> + */
> +
> +bool
> +opt_predicate_block(backend_shader *s)
> +{
> +   bool progress = false;
> +
> +   foreach_block_safe(block, s->cfg) {
> +      if (block->num == 0 || block->num == s->cfg->num_blocks - 1)
> +         continue;
> +
> +      if (block->end_ip - block->start_ip > 3)
> +         continue;
> +
> +      bblock_t *if_block = block->prev();
> +      backend_instruction *if_inst = if_block->end();
> +      if (if_inst->opcode != BRW_OPCODE_IF ||
> +          if_inst->conditional_mod != BRW_CONDITIONAL_NONE)
> +         continue;
> +
> +      backend_instruction *endif_inst = block->next()->start();
> +      if (endif_inst->opcode != BRW_OPCODE_ENDIF)
> +         continue;
> +
> +      bool skip = false;
> +
> +      foreach_inst_in_block(backend_instruction, inst, block) {
> +         if (inst->opcode <= BRW_OPCODE_NOP && !inst->is_control_flow()) {

Looking at shaders, I noticed that this check doesn't handle math
instructions. Adding support for them gives an additional

total instructions in shared programs: 6491241 -> 6490857 (-0.01%)
instructions in affected programs:     16200 -> 15816 (-2.37%)
helped:                                65

But also

LOST:                                  2

which is, of course, unfortunate because one of them exhibits a pretty
sizable decrease: FS SIMD8: 816 -> 786 (-3.68%)

Ilia also noted on IRC that the NVIDIA proprietary driver predicates
blocks of instructions but leaves the branches in place that jump if
all channels are off. That's interesting, but I think a lot of the
benefit we see from this on i965 is because it allows us to combine
basic blocks so other passes work better.
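
To make the transformation concrete, here is a toy sketch of the idea in
standalone C++. It is not the Mesa code: Inst, Op, kMaxBlockLen and
predicate_short_blocks() are hypothetical names invented for illustration,
the program is a flat instruction list rather than a CFG, and there is only
one flag, so a body that contains a CMP or an already-predicated
instruction is simply left alone.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins for illustration only; these are not Mesa's types.
enum class Op { CMP, IF, ELSE, ENDIF, MUL, ADD };

struct Inst {
    Op op;
    bool predicated;  // true corresponds to "(+f0)" in the example above
};

// Assumed limit on body length; the real pass has its own size check.
constexpr std::size_t kMaxBlockLen = 4;

static bool is_control_flow(Op op)
{
    return op == Op::IF || op == Op::ELSE || op == Op::ENDIF;
}

// Move the body of each short, straight-line IF/ENDIF region above its IF
// and predicate the moved instructions. Returns true if anything changed.
bool predicate_short_blocks(std::vector<Inst> &prog)
{
    bool progress = false;

    for (std::size_t i = 0; i < prog.size(); ++i) {
        if (prog[i].op != Op::IF || !prog[i].predicated)
            continue;

        // The body ends at the first control-flow instruction after the IF.
        std::size_t j = i + 1;
        while (j < prog.size() && !is_control_flow(prog[j].op))
            ++j;
        assert(j > i);

        const std::size_t len = j - (i + 1);
        if (j == prog.size() || prog[j].op != Op::ENDIF ||
            len == 0 || len > kMaxBlockLen)
            continue;

        // Skip bodies that write the flag or are already predicated.
        bool ok = true;
        for (std::size_t k = i + 1; k < j; ++k) {
            if (prog[k].op == Op::CMP || prog[k].predicated)
                ok = false;
        }
        if (!ok)
            continue;

        // Shift the body up one slot and drop the IF right before ENDIF.
        const Inst if_inst = prog[i];
        for (std::size_t k = i; k + 1 < j; ++k) {
            prog[k] = prog[k + 1];
            prog[k].predicated = true;
        }
        prog[j - 1] = if_inst;

        progress = true;
        i = j;  // continue scanning after the ENDIF
    }

    return progress;
}
```

Running this on CMP, (+f0) IF, MUL, ADD, ENDIF leaves an empty IF/ENDIF
pair behind the two predicated instructions. In the toy model nothing
removes that pair; in the patch that job falls to
dead_control_flow_eliminate(), which is what lets the basic blocks merge.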

The moral of the story: I think it's time to work on the instruction scheduler.

