[Mesa-dev] [PATCH 3/4] i965: Add a pass to predicate short blocks.
Matt Turner
mattst88 at gmail.com
Mon Sep 28 17:25:31 PDT 2015
On Mon, Sep 28, 2015 at 3:26 PM, Matt Turner <mattst88 at gmail.com> wrote:
> total instructions in shared programs: 6496326 -> 6492315 (-0.06%)
> instructions in affected programs: 159282 -> 155271 (-2.52%)
> helped: 411
> ---
> src/mesa/drivers/dri/i965/Makefile.sources | 1 +
> src/mesa/drivers/dri/i965/brw_fs.cpp | 1 +
> src/mesa/drivers/dri/i965/brw_predicate_block.cpp | 104 ++++++++++++++++++++++
> src/mesa/drivers/dri/i965/brw_shader.h | 6 +-
> src/mesa/drivers/dri/i965/brw_vec4.cpp | 1 +
> 5 files changed, 112 insertions(+), 1 deletion(-)
> create mode 100644 src/mesa/drivers/dri/i965/brw_predicate_block.cpp
>
> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources b/src/mesa/drivers/dri/i965/Makefile.sources
> index cc3ecaf..9b1a039 100644
> --- a/src/mesa/drivers/dri/i965/Makefile.sources
> +++ b/src/mesa/drivers/dri/i965/Makefile.sources
> @@ -90,6 +90,7 @@ i965_FILES = \
> brw_packed_float.c \
> brw_performance_monitor.c \
> brw_pipe_control.c \
> + brw_predicate_block.cpp \
> brw_primitive_restart.c \
> brw_program.c \
> brw_program.h \
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 5ca5c26..7c7cb0d 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -4844,6 +4844,7 @@ fs_visitor::optimize()
> OPT(opt_cmod_propagation);
> OPT(dead_code_eliminate);
> OPT(opt_peephole_sel);
> + OPT(opt_predicate_block, this);
> OPT(dead_control_flow_eliminate, this);
> OPT(opt_register_renaming);
> OPT(opt_redundant_discard_jumps);
> diff --git a/src/mesa/drivers/dri/i965/brw_predicate_block.cpp b/src/mesa/drivers/dri/i965/brw_predicate_block.cpp
> new file mode 100644
> index 0000000..4973172
> --- /dev/null
> +++ b/src/mesa/drivers/dri/i965/brw_predicate_block.cpp
> @@ -0,0 +1,104 @@
> +/*
> + * Copyright © 2015 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +#include "brw_cfg.h"
> +
> +/** @file brw_predicate_block.cpp
> + *
> + * This file contains the opt_predicate_block() optimization pass that moves a
> + * small block of instructions from inside an IF/ENDIF block to before the IF
> + * instruction by predicating them. For example,
> + *
> + * Before:
> + *
> + * CMP.f0
> + * (+f0) IF
> + * MUL ...
> + * ADD ...
> + * ENDIF
> + *
> + * After:
> + *
> + * CMP.f0
> + * (+f0) MUL ...
> + * (+f0) ADD ...
> + * (+f0) IF
> + * ENDIF
> + *
> + * dead_control_flow_eliminate() is then able to remove the IF/ENDIF pair and
> + * combine basic blocks.
> + */
> +
> +bool
> +opt_predicate_block(backend_shader *s)
> +{
> + bool progress = false;
> +
> + foreach_block_safe(block, s->cfg) {
> + if (block->num == 0 || block->num == s->cfg->num_blocks - 1)
> + continue;
> +
> + if (block->end_ip - block->start_ip > 3)
> + continue;
> +
> + bblock_t *if_block = block->prev();
> + backend_instruction *if_inst = if_block->end();
> + if (if_inst->opcode != BRW_OPCODE_IF ||
> + if_inst->conditional_mod != BRW_CONDITIONAL_NONE)
> + continue;
> +
> + backend_instruction *endif_inst = block->next()->start();
> + if (endif_inst->opcode != BRW_OPCODE_ENDIF)
> + continue;
> +
> + bool skip = false;
> +
> + foreach_inst_in_block(backend_instruction, inst, block) {
> + if (inst->opcode <= BRW_OPCODE_NOP && !inst->is_control_flow()) {

I was looking at shaders and noticed that this doesn't handle math
instructions, so I added that, which gives an additional

total instructions in shared programs: 6491241 -> 6490857 (-0.01%)
instructions in affected programs: 16200 -> 15816 (-2.37%)
helped: 65

But also

LOST: 2

which is, of course, unfortunate because one of them exhibits a pretty
sizable decrease: FS SIMD8: 816 -> 786 (-3.68%)

Ilia also noted on IRC that the NVIDIA proprietary driver predicates
blocks of instructions but leaves in place the branches that jump over
the block if all channels are off. That's interesting, but I think a lot
of the benefit we see from this on i965 is because it allows us to
combine basic blocks so other passes work better.
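
The transformation under discussion can be sketched on a toy instruction list. This is an illustration only, not the code from the patch: `Op`, `Inst`, `can_predicate` and `predicate_short_blocks` are hypothetical names, the four-instruction limit mirrors the `end_ip - start_ip > 3` check in the quoted hunk, and refusing to hoist already-predicated instructions is an assumption made to keep the sketch simple.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy IR for illustration only; these names are hypothetical, not Mesa's.
enum class Op { CMP, IF, ENDIF, MUL, ADD };

struct Inst {
    Op op;
    bool predicated;  // true when the instruction carries (+f0)
};

// Assumption: only plain, not-yet-predicated ALU ops may be hoisted.
static bool can_predicate(const Inst &inst)
{
    return (inst.op == Op::MUL || inst.op == Op::ADD) && !inst.predicated;
}

// Move each short straight-line IF body above its IF, predicating the moved
// instructions. Returns true if the program changed.
bool predicate_short_blocks(std::vector<Inst> &prog, std::size_t max_len = 4)
{
    bool progress = false;

    for (std::size_t i = 0; i < prog.size(); ++i) {
        if (prog[i].op != Op::IF)
            continue;

        // Scan the candidate body: a run of hoistable instructions after IF.
        std::size_t end = i + 1;
        while (end < prog.size() && can_predicate(prog[end]))
            ++end;

        // The run must be non-empty, short, and closed directly by ENDIF.
        const std::size_t len = end - (i + 1);
        if (end == prog.size() || prog[end].op != Op::ENDIF ||
            len == 0 || len > max_len)
            continue;

        for (std::size_t k = i + 1; k < end; ++k)
            prog[k].predicated = true;

        // Rotate IF to the end of the run so it sits right before ENDIF.
        std::rotate(prog.begin() + i, prog.begin() + i + 1, prog.begin() + end);
        assert(prog[end - 1].op == Op::IF);

        progress = true;
        i = end;  // resume after the ENDIF
    }
    return progress;
}
```

Run on CMP, (+f0) IF, MUL, ADD, ENDIF, the sketch yields CMP, (+f0) MUL, (+f0) ADD, (+f0) IF, ENDIF, matching the before/after shown in the patch's file comment. The leftover empty IF/ENDIF pair is what a later cleanup pass would be expected to delete, which is the step that lets the surrounding basic blocks merge.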

Moral of the story is, I think it's time to work on the instruction scheduler.