[Mesa-dev] [PATCH 1/2] nir: add pass to move load_const

Rob Clark robdclark at gmail.com
Wed Jun 6 23:47:49 UTC 2018


On Wed, Jun 6, 2018 at 6:51 PM, Ian Romanick <idr at freedesktop.org> wrote:
> On 06/06/2018 07:43 AM, Rob Clark wrote:
>> Run this pass late (after opt loop) to move load_const instructions back
>> into the basic blocks which use the result, in cases where a load_const
>> is only consumed in a single block.
>
> If the load_const is used in more than one block, you could use it to
> the block that dominates all of the block that contain a use.  I suspect
> in many cases that will just be the first block, but it's probably worth
> trying.

hmm, good point.. I was mostly going for the super-obvious wins with
now downside cases (ie. low hanging fruit), but I guess the block that
dominates all the uses isn't going to be worse than the first block..

I can play around w/ the common denominator case.. although in at
least some cases I think it might be better to split the load_const.
Idk, the low hanging fruit approach cuts register usage by 1/3rd in
some of these edge case shaders, which by itself seems like a big
enough win, but I guess nir_shader_compiler_options config params is
an option for more aggressive approaches that may or may not be a win
for different drivers.

>
>> This helps reduce register usage in cases where the backend driver
>> cannot lower the load_const to a uniform.
>
> The trade off is that now you might not be able to hide the latency of
> the load.  I tried this on i965, and the results (below) support this
> idea a bit.  If you have a structured backend IR, this seems like a
> thing better done there... like, if you can't register allocate without
> spilling, try moving the load.  But that's really just a scheduling
> heuristic.  Hmm...
>

iirc, i965 had some clever load_const lower passes so all the threads
in a warp could share the same load_const (iirc?  I might not have
that description quite right)..  so I wasn't expecting this to be
useful to i965 (but if it is, then, bonus!).  For me I can *almost*
always lower load_const to uniform, so in the common cases it hasn't
been a problem.  The exceptions are cases where an alu instruction
consume >1 const (but those get optimized away) or a const plus
uniform, or things like ssbo/image store.  (I noticed the register
usage blowing up thing looking at some deqp tests like
dEQP-GLES31.functional.ssbo.layout.2_level_array.shared.vec4 ;-))

If latency of the load is an issue, perhaps a configuration option
about the minimum size of destination use_block, with the rough theory
that if the block is bigger than some threshold the instruction
scheduling could probably hide the latency?

BR,
-R


> total instructions in shared programs: 14398410 -> 14397918 (<.01%)
> instructions in affected programs: 134856 -> 134364 (-0.36%)
> helped: 53
> HURT: 10
> helped stats (abs) min: 1 max: 30 x̄: 9.47 x̃: 7
> helped stats (rel) min: 0.16% max: 2.74% x̄: 0.64% x̃: 0.40%
> HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
> HURT stats (rel)   min: 0.03% max: 0.21% x̄: 0.05% x̃: 0.03%
> 95% mean confidence interval for instructions value: -9.61 -6.01
> 95% mean confidence interval for instructions %-change: -0.68% -0.38%
> Instructions are helped.
>
> total cycles in shared programs: 532949823 -> 532851352 (-0.02%)
> cycles in affected programs: 260693023 -> 260594552 (-0.04%)
> helped: 122
> HURT: 88
> helped stats (abs) min: 1 max: 24933 x̄: 1080.30 x̃: 20
> helped stats (rel) min: <.01% max: 34.94% x̄: 2.20% x̃: 0.32%
> HURT stats (abs)   min: 1 max: 1884 x̄: 378.69 x̃: 29
> HURT stats (rel)   min: <.01% max: 10.23% x̄: 0.64% x̃: 0.09%
> 95% mean confidence interval for cycles value: -943.71 5.89
> 95% mean confidence interval for cycles %-change: -1.71% -0.32%
> Inconclusive result (value mean confidence interval includes 0).
>
> total spills in shared programs: 8044 -> 7975 (-0.86%)
> spills in affected programs: 2391 -> 2322 (-2.89%)
> helped: 42
> HURT: 0
>
> total fills in shared programs: 10950 -> 10884 (-0.60%)
> fills in affected programs: 2999 -> 2933 (-2.20%)
> helped: 42
> HURT: 0
>
> LOST:   4
> GAINED: 7
>
>> Signed-off-by: Rob Clark <robdclark at gmail.com>
>> ---
>>  src/compiler/Makefile.sources          |   1 +
>>  src/compiler/nir/meson.build           |   1 +
>>  src/compiler/nir/nir.h                 |   1 +
>>  src/compiler/nir/nir_move_load_const.c | 131 +++++++++++++++++++++++++
>>  4 files changed, 134 insertions(+)
>>  create mode 100644 src/compiler/nir/nir_move_load_const.c
>>
>> diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
>> index 3daa2c51334..cb8a2875a84 100644
>> --- a/src/compiler/Makefile.sources
>> +++ b/src/compiler/Makefile.sources
>> @@ -253,6 +253,7 @@ NIR_FILES = \
>>       nir/nir_lower_wpos_center.c \
>>       nir/nir_lower_wpos_ytransform.c \
>>       nir/nir_metadata.c \
>> +     nir/nir_move_load_const.c \
>>       nir/nir_move_vec_src_uses_to_dest.c \
>>       nir/nir_normalize_cubemap_coords.c \
>>       nir/nir_opt_conditional_discard.c \
>> diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
>> index 3fec363691d..373a47fd155 100644
>> --- a/src/compiler/nir/meson.build
>> +++ b/src/compiler/nir/meson.build
>> @@ -144,6 +144,7 @@ files_libnir = files(
>>    'nir_lower_wpos_ytransform.c',
>>    'nir_lower_bit_size.c',
>>    'nir_metadata.c',
>> +  'nir_move_load_const.c',
>>    'nir_move_vec_src_uses_to_dest.c',
>>    'nir_normalize_cubemap_coords.c',
>>    'nir_opt_conditional_discard.c',
>> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
>> index 5a1f79515ad..073ab4e82ea 100644
>> --- a/src/compiler/nir/nir.h
>> +++ b/src/compiler/nir/nir.h
>> @@ -2612,6 +2612,7 @@ bool nir_remove_dead_variables(nir_shader *shader, nir_variable_mode modes);
>>  bool nir_lower_constant_initializers(nir_shader *shader,
>>                                       nir_variable_mode modes);
>>
>> +bool nir_move_load_const(nir_shader *shader);
>>  bool nir_move_vec_src_uses_to_dest(nir_shader *shader);
>>  bool nir_lower_vec_to_movs(nir_shader *shader);
>>  void nir_lower_alpha_test(nir_shader *shader, enum compare_func func,
>> diff --git a/src/compiler/nir/nir_move_load_const.c b/src/compiler/nir/nir_move_load_const.c
>> new file mode 100644
>> index 00000000000..c0750035fbb
>> --- /dev/null
>> +++ b/src/compiler/nir/nir_move_load_const.c
>> @@ -0,0 +1,131 @@
>> +/*
>> + * Copyright © 2018 Red Hat
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice (including the next
>> + * paragraph) shall be included in all copies or substantial portions of the
>> + * Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
>> + * IN THE SOFTWARE.
>> + *
>> + * Authors:
>> + *    Rob Clark (robdclark at gmail.com>
>> + *
>> + */
>> +
>> +#include "nir.h"
>> +
>> +
>> +/*
>> + * A simple pass that moves load_const's into consuming block if
>> + * they are only consumed in a single block, to try to counter-
>> + * act nir's tendency to move all load_const to the top of the
>> + * first block.
>> + */
>> +
>> +/* iterate a ssa def's use's and if all uses are within a single
>> + * block, return it, otherwise return null.
>> + */
>> +static nir_block *
>> +get_single_block(nir_ssa_def *def)
>> +{
>> +   nir_block *b = NULL;
>> +
>> +   /* hmm, probably ignore if-uses: */
>> +   if (!list_empty(&def->if_uses))
>> +      return NULL;
>> +
>> +   nir_foreach_use(use, def) {
>> +      nir_instr *instr = use->parent_instr;
>> +
>> +      /*
>> +       * Kind of an ugly special-case, but phi instructions
>> +       * need to appear first in the block, so by definition
>> +       * we can't move a load_immed into a block where it is
>> +       * consumed by a phi instruction.  We could conceivably
>> +       * move it into a dominator block.
>> +       */
>> +      if (instr->type == nir_instr_type_phi)
>> +         return NULL;
>> +
>> +      nir_block *use_block = instr->block;
>> +
>> +      if (!b) {
>> +         b = use_block;
>> +      } else if (b != use_block) {
>> +         return NULL;
>> +      }
>> +   }
>> +
>> +   return b;
>> +}
>> +
>> +bool
>> +nir_move_load_const(nir_shader *shader)
>> +{
>> +   bool progress = false;
>> +
>> +   nir_foreach_function(function, shader) {
>> +      if (!function->impl)
>> +         continue;
>> +
>> +      nir_foreach_block(block, function->impl) {
>> +         nir_foreach_instr_safe(instr, block) {
>> +            if (instr->type != nir_instr_type_load_const)
>> +               continue;
>> +
>> +            nir_load_const_instr *load =
>> +                  nir_instr_as_load_const(instr);
>> +            nir_block *use_block =
>> +                  get_single_block(&load->def);
>> +
>> +            /* if consumed in more than one block, ignore.  Possibly
>> +             * there should be a threshold, ie. if used in a small #
>> +             * of blocks in a large shader, it could be better to
>> +             * split the load_const.  For now ignore that and focus
>> +             * on the clear wins:
>> +             */
>> +            if (!use_block)
>> +               continue;
>> +
>> +            if (use_block == load->instr.block)
>> +               continue;
>> +
>> +            exec_node_remove(&load->instr.node);
>> +
>> +            /* insert before first non-phi instruction: */
>> +            nir_foreach_instr(instr2, use_block) {
>> +               if (instr2->type == nir_instr_type_phi)
>> +                  continue;
>> +
>> +               exec_node_insert_node_before(&instr2->node,
>> +                                            &load->instr.node);
>> +
>> +               break;
>> +            }
>> +
>> +            load->instr.block = use_block;
>> +
>> +            progress = true;
>> +         }
>> +      }
>> +
>> +      nir_metadata_preserve(function->impl,
>> +                            nir_metadata_block_index | nir_metadata_dominance);
>> +   }
>> +
>> +
>> +   return progress;
>> +}
>>
>


More information about the mesa-dev mailing list