[Mesa-dev] [PATCH] intel/fs: Restrict live intervals to the subset possibly reachable from any definition.

Jason Ekstrand jason at jlekstrand.net
Fri Sep 8 01:30:40 UTC 2017


On Thu, Sep 7, 2017 at 5:50 PM, Francisco Jerez <currojerez at riseup.net>
wrote:

> Currently the liveness analysis pass would extend a live interval up
> to the top of the program when no unconditional and complete
> definition of the variable is found that dominates all of its uses.
>
> This can lead to a serious performance problem in shaders containing
> many partial writes, like scalar arithmetic, FP64 and soon FP16
> operations.  The number of oversize live intervals in such workloads
> can cause the compilation time of the shader to explode because of the
> worse than quadratic behavior of the register allocator and scheduler
> when running out of registers, and it can also cause the running time
> of the shader to explode due to the amount of spilling it leads to,
> which is orders of magnitude slower than GRF memory.
>
> This patch fixes it by computing the intersection of our current live
> intervals with the subset of the program that can possibly be reached
> from any definition of the variable.  Extending the storage allocation
> of the variable beyond that is pretty useless because its value is
> guaranteed to be undefined at a point that cannot be reached from any
> definition.
>
> No significant change in the running time of shader-db (with 5%
> statistical significance).
>
> shader-db results on IVB:
>
>   total cycles in shared programs: 61108780 -> 60932856 (-0.29%)
>   cycles in affected programs: 16335482 -> 16159558 (-1.08%)
>   helped: 5121
>   HURT: 4347
>
>   total spills in shared programs: 1309 -> 1288 (-1.60%)
>   spills in affected programs: 249 -> 228 (-8.43%)
>   helped: 3
>   HURT: 0
>
>   total fills in shared programs: 1652 -> 1597 (-3.33%)
>   fills in affected programs: 262 -> 207 (-20.99%)
>   helped: 4
>   HURT: 0
>
>   LOST:   2
>   GAINED: 209
>
> shader-db results on BDW:
>
>   total cycles in shared programs: 67617262 -> 67361220 (-0.38%)
>   cycles in affected programs: 23397142 -> 23141100 (-1.09%)
>   helped: 8045
>   HURT: 6488
>
>   total spills in shared programs: 1456 -> 1252 (-14.01%)
>   spills in affected programs: 465 -> 261 (-43.87%)
>   helped: 3
>   HURT: 0
>
>   total fills in shared programs: 1720 -> 1465 (-14.83%)
>   fills in affected programs: 471 -> 216 (-54.14%)
>   helped: 4
>   HURT: 0
>
>   LOST:   2
>   GAINED: 162
>
> shader-db results on SKL:
>
>   total cycles in shared programs: 65436248 -> 65245186 (-0.29%)
>   cycles in affected programs: 22560936 -> 22369874 (-0.85%)
>   helped: 8457
>   HURT: 6247
>

I'm a but surprised at how many programs this affected.  For something that
doesn't do any partial writes before a  full write, it should have no
effect whatsoever.  Maybe this is because of constant combining or
something?

FYI, I applied this on top of my subgroups series and the dvec2 broadcast
test went from around 1k fills per shader to 0 and the test run time
(mostly compile time) goes from 130s to 7.5s.  I'll let you immagine how
long the dvec4 test takes to compile. :-)

--Jason


>   total spills in shared programs: 437 -> 437 (0.00%)
>   spills in affected programs: 0 -> 0
>   helped: 0
>   HURT: 0
>
>   total fills in shared programs: 870 -> 854 (-1.84%)
>   fills in affected programs: 16 -> 0
>   helped: 1
>   HURT: 0
>
>   LOST:   0
>   GAINED: 107
> ---
>  src/intel/compiler/brw_fs_live_variables.cpp | 34
> ++++++++++++++++++++++++----
>  src/intel/compiler/brw_fs_live_variables.h   | 12 ++++++++++
>  2 files changed, 42 insertions(+), 4 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs_live_variables.cpp
> b/src/intel/compiler/brw_fs_live_variables.cpp
> index c449672a519..059f076fa51 100644
> --- a/src/intel/compiler/brw_fs_live_variables.cpp
> +++ b/src/intel/compiler/brw_fs_live_variables.cpp
> @@ -83,9 +83,11 @@ fs_live_variables::setup_one_write(struct block_data
> *bd, fs_inst *inst,
>     /* The def[] bitset marks when an initialization in a block completely
>      * screens off previous updates of that variable (VGRF channel).
>      */
> -   if (inst->dst.file == VGRF && !inst->is_partial_write()) {
> -      if (!BITSET_TEST(bd->use, var))
> +   if (inst->dst.file == VGRF) {
> +      if (!inst->is_partial_write() && !BITSET_TEST(bd->use, var))
>           BITSET_SET(bd->def, var);
> +
> +      BITSET_SET(bd->defout, var);
>     }
>  }
>
> @@ -199,6 +201,28 @@ fs_live_variables::compute_live_variables()
>           }
>        }
>     }
> +
> +   /* Propagate defin and defout down the CFG to calculate the union of
> live
> +    * variables potentially defined along any possible control flow path.
> +    */
> +   do {
> +      cont = false;
> +
> +      foreach_block (block, cfg) {
> +         const struct block_data *bd = &block_data[block->num];
> +
> +        foreach_list_typed(bblock_link, child_link, link,
> &block->children) {
> +            struct block_data *child_bd = &block_data[child_link->block-
> >num];
> +
> +           for (int i = 0; i < bitset_words; i++) {
> +               const BITSET_WORD new_def = bd->defout[i] &
> ~child_bd->defin[i];
> +               child_bd->defin[i] |= new_def;
> +               child_bd->defout[i] |= new_def;
> +               cont |= new_def;
> +           }
> +        }
> +      }
> +   } while (cont);
>  }
>
>  /**
> @@ -212,12 +236,12 @@ fs_live_variables::compute_start_end()
>        struct block_data *bd = &block_data[block->num];
>
>        for (int i = 0; i < num_vars; i++) {
> -         if (BITSET_TEST(bd->livein, i)) {
> +         if (BITSET_TEST(bd->livein, i) && BITSET_TEST(bd->defin, i)) {
>              start[i] = MIN2(start[i], block->start_ip);
>              end[i] = MAX2(end[i], block->start_ip);
>           }
>
> -         if (BITSET_TEST(bd->liveout, i)) {
> +         if (BITSET_TEST(bd->liveout, i) && BITSET_TEST(bd->defout, i)) {
>              start[i] = MIN2(start[i], block->end_ip);
>              end[i] = MAX2(end[i], block->end_ip);
>           }
> @@ -260,6 +284,8 @@ fs_live_variables::fs_live_variables(fs_visitor *v,
> const cfg_t *cfg)
>        block_data[i].use = rzalloc_array(mem_ctx, BITSET_WORD,
> bitset_words);
>        block_data[i].livein = rzalloc_array(mem_ctx, BITSET_WORD,
> bitset_words);
>        block_data[i].liveout = rzalloc_array(mem_ctx, BITSET_WORD,
> bitset_words);
> +      block_data[i].defin = rzalloc_array(mem_ctx, BITSET_WORD,
> bitset_words);
> +      block_data[i].defout = rzalloc_array(mem_ctx, BITSET_WORD,
> bitset_words);
>
>        block_data[i].flag_def[0] = 0;
>        block_data[i].flag_use[0] = 0;
> diff --git a/src/intel/compiler/brw_fs_live_variables.h
> b/src/intel/compiler/brw_fs_live_variables.h
> index d2d5898ed1c..9e95e443170 100644
> --- a/src/intel/compiler/brw_fs_live_variables.h
> +++ b/src/intel/compiler/brw_fs_live_variables.h
> @@ -55,6 +55,18 @@ struct block_data {
>     /** Which defs reach the exit point of the block. */
>     BITSET_WORD *liveout;
>
> +   /**
> +    * Variables such that the entry point of the block may be reached
> from any
> +    * of their definitions.
> +    */
> +   BITSET_WORD *defin;
> +
> +   /**
> +    * Variables such that the exit point of the block may be reached from
> any
> +    * of their definitions.
> +    */
> +   BITSET_WORD *defout;
> +
>     BITSET_WORD flag_def[1];
>     BITSET_WORD flag_use[1];
>     BITSET_WORD flag_livein[1];
> --
> 2.14.1
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170907/475ba7ff/attachment.html>


More information about the mesa-dev mailing list