[Mesa-dev] [PATCH] intel/fs: Restrict live intervals to the subset possibly reachable from any definition.

Chema Casanova jmcasanova at igalia.com
Fri Sep 8 13:07:42 UTC 2017


El 08/09/17 a las 11:06, Alejandro Piñeiro escribió:
> On 08/09/17 02:50, Francisco Jerez wrote:
>> Currently the liveness analysis pass would extend a live interval up
>> to the top of the program when no unconditional and complete
>> definition of the variable is found that dominates all of its uses.
>>
>> This can lead to a serious performance problem in shaders containing
>> many partial writes, like scalar arithmetic, FP64 and soon FP16
>> operations.  
> Just tested with the VK_KHR_16bit_storage implementation. Works really
> fine with the most problematic tests, so we can drop the "i965/fs: Add
> reuse_16bit_conversions_register optimization" patch (that was already
> NAKed by both you and Connor).
>
> My test was limited to that extension CTS tests, but in case it helps:
> Tested-by: Alejandro Piñeiro <apinheiro at igalia.com>

I've seen one regression on a full VK-CTS run with this patch over the
VK_KHR_16bit_storage branch, but I can reproduce the same regression
applying it on master.

Failing test is:
dEQP-VK.glsl.return.return_in_dynamic_loop_dynamic_vertex

>> The number of oversize live intervals in such workloads
>> can cause the compilation time of the shader to explode because of the
>> worse than quadratic behavior of the register allocator and scheduler
>> when running out of registers, and it can also cause the running time
>> of the shader to explode due to the amount of spilling it leads to,
>> which is orders of magnitude slower than GRF memory.
>>
>> This patch fixes it by computing the intersection of our current live
>> intervals with the subset of the program that can possibly be reached
>> from any definition of the variable.  Extending the storage allocation
>> of the variable beyond that is pretty useless because its value is
>> guaranteed to be undefined at a point that cannot be reached from any
>> definition.
>>
>> No significant change in the running time of shader-db (with 5%
>> statistical significance).
>>
>> shader-db results on IVB:
>>
>>   total cycles in shared programs: 61108780 -> 60932856 (-0.29%)
>>   cycles in affected programs: 16335482 -> 16159558 (-1.08%)
>>   helped: 5121
>>   HURT: 4347
>>
>>   total spills in shared programs: 1309 -> 1288 (-1.60%)
>>   spills in affected programs: 249 -> 228 (-8.43%)
>>   helped: 3
>>   HURT: 0
>>
>>   total fills in shared programs: 1652 -> 1597 (-3.33%)
>>   fills in affected programs: 262 -> 207 (-20.99%)
>>   helped: 4
>>   HURT: 0
>>
>>   LOST:   2
>>   GAINED: 209
>>
>> shader-db results on BDW:
>>
>>   total cycles in shared programs: 67617262 -> 67361220 (-0.38%)
>>   cycles in affected programs: 23397142 -> 23141100 (-1.09%)
>>   helped: 8045
>>   HURT: 6488
>>
>>   total spills in shared programs: 1456 -> 1252 (-14.01%)
>>   spills in affected programs: 465 -> 261 (-43.87%)
>>   helped: 3
>>   HURT: 0
>>
>>   total fills in shared programs: 1720 -> 1465 (-14.83%)
>>   fills in affected programs: 471 -> 216 (-54.14%)
>>   helped: 4
>>   HURT: 0
>>
>>   LOST:   2
>>   GAINED: 162
>>
>> shader-db results on SKL:
>>
>>   total cycles in shared programs: 65436248 -> 65245186 (-0.29%)
>>   cycles in affected programs: 22560936 -> 22369874 (-0.85%)
>>   helped: 8457
>>   HURT: 6247
>>
>>   total spills in shared programs: 437 -> 437 (0.00%)
>>   spills in affected programs: 0 -> 0
>>   helped: 0
>>   HURT: 0
>>
>>   total fills in shared programs: 870 -> 854 (-1.84%)
>>   fills in affected programs: 16 -> 0
>>   helped: 1
>>   HURT: 0
>>
>>   LOST:   0
>>   GAINED: 107
>> ---
>>  src/intel/compiler/brw_fs_live_variables.cpp | 34 ++++++++++++++++++++++++----
>>  src/intel/compiler/brw_fs_live_variables.h   | 12 ++++++++++
>>  2 files changed, 42 insertions(+), 4 deletions(-)
>>
>> diff --git a/src/intel/compiler/brw_fs_live_variables.cpp b/src/intel/compiler/brw_fs_live_variables.cpp
>> index c449672a519..059f076fa51 100644
>> --- a/src/intel/compiler/brw_fs_live_variables.cpp
>> +++ b/src/intel/compiler/brw_fs_live_variables.cpp
>> @@ -83,9 +83,11 @@ fs_live_variables::setup_one_write(struct block_data *bd, fs_inst *inst,
>>     /* The def[] bitset marks when an initialization in a block completely
>>      * screens off previous updates of that variable (VGRF channel).
>>      */
>> -   if (inst->dst.file == VGRF && !inst->is_partial_write()) {
>> -      if (!BITSET_TEST(bd->use, var))
>> +   if (inst->dst.file == VGRF) {
>> +      if (!inst->is_partial_write() && !BITSET_TEST(bd->use, var))
>>           BITSET_SET(bd->def, var);
>> +
>> +      BITSET_SET(bd->defout, var);
>>     }
>>  }
>>  
>> @@ -199,6 +201,28 @@ fs_live_variables::compute_live_variables()
>>           }
>>        }
>>     }
>> +
>> +   /* Propagate defin and defout down the CFG to calculate the union of live
>> +    * variables potentially defined along any possible control flow path.
>> +    */
>> +   do {
>> +      cont = false;
>> +
>> +      foreach_block (block, cfg) {
>> +         const struct block_data *bd = &block_data[block->num];
>> +
>> +	 foreach_list_typed(bblock_link, child_link, link, &block->children) {
>> +            struct block_data *child_bd = &block_data[child_link->block->num];
>> +
>> +	    for (int i = 0; i < bitset_words; i++) {
>> +               const BITSET_WORD new_def = bd->defout[i] & ~child_bd->defin[i];
>> +               child_bd->defin[i] |= new_def;
>> +               child_bd->defout[i] |= new_def;
>> +               cont |= new_def;
>> +	    }
>> +	 }
>> +      }
>> +   } while (cont);
>>  }
>>  
>>  /**
>> @@ -212,12 +236,12 @@ fs_live_variables::compute_start_end()
>>        struct block_data *bd = &block_data[block->num];
>>  
>>        for (int i = 0; i < num_vars; i++) {
>> -         if (BITSET_TEST(bd->livein, i)) {
>> +         if (BITSET_TEST(bd->livein, i) && BITSET_TEST(bd->defin, i)) {
>>              start[i] = MIN2(start[i], block->start_ip);
>>              end[i] = MAX2(end[i], block->start_ip);
>>           }
>>  
>> -         if (BITSET_TEST(bd->liveout, i)) {
>> +         if (BITSET_TEST(bd->liveout, i) && BITSET_TEST(bd->defout, i)) {
>>              start[i] = MIN2(start[i], block->end_ip);
>>              end[i] = MAX2(end[i], block->end_ip);
>>           }
>> @@ -260,6 +284,8 @@ fs_live_variables::fs_live_variables(fs_visitor *v, const cfg_t *cfg)
>>        block_data[i].use = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words);
>>        block_data[i].livein = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words);
>>        block_data[i].liveout = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words);
>> +      block_data[i].defin = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words);
>> +      block_data[i].defout = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words);
>>  
>>        block_data[i].flag_def[0] = 0;
>>        block_data[i].flag_use[0] = 0;
>> diff --git a/src/intel/compiler/brw_fs_live_variables.h b/src/intel/compiler/brw_fs_live_variables.h
>> index d2d5898ed1c..9e95e443170 100644
>> --- a/src/intel/compiler/brw_fs_live_variables.h
>> +++ b/src/intel/compiler/brw_fs_live_variables.h
>> @@ -55,6 +55,18 @@ struct block_data {
>>     /** Which defs reach the exit point of the block. */
>>     BITSET_WORD *liveout;
>>  
>> +   /**
>> +    * Variables such that the entry point of the block may be reached from any
>> +    * of their definitions.
>> +    */
>> +   BITSET_WORD *defin;
>> +
>> +   /**
>> +    * Variables such that the exit point of the block may be reached from any
>> +    * of their definitions.
>> +    */
>> +   BITSET_WORD *defout;
>> +
>>     BITSET_WORD flag_def[1];
>>     BITSET_WORD flag_use[1];
>>     BITSET_WORD flag_livein[1];
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev



More information about the mesa-dev mailing list