[Mesa-dev] [PATCH] st/glsl_to_tgsi: fix the DCE pass in presence of loops

Samuel Pitoiset samuel.pitoiset at gmail.com
Thu May 4 08:22:28 UTC 2017



On 05/04/2017 02:36 AM, Timothy Arceri wrote:
> This and the tgsi copy_propagation pass are very slow, I'd really like 
> it if we both didn't require the pass for things to work and also didn't 
> make it any slower. perf is showing these 2 passes at ~11% during Deus 
> Ex start-up (cold cache).

This fix shouldn't make the DCE pass really slower, shouldn't hurt.

Well, those passes are useful because the visitor is lazy and it 
produces bunch of useless code which is going to be removed later.

> 
> Can we not just add a DCE call to the glsl linker/compiler calls e.g.
> 
>     if (ctx->Const.GLSLOptimizeConservatively) {
>        /* Run it just once. */
>        do_common_optimization(ir, true, false,
>                                  &ctx->Const.ShaderCompilerOptions[stage],
>                                  ctx->Const.NativeIntegers);
> 
>       ---> call DCE here to clean up since we don't call the opt passes 
> again <---

This might help a bit but it won't remove useless TGSI code generated by 
the visitor.

> 
>     } else {
>        /* Repeat it until it stops making changes. */
>        while (do_common_optimization(ir, true, false,
> 
> &ctx->Const.ShaderCompilerOptions[stage],
>                                      ctx->Const.NativeIntegers))
>           ;
>     }
> 
> On 03/05/17 00:43, Samuel Pitoiset wrote:
>> The TGSI DCE pass doesn't eliminate dead assignments like
>> MOV TEMP[0], TEMP[1] in presence of loops because it assumes
>> that the visitor doesn't emit dead code. This assumption is
>> actually wrong and this situation happens.
>>
>> However, it appears that the merge_registers() pass accidentally
>> takes care of this for some weird reasons. But since this pass has
>> been disabled for RadeonSI and Nouveau, the renumber_registers()
>> pass which is called *after*, can't do its job correctly.
>>
>> This is because it assumes that no dead code is present. But if
>> there is still a dead assignment, it might re-use the TEMP
>> register id incorrectly and emits wrong code.
>>
>> This patches eliminates all dead assignments by tracking
>> all temporary register reads like what the renumber_registers()
>> actually does. The best solution would be to rewrite that DCE
>> pass entirely but it's more work.
>>
>> This should fix Unigine Heaven on RadeonSI and Nouveau.
>>
>> shader-db results with RadeonSI:
>>
>> 47109 shaders in 29632 tests
>> Totals:
>> SGPRS: 1919548 -> 1919572 (0.00 %)
>> VGPRS: 1139205 -> 1139209 (0.00 %)
>> Spilled SGPRs: 1865 -> 1867 (0.11 %)
>> Spilled VGPRs: 65 -> 65 (0.00 %)
>> Private memory VGPRs: 1184 -> 1184 (0.00 %)
>> Scratch size: 1308 -> 1308 (0.00 %) dwords per thread
>> Code Size: 60083036 -> 60083244 (0.00 %) bytes
>> LDS: 1077 -> 1077 (0.00 %) blocks
>> Max Waves: 431200 -> 431200 (0.00 %)
>> Wait states: 0 -> 0 (0.00 %)
>>
>> It's still interesting to disable the merge_registers() pass.
>>
>> Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
>> ---
>>   src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 36 
>> +++++++++++++++++++++++++-----
>>   1 file changed, 31 insertions(+), 5 deletions(-)
>>
>> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
>> b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>> index 0f8688a41c..01b5a4dc98 100644
>> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>> @@ -5085,9 +5085,13 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
>>                                                        
>> glsl_to_tgsi_instruction *,
>>                                                        this->next_temp 
>> * 4);
>>      int *write_level = rzalloc_array(mem_ctx, int, this->next_temp * 4);
>> +   int *first_reads = rzalloc_array(mem_ctx, int, this->next_temp);
>>      int level = 0;
>>      int removed = 0;
>> +   for (int i = 0; i < this->next_temp; i++)
>> +      first_reads[i] = -1;
>> +
>>      foreach_in_list(glsl_to_tgsi_instruction, inst, 
>> &this->instructions) {
>>         assert(inst->dst[0].file != PROGRAM_TEMPORARY
>>                || inst->dst[0].index < this->next_temp);
>> @@ -5148,8 +5152,12 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
>>                  src_chans |= 1 << GET_SWZ(inst->src[i].swizzle, 3);
>>                  for (int c = 0; c < 4; c++) {
>> -                  if (src_chans & (1 << c))
>> +                  if (src_chans & (1 << c)) {
>>                        writes[4 * inst->src[i].index + c] = NULL;
>> +
>> +                     if (first_reads[inst->src[i].index] == -1)
>> +                        first_reads[inst->src[i].index] = level;
>> +                  }
>>                  }
>>               }
>>            }
>> @@ -5167,8 +5175,12 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
>>                  src_chans |= 1 << 
>> GET_SWZ(inst->tex_offsets[i].swizzle, 3);
>>                  for (int c = 0; c < 4; c++) {
>> -                  if (src_chans & (1 << c))
>> +                  if (src_chans & (1 << c)) {
>>                        writes[4 * inst->tex_offsets[i].index + c] = NULL;
>> +
>> +                     if (first_reads[inst->tex_offsets[i].index] == -1)
>> +                        first_reads[inst->tex_offsets[i].index] = level;
>> +                  }
>>                  }
>>               }
>>            }
>> @@ -5211,17 +5223,30 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
>>       * the writemask of other instructions with dead channels.
>>       */
>>      foreach_in_list_safe(glsl_to_tgsi_instruction, inst, 
>> &this->instructions) {
>> -      if (!inst->dead_mask || !inst->dst[0].writemask)
>> +      bool dead_inst = false;
>> +
>> +      if (!inst->dst[0].writemask)
>>            continue;
>> +
>>         /* No amount of dead masks should remove memory stores */
>>         if (inst->info->is_store)
>>            continue;
>> -      if ((inst->dst[0].writemask & ~inst->dead_mask) == 0) {
>> +      if (!inst->dead_mask) {
>> +         if (inst->dst[0].file == PROGRAM_TEMPORARY &&
>> +             first_reads[inst->dst[0].index] == -1) {
>> +            dead_inst = true;
>> +         }
>> +      } else {
>> +         if ((inst->dst[0].writemask & ~inst->dead_mask) == 0)
>> +            dead_inst = true;
>> +      }
>> +
>> +      if (dead_inst) {
>>            inst->remove();
>>            delete inst;
>>            removed++;
>> -      } else {
>> +      } else if (inst->dead_mask) {
>>            if (glsl_base_type_is_64bit(inst->dst[0].type)) {
>>               if (inst->dead_mask == WRITEMASK_XY ||
>>                   inst->dead_mask == WRITEMASK_ZW)
>> @@ -5232,6 +5257,7 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
>>      }
>>      ralloc_free(write_level);
>> +   ralloc_free(first_reads);
>>      ralloc_free(writes);
>>      return removed;
>>


More information about the mesa-dev mailing list