[Mesa-dev] [PATCH] st/glsl_to_tgsi: fix the DCE pass in presence of loops

Samuel Pitoiset samuel.pitoiset at gmail.com
Thu May 4 12:13:16 UTC 2017



On 05/04/2017 12:26 PM, Timothy Arceri wrote:
> On 04/05/17 18:22, Samuel Pitoiset wrot
>> On 05/04/2017 02:36 AM, Timothy Arceri wrote:
>>> This and the tgsi copy_propagation pass are very slow, I'd really 
>>> like it if we both didn't require the pass for things to work and 
>>> also didn't make it any slower. perf is showing these 2 passes at 
>>> ~11% during Deus Ex start-up (cold cache).
>>
>> This fix shouldn't make the DCE pass really slower, shouldn't hurt.
>>
>> Well, those passes are useful because the visitor is lazy and it 
>> produces bunch of useless code which is going to be removed later.
> 
> Sure I just mean it would be nice not to add extra reliance on this 
> pass, in case we are able to make the visitor smarter, and not generate 
> so much junk to begin with.

I have a new approach which is much better (suggested by Nicolai).

> 
>>
>>>
>>> Can we not just add a DCE call to the glsl linker/compiler calls e.g.
>>>
>>>     if (ctx->Const.GLSLOptimizeConservatively) {
>>>        /* Run it just once. */
>>>        do_common_optimization(ir, true, false,
>>> &ctx->Const.ShaderCompilerOptions[stage],
>>>                                  ctx->Const.NativeIntegers);
>>>
>>>       ---> call DCE here to clean up since we don't call the opt 
>>> passes again <---
>>
>> This might help a bit but it won't remove useless TGSI code generated 
>> by the visitor.
> 
> Sure but is the visitor generating code that is causing the issue you 
> are seeing or is it the glsl IR code?

The visitor.

> 
> Seems you have an alternative solution which looks good, but maybe we 
> should run the glsl dead code pass once at the end anyway?

This doesn't help for this particular situation, but I guess this might 
help other shaders.

> 
>>
>>>
>>>     } else {
>>>        /* Repeat it until it stops making changes. */
>>>        while (do_common_optimization(ir, true, false,
>>>
>>> &ctx->Const.ShaderCompilerOptions[stage],
>>>                                      ctx->Const.NativeIntegers))
>>>           ;
>>>     }
>>>
>>> On 03/05/17 00:43, Samuel Pitoiset wrote:
>>>> The TGSI DCE pass doesn't eliminate dead assignments like
>>>> MOV TEMP[0], TEMP[1] in presence of loops because it assumes
>>>> that the visitor doesn't emit dead code. This assumption is
>>>> actually wrong and this situation happens.
>>>>
>>>> However, it appears that the merge_registers() pass accidentally
>>>> takes care of this for some weird reasons. But since this pass has
>>>> been disabled for RadeonSI and Nouveau, the renumber_registers()
>>>> pass which is called *after*, can't do its job correctly.
>>>>
>>>> This is because it assumes that no dead code is present. But if
>>>> there is still a dead assignment, it might re-use the TEMP
>>>> register id incorrectly and emits wrong code.
>>>>
>>>> This patches eliminates all dead assignments by tracking
>>>> all temporary register reads like what the renumber_registers()
>>>> actually does. The best solution would be to rewrite that DCE
>>>> pass entirely but it's more work.
>>>>
>>>> This should fix Unigine Heaven on RadeonSI and Nouveau.
>>>>
>>>> shader-db results with RadeonSI:
>>>>
>>>> 47109 shaders in 29632 tests
>>>> Totals:
>>>> SGPRS: 1919548 -> 1919572 (0.00 %)
>>>> VGPRS: 1139205 -> 1139209 (0.00 %)
>>>> Spilled SGPRs: 1865 -> 1867 (0.11 %)
>>>> Spilled VGPRs: 65 -> 65 (0.00 %)
>>>> Private memory VGPRs: 1184 -> 1184 (0.00 %)
>>>> Scratch size: 1308 -> 1308 (0.00 %) dwords per thread
>>>> Code Size: 60083036 -> 60083244 (0.00 %) bytes
>>>> LDS: 1077 -> 1077 (0.00 %) blocks
>>>> Max Waves: 431200 -> 431200 (0.00 %)
>>>> Wait states: 0 -> 0 (0.00 %)
>>>>
>>>> It's still interesting to disable the merge_registers() pass.
>>>>
>>>> Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
>>>> ---
>>>>   src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 36 
>>>> +++++++++++++++++++++++++-----
>>>>   1 file changed, 31 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
>>>> b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>>>> index 0f8688a41c..01b5a4dc98 100644
>>>> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>>>> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>>>> @@ -5085,9 +5085,13 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
>>>> glsl_to_tgsi_instruction *,
>>>> this->next_temp * 4);
>>>>      int *write_level = rzalloc_array(mem_ctx, int, this->next_temp 
>>>> * 4);
>>>> +   int *first_reads = rzalloc_array(mem_ctx, int, this->next_temp);
>>>>      int level = 0;
>>>>      int removed = 0;
>>>> +   for (int i = 0; i < this->next_temp; i++)
>>>> +      first_reads[i] = -1;
>>>> +
>>>>      foreach_in_list(glsl_to_tgsi_instruction, inst, 
>>>> &this->instructions) {
>>>>         assert(inst->dst[0].file != PROGRAM_TEMPORARY
>>>>                || inst->dst[0].index < this->next_temp);
>>>> @@ -5148,8 +5152,12 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
>>>>                  src_chans |= 1 << GET_SWZ(inst->src[i].swizzle, 3);
>>>>                  for (int c = 0; c < 4; c++) {
>>>> -                  if (src_chans & (1 << c))
>>>> +                  if (src_chans & (1 << c)) {
>>>>                        writes[4 * inst->src[i].index + c] = NULL;
>>>> +
>>>> +                     if (first_reads[inst->src[i].index] == -1)
>>>> +                        first_reads[inst->src[i].index] = level;
>>>> +                  }
>>>>                  }
>>>>               }
>>>>            }
>>>> @@ -5167,8 +5175,12 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
>>>>                  src_chans |= 1 << 
>>>> GET_SWZ(inst->tex_offsets[i].swizzle, 3);
>>>>                  for (int c = 0; c < 4; c++) {
>>>> -                  if (src_chans & (1 << c))
>>>> +                  if (src_chans & (1 << c)) {
>>>>                        writes[4 * inst->tex_offsets[i].index + c] = 
>>>> NULL;
>>>> +
>>>> +                     if (first_reads[inst->tex_offsets[i].index] == 
>>>> -1)
>>>> +                        first_reads[inst->tex_offsets[i].index] = 
>>>> level;
>>>> +                  }
>>>>                  }
>>>>               }
>>>>            }
>>>> @@ -5211,17 +5223,30 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
>>>>       * the writemask of other instructions with dead channels.
>>>>       */
>>>>      foreach_in_list_safe(glsl_to_tgsi_instruction, inst, 
>>>> &this->instructions) {
>>>> -      if (!inst->dead_mask || !inst->dst[0].writemask)
>>>> +      bool dead_inst = false;
>>>> +
>>>> +      if (!inst->dst[0].writemask)
>>>>            continue;
>>>> +
>>>>         /* No amount of dead masks should remove memory stores */
>>>>         if (inst->info->is_store)
>>>>            continue;
>>>> -      if ((inst->dst[0].writemask & ~inst->dead_mask) == 0) {
>>>> +      if (!inst->dead_mask) {
>>>> +         if (inst->dst[0].file == PROGRAM_TEMPORARY &&
>>>> +             first_reads[inst->dst[0].index] == -1) {
>>>> +            dead_inst = true;
>>>> +         }
>>>> +      } else {
>>>> +         if ((inst->dst[0].writemask & ~inst->dead_mask) == 0)
>>>> +            dead_inst = true;
>>>> +      }
>>>> +
>>>> +      if (dead_inst) {
>>>>            inst->remove();
>>>>            delete inst;
>>>>            removed++;
>>>> -      } else {
>>>> +      } else if (inst->dead_mask) {
>>>>            if (glsl_base_type_is_64bit(inst->dst[0].type)) {
>>>>               if (inst->dead_mask == WRITEMASK_XY ||
>>>>                   inst->dead_mask == WRITEMASK_ZW)
>>>> @@ -5232,6 +5257,7 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
>>>>      }
>>>>      ralloc_free(write_level);
>>>> +   ralloc_free(first_reads);
>>>>      ralloc_free(writes);
>>>>      return removed;
>>>>


More information about the mesa-dev mailing list