[Mesa-dev] [PATCH] st/glsl_to_tgsi: fix the DCE pass in presence of loops

Timothy Arceri tarceri at itsqueeze.com
Thu May 4 23:41:04 UTC 2017



On 04/05/17 22:28, Samuel Pitoiset wrote:
> 
> 
> On 05/04/2017 02:13 PM, Samuel Pitoiset wrote:
>>
>>
>> On 05/04/2017 12:26 PM, Timothy Arceri wrote:
>>> On 04/05/17 18:22, Samuel Pitoiset wrot
>>>> On 05/04/2017 02:36 AM, Timothy Arceri wrote:
>>>>> This and the tgsi copy_propagation pass are very slow, I'd really 
>>>>> like it if we both didn't require the pass for things to work and 
>>>>> also didn't make it any slower. perf is showing these 2 passes at 
>>>>> ~11% during Deus Ex start-up (cold cache).
>>>>
>>>> This fix shouldn't make the DCE pass really slower, shouldn't hurt.
>>>>
>>>> Well, those passes are useful because the visitor is lazy and it 
>>>> produces bunch of useless code which is going to be removed later.
>>>
>>> Sure I just mean it would be nice not to add extra reliance on this 
>>> pass, in case we are able to make the visitor smarter, and not 
>>> generate so much junk to begin with.
>>
>> I have a new approach which is much better (suggested by Nicolai).
>>
>>>
>>>>
>>>>>
>>>>> Can we not just add a DCE call to the glsl linker/compiler calls e.g.
>>>>>
>>>>>     if (ctx->Const.GLSLOptimizeConservatively) {
>>>>>        /* Run it just once. */
>>>>>        do_common_optimization(ir, true, false,
>>>>> &ctx->Const.ShaderCompilerOptions[stage],
>>>>>                                  ctx->Const.NativeIntegers);
>>>>>
>>>>>       ---> call DCE here to clean up since we don't call the opt 
>>>>> passes again <---
>>>>
>>>> This might help a bit but it won't remove useless TGSI code 
>>>> generated by the visitor.
>>>
>>> Sure but is the visitor generating code that is causing the issue you 
>>> are seeing or is it the glsl IR code?
>>
>> The visitor.
>>
>>>
>>> Seems you have an alternative solution which looks good, but maybe we 
>>> should run the glsl dead code pass once at the end anyway?
>>
>> This doesn't help for this particular situation, but I guess this 
>> might help other shaders.
> 
> Calling the DCE passes doesn't help.
> 
> diff --git i/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
> w/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> index 249584d75e..fafbfaea19 100644
> --- i/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> +++ w/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> @@ -7008,6 +7008,10 @@ st_link_shader(struct gl_context *ctx, struct 
> gl_shader_program *prog)
>               lower_if_to_cond_assign((gl_shader_stage)i, ir,
>                                       options->MaxIfDepth, if_threshold);
>            } while (has_unsupported_control_flow(ir, options));
> +
> +         do_dead_code(ir, true);
> +         do_dead_code_local(ir);
> +
>         } else {
>            /* Repeat it until it stops making changes. */
>            bool progress;
> 
> shader-db results on RadeonSI:
> 
> 47109 shaders in 29632 tests
> Totals:
> SGPRS: 1923308 -> 1923308 (0.00 %)
> VGPRS: 1133843 -> 1133839 (-0.00 %)
> Spilled SGPRs: 2516 -> 2516 (0.00 %)
> Spilled VGPRs: 65 -> 65 (0.00 %)
> Private memory VGPRs: 1184 -> 1184 (0.00 %)
> Scratch size: 1308 -> 1308 (0.00 %) dwords per thread
> Code Size: 60095968 -> 60095840 (-0.00 %) bytes
> LDS: 1077 -> 1077 (0.00 %) blocks
> Max Waves: 431889 -> 431889 (0.00 %)
> Wait states: 0 -> 0 (0.00 %)
> 
> Totals from affected shaders:
> SGPRS: 4064 -> 4064 (0.00 %)
> VGPRS: 2196 -> 2192 (-0.18 %)
> Spilled SGPRs: 0 -> 0 (0.00 %)
> Spilled VGPRs: 0 -> 0 (0.00 %)
> Private memory VGPRs: 0 -> 0 (0.00 %)
> Scratch size: 0 -> 0 (0.00 %) dwords per thread
> Code Size: 230836 -> 230708 (-0.06 %) bytes
> LDS: 0 -> 0 (0.00 %) blocks
> Max Waves: 609 -> 609 (0.00 %)
> Wait states: 0 -> 0 (0.00 %)
> 
> It only helps one or two Metro 2033 shaders.


My thinking was it might be helpful for compilation speed. Anyway its 
not a bit deal. If we do manage to disable the tgsi DCE pass at some 
point it might be more important. For now it's probably fine as is.


> 
>>
>>>
>>>>
>>>>>
>>>>>     } else {
>>>>>        /* Repeat it until it stops making changes. */
>>>>>        while (do_common_optimization(ir, true, false,
>>>>>
>>>>> &ctx->Const.ShaderCompilerOptions[stage],
>>>>>                                      ctx->Const.NativeIntegers))
>>>>>           ;
>>>>>     }
>>>>>
>>>>> On 03/05/17 00:43, Samuel Pitoiset wrote:
>>>>>> The TGSI DCE pass doesn't eliminate dead assignments like
>>>>>> MOV TEMP[0], TEMP[1] in presence of loops because it assumes
>>>>>> that the visitor doesn't emit dead code. This assumption is
>>>>>> actually wrong and this situation happens.
>>>>>>
>>>>>> However, it appears that the merge_registers() pass accidentally
>>>>>> takes care of this for some weird reasons. But since this pass has
>>>>>> been disabled for RadeonSI and Nouveau, the renumber_registers()
>>>>>> pass which is called *after*, can't do its job correctly.
>>>>>>
>>>>>> This is because it assumes that no dead code is present. But if
>>>>>> there is still a dead assignment, it might re-use the TEMP
>>>>>> register id incorrectly and emits wrong code.
>>>>>>
>>>>>> This patches eliminates all dead assignments by tracking
>>>>>> all temporary register reads like what the renumber_registers()
>>>>>> actually does. The best solution would be to rewrite that DCE
>>>>>> pass entirely but it's more work.
>>>>>>
>>>>>> This should fix Unigine Heaven on RadeonSI and Nouveau.
>>>>>>
>>>>>> shader-db results with RadeonSI:
>>>>>>
>>>>>> 47109 shaders in 29632 tests
>>>>>> Totals:
>>>>>> SGPRS: 1919548 -> 1919572 (0.00 %)
>>>>>> VGPRS: 1139205 -> 1139209 (0.00 %)
>>>>>> Spilled SGPRs: 1865 -> 1867 (0.11 %)
>>>>>> Spilled VGPRs: 65 -> 65 (0.00 %)
>>>>>> Private memory VGPRs: 1184 -> 1184 (0.00 %)
>>>>>> Scratch size: 1308 -> 1308 (0.00 %) dwords per thread
>>>>>> Code Size: 60083036 -> 60083244 (0.00 %) bytes
>>>>>> LDS: 1077 -> 1077 (0.00 %) blocks
>>>>>> Max Waves: 431200 -> 431200 (0.00 %)
>>>>>> Wait states: 0 -> 0 (0.00 %)
>>>>>>
>>>>>> It's still interesting to disable the merge_registers() pass.
>>>>>>
>>>>>> Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
>>>>>> ---
>>>>>>   src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 36 
>>>>>> +++++++++++++++++++++++++-----
>>>>>>   1 file changed, 31 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
>>>>>> b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>>>>>> index 0f8688a41c..01b5a4dc98 100644
>>>>>> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>>>>>> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>>>>>> @@ -5085,9 +5085,13 @@ 
>>>>>> glsl_to_tgsi_visitor::eliminate_dead_code(void)
>>>>>> glsl_to_tgsi_instruction *,
>>>>>> this->next_temp * 4);
>>>>>>      int *write_level = rzalloc_array(mem_ctx, int, 
>>>>>> this->next_temp * 4);
>>>>>> +   int *first_reads = rzalloc_array(mem_ctx, int, this->next_temp);
>>>>>>      int level = 0;
>>>>>>      int removed = 0;
>>>>>> +   for (int i = 0; i < this->next_temp; i++)
>>>>>> +      first_reads[i] = -1;
>>>>>> +
>>>>>>      foreach_in_list(glsl_to_tgsi_instruction, inst, 
>>>>>> &this->instructions) {
>>>>>>         assert(inst->dst[0].file != PROGRAM_TEMPORARY
>>>>>>                || inst->dst[0].index < this->next_temp);
>>>>>> @@ -5148,8 +5152,12 @@ 
>>>>>> glsl_to_tgsi_visitor::eliminate_dead_code(void)
>>>>>>                  src_chans |= 1 << GET_SWZ(inst->src[i].swizzle, 3);
>>>>>>                  for (int c = 0; c < 4; c++) {
>>>>>> -                  if (src_chans & (1 << c))
>>>>>> +                  if (src_chans & (1 << c)) {
>>>>>>                        writes[4 * inst->src[i].index + c] = NULL;
>>>>>> +
>>>>>> +                     if (first_reads[inst->src[i].index] == -1)
>>>>>> +                        first_reads[inst->src[i].index] = level;
>>>>>> +                  }
>>>>>>                  }
>>>>>>               }
>>>>>>            }
>>>>>> @@ -5167,8 +5175,12 @@ 
>>>>>> glsl_to_tgsi_visitor::eliminate_dead_code(void)
>>>>>>                  src_chans |= 1 << 
>>>>>> GET_SWZ(inst->tex_offsets[i].swizzle, 3);
>>>>>>                  for (int c = 0; c < 4; c++) {
>>>>>> -                  if (src_chans & (1 << c))
>>>>>> +                  if (src_chans & (1 << c)) {
>>>>>>                        writes[4 * inst->tex_offsets[i].index + c] 
>>>>>> = NULL;
>>>>>> +
>>>>>> +                     if (first_reads[inst->tex_offsets[i].index] 
>>>>>> == -1)
>>>>>> +                        first_reads[inst->tex_offsets[i].index] = 
>>>>>> level;
>>>>>> +                  }
>>>>>>                  }
>>>>>>               }
>>>>>>            }
>>>>>> @@ -5211,17 +5223,30 @@ 
>>>>>> glsl_to_tgsi_visitor::eliminate_dead_code(void)
>>>>>>       * the writemask of other instructions with dead channels.
>>>>>>       */
>>>>>>      foreach_in_list_safe(glsl_to_tgsi_instruction, inst, 
>>>>>> &this->instructions) {
>>>>>> -      if (!inst->dead_mask || !inst->dst[0].writemask)
>>>>>> +      bool dead_inst = false;
>>>>>> +
>>>>>> +      if (!inst->dst[0].writemask)
>>>>>>            continue;
>>>>>> +
>>>>>>         /* No amount of dead masks should remove memory stores */
>>>>>>         if (inst->info->is_store)
>>>>>>            continue;
>>>>>> -      if ((inst->dst[0].writemask & ~inst->dead_mask) == 0) {
>>>>>> +      if (!inst->dead_mask) {
>>>>>> +         if (inst->dst[0].file == PROGRAM_TEMPORARY &&
>>>>>> +             first_reads[inst->dst[0].index] == -1) {
>>>>>> +            dead_inst = true;
>>>>>> +         }
>>>>>> +      } else {
>>>>>> +         if ((inst->dst[0].writemask & ~inst->dead_mask) == 0)
>>>>>> +            dead_inst = true;
>>>>>> +      }
>>>>>> +
>>>>>> +      if (dead_inst) {
>>>>>>            inst->remove();
>>>>>>            delete inst;
>>>>>>            removed++;
>>>>>> -      } else {
>>>>>> +      } else if (inst->dead_mask) {
>>>>>>            if (glsl_base_type_is_64bit(inst->dst[0].type)) {
>>>>>>               if (inst->dead_mask == WRITEMASK_XY ||
>>>>>>                   inst->dead_mask == WRITEMASK_ZW)
>>>>>> @@ -5232,6 +5257,7 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
>>>>>>      }
>>>>>>      ralloc_free(write_level);
>>>>>> +   ralloc_free(first_reads);
>>>>>>      ralloc_free(writes);
>>>>>>      return removed;
>>>>>>


More information about the mesa-dev mailing list