[Mesa-dev] [PATCH] st/glsl_to_tgsi: fix the DCE pass in presence of loops

Timothy Arceri tarceri at itsqueeze.com
Thu May 4 01:06:48 UTC 2017


On 04/05/17 10:43, Ilia Mirkin wrote:
> I looked into removing these passes a while back. A little detail here
> is that DCE has to get run at least a little bit for correctness
> reasons -- some (lazy) glsl -> tgsi

Is there anyway to improve this? I was actually looking at this a little 
yesterday but to be honest I was getting a little lost in the code. It 
wasn't entirely clear to me how this conversion pass works and if we can 
stop doing this in the first place.

glsl_to_tgsi_visitor::visit_expression(ir_expression* ir, st_src_reg 
*op) and glsl_to_tgsi_visitor::visit(ir_texture *ir)

Have a comment:

    /* Storage for our result.  Ideally for an assignment we'd be using
     * the actual storage for the result here, instead.
     */

The code all seems to be pretty much a copy of ir_to_mesa which has the 
same limitations.

> code uses the dead_mask with the
> assumption that DCE is run later.
> 
> Another observation was that while, indeed, those passes take a while,
> they also remove a bunch of code. Not running those passes makes them
> disappear from the profile, but then the logic down the line has more
> work to do. It's not as obvious of a win as one might think.

Yes I noticed this also :)

> 
> Additionally, at least for nouveau, this logic does better than
> nouveau's codegen in some cases. (Largely due to the fact that system
> values get copy-propagated here but they don't by codegen, generally,
> which can lead to large register live intervals. At least that's what
> I recall from my analysis a while ago.)
> 
> This was my hack-branch: https://github.com/imirkin/mesa/commits/st-pass-cleanup
> 
> On Wed, May 3, 2017 at 8:36 PM, Timothy Arceri <tarceri at itsqueeze.com> wrote:
>> This and the tgsi copy_propagation pass are very slow, I'd really like it if
>> we both didn't require the pass for things to work and also didn't make it
>> any slower. perf is showing these 2 passes at ~11% during Deus Ex start-up
>> (cold cache).
>>
>> Can we not just add a DCE call to the glsl linker/compiler calls e.g.
>>
>>     if (ctx->Const.GLSLOptimizeConservatively) {
>>        /* Run it just once. */
>>        do_common_optimization(ir, true, false,
>>                                  &ctx->Const.ShaderCompilerOptions[stage],
>>                                  ctx->Const.NativeIntegers);
>>
>>       ---> call DCE here to clean up since we don't call the opt passes again
>> <---
>>
>>     } else {
>>        /* Repeat it until it stops making changes. */
>>        while (do_common_optimization(ir, true, false,
>>
>> &ctx->Const.ShaderCompilerOptions[stage],
>>                                      ctx->Const.NativeIntegers))
>>           ;
>>
>>     }
>>
>> On 03/05/17 00:43, Samuel Pitoiset wrote:
>>>
>>> The TGSI DCE pass doesn't eliminate dead assignments like
>>> MOV TEMP[0], TEMP[1] in presence of loops because it assumes
>>> that the visitor doesn't emit dead code. This assumption is
>>> actually wrong and this situation happens.
>>>
>>> However, it appears that the merge_registers() pass accidentally
>>> takes care of this for some weird reasons. But since this pass has
>>> been disabled for RadeonSI and Nouveau, the renumber_registers()
>>> pass which is called *after*, can't do its job correctly.
>>>
>>> This is because it assumes that no dead code is present. But if
>>> there is still a dead assignment, it might re-use the TEMP
>>> register id incorrectly and emits wrong code.
>>>
>>> This patches eliminates all dead assignments by tracking
>>> all temporary register reads like what the renumber_registers()
>>> actually does. The best solution would be to rewrite that DCE
>>> pass entirely but it's more work.
>>>
>>> This should fix Unigine Heaven on RadeonSI and Nouveau.
>>>
>>> shader-db results with RadeonSI:
>>>
>>> 47109 shaders in 29632 tests
>>> Totals:
>>> SGPRS: 1919548 -> 1919572 (0.00 %)
>>> VGPRS: 1139205 -> 1139209 (0.00 %)
>>> Spilled SGPRs: 1865 -> 1867 (0.11 %)
>>> Spilled VGPRs: 65 -> 65 (0.00 %)
>>> Private memory VGPRs: 1184 -> 1184 (0.00 %)
>>> Scratch size: 1308 -> 1308 (0.00 %) dwords per thread
>>> Code Size: 60083036 -> 60083244 (0.00 %) bytes
>>> LDS: 1077 -> 1077 (0.00 %) blocks
>>> Max Waves: 431200 -> 431200 (0.00 %)
>>> Wait states: 0 -> 0 (0.00 %)
>>>
>>> It's still interesting to disable the merge_registers() pass.
>>>
>>> Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
>>> ---
>>>    src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 36
>>> +++++++++++++++++++++++++-----
>>>    1 file changed, 31 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>>> b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>>> index 0f8688a41c..01b5a4dc98 100644
>>> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>>> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>>> @@ -5085,9 +5085,13 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
>>>
>>> glsl_to_tgsi_instruction *,
>>>                                                         this->next_temp *
>>> 4);
>>>       int *write_level = rzalloc_array(mem_ctx, int, this->next_temp * 4);
>>> +   int *first_reads = rzalloc_array(mem_ctx, int, this->next_temp);
>>>       int level = 0;
>>>       int removed = 0;
>>>    +   for (int i = 0; i < this->next_temp; i++)
>>> +      first_reads[i] = -1;
>>> +
>>>       foreach_in_list(glsl_to_tgsi_instruction, inst, &this->instructions)
>>> {
>>>          assert(inst->dst[0].file != PROGRAM_TEMPORARY
>>>                 || inst->dst[0].index < this->next_temp);
>>> @@ -5148,8 +5152,12 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
>>>                   src_chans |= 1 << GET_SWZ(inst->src[i].swizzle, 3);
>>>                     for (int c = 0; c < 4; c++) {
>>> -                  if (src_chans & (1 << c))
>>> +                  if (src_chans & (1 << c)) {
>>>                         writes[4 * inst->src[i].index + c] = NULL;
>>> +
>>> +                     if (first_reads[inst->src[i].index] == -1)
>>> +                        first_reads[inst->src[i].index] = level;
>>> +                  }
>>>                   }
>>>                }
>>>             }
>>> @@ -5167,8 +5175,12 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
>>>                   src_chans |= 1 << GET_SWZ(inst->tex_offsets[i].swizzle,
>>> 3);
>>>                     for (int c = 0; c < 4; c++) {
>>> -                  if (src_chans & (1 << c))
>>> +                  if (src_chans & (1 << c)) {
>>>                         writes[4 * inst->tex_offsets[i].index + c] = NULL;
>>> +
>>> +                     if (first_reads[inst->tex_offsets[i].index] == -1)
>>> +                        first_reads[inst->tex_offsets[i].index] = level;
>>> +                  }
>>>                   }
>>>                }
>>>             }
>>> @@ -5211,17 +5223,30 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
>>>        * the writemask of other instructions with dead channels.
>>>        */
>>>       foreach_in_list_safe(glsl_to_tgsi_instruction, inst,
>>> &this->instructions) {
>>> -      if (!inst->dead_mask || !inst->dst[0].writemask)
>>> +      bool dead_inst = false;
>>> +
>>> +      if (!inst->dst[0].writemask)
>>>             continue;
>>> +
>>>          /* No amount of dead masks should remove memory stores */
>>>          if (inst->info->is_store)
>>>             continue;
>>>    -      if ((inst->dst[0].writemask & ~inst->dead_mask) == 0) {
>>> +      if (!inst->dead_mask) {
>>> +         if (inst->dst[0].file == PROGRAM_TEMPORARY &&
>>> +             first_reads[inst->dst[0].index] == -1) {
>>> +            dead_inst = true;
>>> +         }
>>> +      } else {
>>> +         if ((inst->dst[0].writemask & ~inst->dead_mask) == 0)
>>> +            dead_inst = true;
>>> +      }
>>> +
>>> +      if (dead_inst) {
>>>             inst->remove();
>>>             delete inst;
>>>             removed++;
>>> -      } else {
>>> +      } else if (inst->dead_mask) {
>>>             if (glsl_base_type_is_64bit(inst->dst[0].type)) {
>>>                if (inst->dead_mask == WRITEMASK_XY ||
>>>                    inst->dead_mask == WRITEMASK_ZW)
>>> @@ -5232,6 +5257,7 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
>>>       }
>>>         ralloc_free(write_level);
>>> +   ralloc_free(first_reads);
>>>       ralloc_free(writes);
>>>         return removed;
>>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


More information about the mesa-dev mailing list