[Mesa-dev] [PATCH] st/glsl_to_tgsi: fix the DCE pass in presence of loops

Nicolai Hähnle nhaehnle at gmail.com
Wed May 3 16:21:24 UTC 2017


On 02.05.2017 16:43, Samuel Pitoiset wrote:
> The TGSI DCE pass doesn't eliminate dead assignments like
> MOV TEMP[0], TEMP[1] in presence of loops because it assumes
> that the visitor doesn't emit dead code. This assumption is
> actually wrong and this situation happens.
>
> However, it appears that the merge_registers() pass accidentally
> takes care of this for some weird reasons. But since this pass has
> been disabled for RadeonSI and Nouveau, the renumber_registers()
> pass which is called *after*, can't do its job correctly.
>
> This is because it assumes that no dead code is present. But if
> there is still a dead assignment, it might re-use the TEMP
> register id incorrectly and emits wrong code.
>
> This patches eliminates all dead assignments by tracking
> all temporary register reads like what the renumber_registers()
> actually does. The best solution would be to rewrite that DCE
> pass entirely but it's more work.
>
> This should fix Unigine Heaven on RadeonSI and Nouveau.
>
> shader-db results with RadeonSI:
>
> 47109 shaders in 29632 tests
> Totals:
> SGPRS: 1919548 -> 1919572 (0.00 %)
> VGPRS: 1139205 -> 1139209 (0.00 %)
> Spilled SGPRs: 1865 -> 1867 (0.11 %)
> Spilled VGPRs: 65 -> 65 (0.00 %)
> Private memory VGPRs: 1184 -> 1184 (0.00 %)
> Scratch size: 1308 -> 1308 (0.00 %) dwords per thread
> Code Size: 60083036 -> 60083244 (0.00 %) bytes
> LDS: 1077 -> 1077 (0.00 %) blocks
> Max Waves: 431200 -> 431200 (0.00 %)
> Wait states: 0 -> 0 (0.00 %)
>
> It's still interesting to disable the merge_registers() pass.
>
> Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
> ---
>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 36 +++++++++++++++++++++++++-----
>  1 file changed, 31 insertions(+), 5 deletions(-)
>
> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> index 0f8688a41c..01b5a4dc98 100644
> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> @@ -5085,9 +5085,13 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
>                                                       glsl_to_tgsi_instruction *,
>                                                       this->next_temp * 4);
>     int *write_level = rzalloc_array(mem_ctx, int, this->next_temp * 4);
> +   int *first_reads = rzalloc_array(mem_ctx, int, this->next_temp);
>     int level = 0;
>     int removed = 0;
>
> +   for (int i = 0; i < this->next_temp; i++)
> +      first_reads[i] = -1;

As discussed offline, it would be good to fix renumber_registers to work 
without this assumption.

If you want to go ahead with this patch anyway / on top of this: the 
value of first_reads doesn't seem to matter, so why not make it an array 
of bool instead?

Cheers,
Nicolai


> +
>     foreach_in_list(glsl_to_tgsi_instruction, inst, &this->instructions) {
>        assert(inst->dst[0].file != PROGRAM_TEMPORARY
>               || inst->dst[0].index < this->next_temp);
> @@ -5148,8 +5152,12 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
>                 src_chans |= 1 << GET_SWZ(inst->src[i].swizzle, 3);
>
>                 for (int c = 0; c < 4; c++) {
> -                  if (src_chans & (1 << c))
> +                  if (src_chans & (1 << c)) {
>                       writes[4 * inst->src[i].index + c] = NULL;
> +
> +                     if (first_reads[inst->src[i].index] == -1)
> +                        first_reads[inst->src[i].index] = level;
> +                  }
>                 }
>              }
>           }
> @@ -5167,8 +5175,12 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
>                 src_chans |= 1 << GET_SWZ(inst->tex_offsets[i].swizzle, 3);
>
>                 for (int c = 0; c < 4; c++) {
> -                  if (src_chans & (1 << c))
> +                  if (src_chans & (1 << c)) {
>                       writes[4 * inst->tex_offsets[i].index + c] = NULL;
> +
> +                     if (first_reads[inst->tex_offsets[i].index] == -1)
> +                        first_reads[inst->tex_offsets[i].index] = level;
> +                  }
>                 }
>              }
>           }
> @@ -5211,17 +5223,30 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
>      * the writemask of other instructions with dead channels.
>      */
>     foreach_in_list_safe(glsl_to_tgsi_instruction, inst, &this->instructions) {
> -      if (!inst->dead_mask || !inst->dst[0].writemask)
> +      bool dead_inst = false;
> +
> +      if (!inst->dst[0].writemask)
>           continue;
> +
>        /* No amount of dead masks should remove memory stores */
>        if (inst->info->is_store)
>           continue;
>
> -      if ((inst->dst[0].writemask & ~inst->dead_mask) == 0) {
> +      if (!inst->dead_mask) {
> +         if (inst->dst[0].file == PROGRAM_TEMPORARY &&
> +             first_reads[inst->dst[0].index] == -1) {
> +            dead_inst = true;
> +         }
> +      } else {
> +         if ((inst->dst[0].writemask & ~inst->dead_mask) == 0)
> +            dead_inst = true;
> +      }
> +
> +      if (dead_inst) {
>           inst->remove();
>           delete inst;
>           removed++;
> -      } else {
> +      } else if (inst->dead_mask) {
>           if (glsl_base_type_is_64bit(inst->dst[0].type)) {
>              if (inst->dead_mask == WRITEMASK_XY ||
>                  inst->dead_mask == WRITEMASK_ZW)
> @@ -5232,6 +5257,7 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
>     }
>
>     ralloc_free(write_level);
> +   ralloc_free(first_reads);
>     ralloc_free(writes);
>
>     return removed;
>


-- 
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.


More information about the mesa-dev mailing list