[Mesa-dev] [PATCH] st/glsl_to_tgsi: fix the DCE pass in presence of loops
Ilia Mirkin
imirkin at alum.mit.edu
Thu May 4 00:43:01 UTC 2017
I looked into removing these passes a while back. A little detail here
is that DCE has to get run at least a little bit for correctness
reasons -- some (lazy) glsl -> tgsi code uses the dead_mask with the
assumption that DCE is run later.
Another observation was that while, indeed, those passes take a while,
they also remove a bunch of code. Not running those passes makes them
disappear from the profile, but then the logic down the line has more
work to do. It's not as obvious of a win as one might think.
Additionally, at least for nouveau, this logic does better than
nouveau's codegen in some cases. (Largely due to the fact that system
values get copy-propagated here but they don't by codegen, generally,
which can lead to large register live intervals. At least that's what
I recall from my analysis a while ago.)
This was my hack-branch: https://github.com/imirkin/mesa/commits/st-pass-cleanup
On Wed, May 3, 2017 at 8:36 PM, Timothy Arceri <tarceri at itsqueeze.com> wrote:
> This and the tgsi copy_propagation pass are very slow, I'd really like it if
> we both didn't require the pass for things to work and also didn't make it
> any slower. perf is showing these 2 passes at ~11% during Deus Ex start-up
> (cold cache).
>
> Can we not just add a DCE call to the glsl linker/compiler calls e.g.
>
> if (ctx->Const.GLSLOptimizeConservatively) {
> /* Run it just once. */
> do_common_optimization(ir, true, false,
> &ctx->Const.ShaderCompilerOptions[stage],
> ctx->Const.NativeIntegers);
>
> ---> call DCE here to clean up since we don't call the opt passes again
> <---
>
> } else {
> /* Repeat it until it stops making changes. */
> while (do_common_optimization(ir, true, false,
>
> &ctx->Const.ShaderCompilerOptions[stage],
> ctx->Const.NativeIntegers))
> ;
>
> }
>
> On 03/05/17 00:43, Samuel Pitoiset wrote:
>>
>> The TGSI DCE pass doesn't eliminate dead assignments like
>> MOV TEMP[0], TEMP[1] in presence of loops because it assumes
>> that the visitor doesn't emit dead code. This assumption is
>> actually wrong and this situation happens.
>>
>> However, it appears that the merge_registers() pass accidentally
>> takes care of this for some weird reasons. But since this pass has
>> been disabled for RadeonSI and Nouveau, the renumber_registers()
>> pass which is called *after*, can't do its job correctly.
>>
>> This is because it assumes that no dead code is present. But if
>> there is still a dead assignment, it might re-use the TEMP
>> register id incorrectly and emits wrong code.
>>
>> This patches eliminates all dead assignments by tracking
>> all temporary register reads like what the renumber_registers()
>> actually does. The best solution would be to rewrite that DCE
>> pass entirely but it's more work.
>>
>> This should fix Unigine Heaven on RadeonSI and Nouveau.
>>
>> shader-db results with RadeonSI:
>>
>> 47109 shaders in 29632 tests
>> Totals:
>> SGPRS: 1919548 -> 1919572 (0.00 %)
>> VGPRS: 1139205 -> 1139209 (0.00 %)
>> Spilled SGPRs: 1865 -> 1867 (0.11 %)
>> Spilled VGPRs: 65 -> 65 (0.00 %)
>> Private memory VGPRs: 1184 -> 1184 (0.00 %)
>> Scratch size: 1308 -> 1308 (0.00 %) dwords per thread
>> Code Size: 60083036 -> 60083244 (0.00 %) bytes
>> LDS: 1077 -> 1077 (0.00 %) blocks
>> Max Waves: 431200 -> 431200 (0.00 %)
>> Wait states: 0 -> 0 (0.00 %)
>>
>> It's still interesting to disable the merge_registers() pass.
>>
>> Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
>> ---
>> src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 36
>> +++++++++++++++++++++++++-----
>> 1 file changed, 31 insertions(+), 5 deletions(-)
>>
>> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>> b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>> index 0f8688a41c..01b5a4dc98 100644
>> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>> @@ -5085,9 +5085,13 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
>>
>> glsl_to_tgsi_instruction *,
>> this->next_temp *
>> 4);
>> int *write_level = rzalloc_array(mem_ctx, int, this->next_temp * 4);
>> + int *first_reads = rzalloc_array(mem_ctx, int, this->next_temp);
>> int level = 0;
>> int removed = 0;
>> + for (int i = 0; i < this->next_temp; i++)
>> + first_reads[i] = -1;
>> +
>> foreach_in_list(glsl_to_tgsi_instruction, inst, &this->instructions)
>> {
>> assert(inst->dst[0].file != PROGRAM_TEMPORARY
>> || inst->dst[0].index < this->next_temp);
>> @@ -5148,8 +5152,12 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
>> src_chans |= 1 << GET_SWZ(inst->src[i].swizzle, 3);
>> for (int c = 0; c < 4; c++) {
>> - if (src_chans & (1 << c))
>> + if (src_chans & (1 << c)) {
>> writes[4 * inst->src[i].index + c] = NULL;
>> +
>> + if (first_reads[inst->src[i].index] == -1)
>> + first_reads[inst->src[i].index] = level;
>> + }
>> }
>> }
>> }
>> @@ -5167,8 +5175,12 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
>> src_chans |= 1 << GET_SWZ(inst->tex_offsets[i].swizzle,
>> 3);
>> for (int c = 0; c < 4; c++) {
>> - if (src_chans & (1 << c))
>> + if (src_chans & (1 << c)) {
>> writes[4 * inst->tex_offsets[i].index + c] = NULL;
>> +
>> + if (first_reads[inst->tex_offsets[i].index] == -1)
>> + first_reads[inst->tex_offsets[i].index] = level;
>> + }
>> }
>> }
>> }
>> @@ -5211,17 +5223,30 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
>> * the writemask of other instructions with dead channels.
>> */
>> foreach_in_list_safe(glsl_to_tgsi_instruction, inst,
>> &this->instructions) {
>> - if (!inst->dead_mask || !inst->dst[0].writemask)
>> + bool dead_inst = false;
>> +
>> + if (!inst->dst[0].writemask)
>> continue;
>> +
>> /* No amount of dead masks should remove memory stores */
>> if (inst->info->is_store)
>> continue;
>> - if ((inst->dst[0].writemask & ~inst->dead_mask) == 0) {
>> + if (!inst->dead_mask) {
>> + if (inst->dst[0].file == PROGRAM_TEMPORARY &&
>> + first_reads[inst->dst[0].index] == -1) {
>> + dead_inst = true;
>> + }
>> + } else {
>> + if ((inst->dst[0].writemask & ~inst->dead_mask) == 0)
>> + dead_inst = true;
>> + }
>> +
>> + if (dead_inst) {
>> inst->remove();
>> delete inst;
>> removed++;
>> - } else {
>> + } else if (inst->dead_mask) {
>> if (glsl_base_type_is_64bit(inst->dst[0].type)) {
>> if (inst->dead_mask == WRITEMASK_XY ||
>> inst->dead_mask == WRITEMASK_ZW)
>> @@ -5232,6 +5257,7 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
>> }
>> ralloc_free(write_level);
>> + ralloc_free(first_reads);
>> ralloc_free(writes);
>> return removed;
>>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
More information about the mesa-dev
mailing list