[Mesa-dev] [PATCH] st/glsl_to_tgsi: fix the DCE pass in presence of loops
Samuel Pitoiset
samuel.pitoiset at gmail.com
Thu May 4 12:28:39 UTC 2017
On 05/04/2017 02:13 PM, Samuel Pitoiset wrote:
>
>
> On 05/04/2017 12:26 PM, Timothy Arceri wrote:
>> On 04/05/17 18:22, Samuel Pitoiset wrot
>>> On 05/04/2017 02:36 AM, Timothy Arceri wrote:
>>>> This and the tgsi copy_propagation pass are very slow, I'd really
>>>> like it if we both didn't require the pass for things to work and
>>>> also didn't make it any slower. perf is showing these 2 passes at
>>>> ~11% during Deus Ex start-up (cold cache).
>>>
>>> This fix shouldn't make the DCE pass really slower, shouldn't hurt.
>>>
>>> Well, those passes are useful because the visitor is lazy and it
>>> produces bunch of useless code which is going to be removed later.
>>
>> Sure I just mean it would be nice not to add extra reliance on this
>> pass, in case we are able to make the visitor smarter, and not
>> generate so much junk to begin with.
>
> I have a new approach which is much better (suggested by Nicolai).
>
>>
>>>
>>>>
>>>> Can we not just add a DCE call to the glsl linker/compiler calls e.g.
>>>>
>>>> if (ctx->Const.GLSLOptimizeConservatively) {
>>>> /* Run it just once. */
>>>> do_common_optimization(ir, true, false,
>>>> &ctx->Const.ShaderCompilerOptions[stage],
>>>> ctx->Const.NativeIntegers);
>>>>
>>>> ---> call DCE here to clean up since we don't call the opt
>>>> passes again <---
>>>
>>> This might help a bit but it won't remove useless TGSI code generated
>>> by the visitor.
>>
>> Sure but is the visitor generating code that is causing the issue you
>> are seeing or is it the glsl IR code?
>
> The visitor.
>
>>
>> Seems you have an alternative solution which looks good, but maybe we
>> should run the glsl dead code pass once at the end anyway?
>
> This doesn't help for this particular situation, but I guess this might
> help other shaders.
Calling the DCE passes doesn't help.
diff --git i/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
w/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index 249584d75e..fafbfaea19 100644
--- i/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ w/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -7008,6 +7008,10 @@ st_link_shader(struct gl_context *ctx, struct
gl_shader_program *prog)
lower_if_to_cond_assign((gl_shader_stage)i, ir,
options->MaxIfDepth, if_threshold);
} while (has_unsupported_control_flow(ir, options));
+
+ do_dead_code(ir, true);
+ do_dead_code_local(ir);
+
} else {
/* Repeat it until it stops making changes. */
bool progress;
shader-db results on RadeonSI:
47109 shaders in 29632 tests
Totals:
SGPRS: 1923308 -> 1923308 (0.00 %)
VGPRS: 1133843 -> 1133839 (-0.00 %)
Spilled SGPRs: 2516 -> 2516 (0.00 %)
Spilled VGPRs: 65 -> 65 (0.00 %)
Private memory VGPRs: 1184 -> 1184 (0.00 %)
Scratch size: 1308 -> 1308 (0.00 %) dwords per thread
Code Size: 60095968 -> 60095840 (-0.00 %) bytes
LDS: 1077 -> 1077 (0.00 %) blocks
Max Waves: 431889 -> 431889 (0.00 %)
Wait states: 0 -> 0 (0.00 %)
Totals from affected shaders:
SGPRS: 4064 -> 4064 (0.00 %)
VGPRS: 2196 -> 2192 (-0.18 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 230836 -> 230708 (-0.06 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 609 -> 609 (0.00 %)
Wait states: 0 -> 0 (0.00 %)
It only helps one or two Metro 2033 shaders.
>
>>
>>>
>>>>
>>>> } else {
>>>> /* Repeat it until it stops making changes. */
>>>> while (do_common_optimization(ir, true, false,
>>>>
>>>> &ctx->Const.ShaderCompilerOptions[stage],
>>>> ctx->Const.NativeIntegers))
>>>> ;
>>>> }
>>>>
>>>> On 03/05/17 00:43, Samuel Pitoiset wrote:
>>>>> The TGSI DCE pass doesn't eliminate dead assignments like
>>>>> MOV TEMP[0], TEMP[1] in presence of loops because it assumes
>>>>> that the visitor doesn't emit dead code. This assumption is
>>>>> actually wrong and this situation happens.
>>>>>
>>>>> However, it appears that the merge_registers() pass accidentally
>>>>> takes care of this for some weird reasons. But since this pass has
>>>>> been disabled for RadeonSI and Nouveau, the renumber_registers()
>>>>> pass which is called *after*, can't do its job correctly.
>>>>>
>>>>> This is because it assumes that no dead code is present. But if
>>>>> there is still a dead assignment, it might re-use the TEMP
>>>>> register id incorrectly and emits wrong code.
>>>>>
>>>>> This patches eliminates all dead assignments by tracking
>>>>> all temporary register reads like what the renumber_registers()
>>>>> actually does. The best solution would be to rewrite that DCE
>>>>> pass entirely but it's more work.
>>>>>
>>>>> This should fix Unigine Heaven on RadeonSI and Nouveau.
>>>>>
>>>>> shader-db results with RadeonSI:
>>>>>
>>>>> 47109 shaders in 29632 tests
>>>>> Totals:
>>>>> SGPRS: 1919548 -> 1919572 (0.00 %)
>>>>> VGPRS: 1139205 -> 1139209 (0.00 %)
>>>>> Spilled SGPRs: 1865 -> 1867 (0.11 %)
>>>>> Spilled VGPRs: 65 -> 65 (0.00 %)
>>>>> Private memory VGPRs: 1184 -> 1184 (0.00 %)
>>>>> Scratch size: 1308 -> 1308 (0.00 %) dwords per thread
>>>>> Code Size: 60083036 -> 60083244 (0.00 %) bytes
>>>>> LDS: 1077 -> 1077 (0.00 %) blocks
>>>>> Max Waves: 431200 -> 431200 (0.00 %)
>>>>> Wait states: 0 -> 0 (0.00 %)
>>>>>
>>>>> It's still interesting to disable the merge_registers() pass.
>>>>>
>>>>> Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
>>>>> ---
>>>>> src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 36
>>>>> +++++++++++++++++++++++++-----
>>>>> 1 file changed, 31 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>>>>> b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>>>>> index 0f8688a41c..01b5a4dc98 100644
>>>>> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>>>>> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>>>>> @@ -5085,9 +5085,13 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
>>>>> glsl_to_tgsi_instruction *,
>>>>> this->next_temp * 4);
>>>>> int *write_level = rzalloc_array(mem_ctx, int, this->next_temp
>>>>> * 4);
>>>>> + int *first_reads = rzalloc_array(mem_ctx, int, this->next_temp);
>>>>> int level = 0;
>>>>> int removed = 0;
>>>>> + for (int i = 0; i < this->next_temp; i++)
>>>>> + first_reads[i] = -1;
>>>>> +
>>>>> foreach_in_list(glsl_to_tgsi_instruction, inst,
>>>>> &this->instructions) {
>>>>> assert(inst->dst[0].file != PROGRAM_TEMPORARY
>>>>> || inst->dst[0].index < this->next_temp);
>>>>> @@ -5148,8 +5152,12 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
>>>>> src_chans |= 1 << GET_SWZ(inst->src[i].swizzle, 3);
>>>>> for (int c = 0; c < 4; c++) {
>>>>> - if (src_chans & (1 << c))
>>>>> + if (src_chans & (1 << c)) {
>>>>> writes[4 * inst->src[i].index + c] = NULL;
>>>>> +
>>>>> + if (first_reads[inst->src[i].index] == -1)
>>>>> + first_reads[inst->src[i].index] = level;
>>>>> + }
>>>>> }
>>>>> }
>>>>> }
>>>>> @@ -5167,8 +5175,12 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
>>>>> src_chans |= 1 <<
>>>>> GET_SWZ(inst->tex_offsets[i].swizzle, 3);
>>>>> for (int c = 0; c < 4; c++) {
>>>>> - if (src_chans & (1 << c))
>>>>> + if (src_chans & (1 << c)) {
>>>>> writes[4 * inst->tex_offsets[i].index + c] =
>>>>> NULL;
>>>>> +
>>>>> + if (first_reads[inst->tex_offsets[i].index]
>>>>> == -1)
>>>>> + first_reads[inst->tex_offsets[i].index] =
>>>>> level;
>>>>> + }
>>>>> }
>>>>> }
>>>>> }
>>>>> @@ -5211,17 +5223,30 @@
>>>>> glsl_to_tgsi_visitor::eliminate_dead_code(void)
>>>>> * the writemask of other instructions with dead channels.
>>>>> */
>>>>> foreach_in_list_safe(glsl_to_tgsi_instruction, inst,
>>>>> &this->instructions) {
>>>>> - if (!inst->dead_mask || !inst->dst[0].writemask)
>>>>> + bool dead_inst = false;
>>>>> +
>>>>> + if (!inst->dst[0].writemask)
>>>>> continue;
>>>>> +
>>>>> /* No amount of dead masks should remove memory stores */
>>>>> if (inst->info->is_store)
>>>>> continue;
>>>>> - if ((inst->dst[0].writemask & ~inst->dead_mask) == 0) {
>>>>> + if (!inst->dead_mask) {
>>>>> + if (inst->dst[0].file == PROGRAM_TEMPORARY &&
>>>>> + first_reads[inst->dst[0].index] == -1) {
>>>>> + dead_inst = true;
>>>>> + }
>>>>> + } else {
>>>>> + if ((inst->dst[0].writemask & ~inst->dead_mask) == 0)
>>>>> + dead_inst = true;
>>>>> + }
>>>>> +
>>>>> + if (dead_inst) {
>>>>> inst->remove();
>>>>> delete inst;
>>>>> removed++;
>>>>> - } else {
>>>>> + } else if (inst->dead_mask) {
>>>>> if (glsl_base_type_is_64bit(inst->dst[0].type)) {
>>>>> if (inst->dead_mask == WRITEMASK_XY ||
>>>>> inst->dead_mask == WRITEMASK_ZW)
>>>>> @@ -5232,6 +5257,7 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
>>>>> }
>>>>> ralloc_free(write_level);
>>>>> + ralloc_free(first_reads);
>>>>> ralloc_free(writes);
>>>>> return removed;
>>>>>
More information about the mesa-dev
mailing list