<div dir="auto">It would kill nv30, I believe.</div><div class="gmail_extra"><br><div class="gmail_quote">On Apr 24, 2017 10:30 PM, "Roland Scheidegger" <<a href="mailto:sroland@vmware.com">sroland@vmware.com</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 24.04.2017 at 23:12, Rob Clark wrote:<br>
> so I guess this is likely to hurt pipe drivers that don't (yet?)<br>
> have a real compiler backend (i.e. etnaviv and freedreno/a2xx). So<br>
> maybe it should be optional.<br>
I suppose softpipe, too? Though that's fine, no one cares if it gets a<br>
bit slower. Might even be nicer for debugging purposes...<br>
<br>
Roland<br>
<br>
<br>
<br>
> Also, I wonder about the pre-LLVM radeon generations, since sb uses the<br>
> actual instruction encoding as IR between tgsi->sb and the backend opt<br>
> passes. IIRC they have had problems when the TGSI code uses too<br>
> many registers.<br>
><br>
> BR, -R<br>
><br>
> On Mon, Apr 24, 2017 at 5:01 PM, Samuel Pitoiset<br>
> <<a href="mailto:samuel.pitoiset@gmail.com">samuel.pitoiset@gmail.com</a>> wrote:<br>
>> The main goal of this pass is to merge temporary registers in order to<br>
>> reduce the total number of registers and also to produce optimal<br>
>> TGSI code.<br>
>><br>
>> In fact, compilers seem to be confused when temporary variables are<br>
>> already merged, maybe because it's done too early in the process.<br>
>><br>
>> Removing the pass reduces both the register pressure and the code<br>
>> size (TGSI is no longer optimized, but who cares?). shader-db<br>
>> results with RadeonSI and Nouveau are interesting.<br>
>><br>
>> Nouveau:<br>
>><br>
>> total instructions in shared programs : 3931608 -> 3929463 (-0.05%)<br>
>> total gprs used in shared programs : 481255 -> 479014 (-0.47%)<br>
>> total local used in shared programs : 27481 -> 27381 (-0.36%)<br>
>> total bytes used in shared programs : 36031256 -> 36011120 (-0.06%)<br>
>><br>
>> local gpr inst bytes<br>
>> helped 14 1471 1309 1309<br>
>> hurt 1 88 384 384<br>
>><br>
>> RadeonSI:<br>
>><br>
>> PERCENTAGE DELTAS Shaders SGPRs VGPRs SpillSGPR SpillVGPR PrivVGPR Scratch CodeSize MaxWaves Waits<br>
>> ------------------------------<wbr>------------------------------<wbr>------------------------------<wbr>----------------------------<br>
>> All affected 4906 -0.31 % -0.40 % -2.93 % -20.00 % . -20.00 % -0.18 % 0.19 % .<br>
>> ------------------------------<wbr>------------------------------<wbr>------------------------------<wbr>----------------------------<br>
>> Total 47109 -0.04 % -0.05 % -1.97 % -7.14 % . -0.30 % -0.03 % 0.02 % .<br>
>><br>
>> Found by luck while fixing an issue in the TGSI dead code<br>
>> elimination pass which affects tex instructions with bindless<br>
>> samplers.<br>
>><br>
>> Signed-off-by: Samuel Pitoiset <<a href="mailto:samuel.pitoiset@gmail.com">samuel.pitoiset@gmail.com</a>><br>
>> ---<br>
>> src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 62 ------------------------------<br>
>> 1 file changed, 62 deletions(-)<br>
>><br>
>> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp<br>
>> index de7fe7837a..d033bdcc5a 100644<br>
>> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp<br>
>> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp<br>
>> @@ -565,7 +565,6 @@ public:<br>
>>    int eliminate_dead_code(void);<br>
>><br>
>>    void merge_two_dsts(void);<br>
>> -   void merge_registers(void);<br>
>>    void renumber_registers(void);<br>
>><br>
>>    void emit_block_mov(ir_assignment *ir, const struct glsl_type *type,<br>
>> @@ -5262,66 +5261,6 @@ glsl_to_tgsi_visitor::merge_two_dsts(void)<br>
>>    }<br>
>> }<br>
>><br>
>> -/* Merges temporary registers together where possible to reduce the number of<br>
>> - * registers needed to run a program.<br>
>> - *<br>
>> - * Produces optimal code only after copy propagation and dead code elimination<br>
>> - * have been run. */<br>
>> -void<br>
>> -glsl_to_tgsi_visitor::merge_registers(void)<br>
>> -{<br>
>> -   int *last_reads = rzalloc_array(mem_ctx, int, this->next_temp);<br>
>> -   int *first_writes = rzalloc_array(mem_ctx, int, this->next_temp);<br>
>> -   struct rename_reg_pair *renames = rzalloc_array(mem_ctx, struct rename_reg_pair, this->next_temp);<br>
>> -   int i, j;<br>
>> -   int num_renames = 0;<br>
>> -<br>
>> -   /* Read the indices of the last read and first write to each temp register<br>
>> -    * into an array so that we don't have to traverse the instruction list as<br>
>> -    * much. */<br>
>> -   for (i = 0; i < this->next_temp; i++) {<br>
>> -      last_reads[i] = -1;<br>
>> -      first_writes[i] = -1;<br>
>> -   }<br>
>> -   get_last_temp_read_first_temp_write(last_reads, first_writes);<br>
>> -<br>
>> -   /* Start looking for registers with non-overlapping usages that can be<br>
>> -    * merged together. */<br>
>> -   for (i = 0; i < this->next_temp; i++) {<br>
>> -      /* Don't touch unused registers. */<br>
>> -      if (last_reads[i] < 0 || first_writes[i] < 0) continue;<br>
>> -<br>
>> -      for (j = 0; j < this->next_temp; j++) {<br>
>> -         /* Don't touch unused registers. */<br>
>> -         if (last_reads[j] < 0 || first_writes[j] < 0) continue;<br>
>> -<br>
>> -         /* We can merge the two registers if the first write to j is after or<br>
>> -          * in the same instruction as the last read from i. Note that the<br>
>> -          * register at index i will always be used earlier or at the same time<br>
>> -          * as the register at index j. */<br>
>> -         if (first_writes[i] <= first_writes[j] &&<br>
>> -             last_reads[i] <= first_writes[j]) {<br>
>> -            renames[num_renames].old_reg = j;<br>
>> -            renames[num_renames].new_reg = i;<br>
>> -            num_renames++;<br>
>> -<br>
>> -            /* Update the first_writes and last_reads arrays with the new<br>
>> -             * values for the merged register index, and mark the newly unused<br>
>> -             * register index as such. */<br>
>> -            assert(last_reads[j] >= last_reads[i]);<br>
>> -            last_reads[i] = last_reads[j];<br>
>> -            first_writes[j] = -1;<br>
>> -            last_reads[j] = -1;<br>
>> -         }<br>
>> -      }<br>
>> -   }<br>
>> -<br>
>> -   rename_temp_registers(num_renames, renames);<br>
>> -   ralloc_free(renames);<br>
>> -   ralloc_free(last_reads);<br>
>> -   ralloc_free(first_writes);<br>
>> -}<br>
>> -<br>
>> /* Reassign indices to temporary registers by reusing unused indices created<br>
>>  * by optimization passes. */<br>
>> void<br>
>> @@ -6712,7 +6651,6 @@ get_mesa_program_tgsi(struct gl_context *ctx,<br>
>>       while (v->eliminate_dead_code());<br>
>><br>
>>       v->merge_two_dsts();<br>
>> -      v->merge_registers();<br>
>>       v->renumber_registers();<br>
>><br>
>>    /* Write the END instruction. */<br>
>> --<br>
>> 2.12.2<br>
>><br>
>> ______________________________<wbr>_________________<br>
>> mesa-dev mailing list<br>
>> <a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a><br>
>> <a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/<wbr>mailman/listinfo/mesa-dev</a><br>
</blockquote></div></div>