[Mesa-dev] [PATCH 0/3] [RFC] mesa/st: glsl_to_tgsi: improved temp-reg lifetime estimation
Nicolai Hähnle
nhaehnle at gmail.com
Mon Jun 12 19:00:38 UTC 2017
On 12.06.2017 20:52, Nicolai Hähnle wrote:
> On 12.06.2017 14:34, Gert Wollny wrote:
>> Am Montag, den 12.06.2017, 12:28 +0200 schrieb Nicolai Hähnle:
>>> On 10.06.2017 01:15, Gert Wollny wrote:
>>>>
>>> Plenty of style issues aside, can you explain where and why you get
>>> tighter lifetimes?
>>
>> In the original code if a temporary is used within a loop it gets the
>> whole life time of the loop assigned.
>>
>> With this patch I track in more detail where a temporary is accesses
>> and base the lifetime on this: For instance, if a variable is first
>> unconditionally written and later read for the last time in the same
>> scope (loop, if, or switch branch), then the lifetime can be restricted
>> to that first-written - last-read range.
>>
>> The code gets more complex because it tries to resolve this also for
>> nested scopes, and one also has to take care about whether a variable
>> is written only conditionally within a loop, or conditionally read
>> before it is written (in the source code sense, but not in the program
>> flow sense).
>>
>> Shaders that profit from this better lifetime estimation are the ones
>> that have many short living values within long loops, think
>>
>> for () {
>> float4 t[1] = f(in2);
>> float4 t[2] = g(in1);
>> float4 t[3] = op(t[1], t[2]);
>> ...
>> sum += t[200];
>> }
>> Here the old code would keep all of the t[i] alive for the whole loop,
>> and in fact with the GpuTest Piano benchmark I have seen a shader with
>>> 2000 temporaries where many were used in a loop but only required for
>> two or three lines so that my code could merge them to less then 100
>> temporaries, and this made it possible for the tgsi to bytecode layer
>> in r600g to actually translate the shader.
>
> Okay. I think you should seriously re-think your algorithm in a way that
> makes it a more natural evolution from the algorithm that's already there.
>
> Basically, the current algorithm tracks (first_write, last_read), so
> think about what you need to track in order to obtain a single-pass
> algorithm that computes lifetime (first, last) for every temporary. I
> think the following should do it for a first cut:
>
> struct st_calculate_lifetime_state {
> /* First and last instruction that has this temporary */
> unsigned first;
> unsigned last;
>
> /* First instruction of the outer-most in-active loop that
> * contains this temporary. (A loop is active if we're
> * currently processing instructions in it.)
> unsigned loop_first;
>
> /* Position of a read without preceding dominating write. */
> unsigned undef_read[4];
>
> /* First write in the program that is dominating our
> * current position, per channel.
> */
> unsigned first_dominating_write[4];
> };
>
> In addition, you need to keep a stack of active scopes (loops and ifs),
> but you really only need to remember the start of the scope (and for
> loops, probably the position of the first BREAK).
>
> Here's a sketch of the "state machine" that you need to run while
> traversing the program, assuming no BRK and CONT:
>
> Init: last = 0, everything else = ~0
>
> These are updates on individual variables on use:
>
> On any use (source or dest):
> - if first > cur_pos, set first = loop_first = cur_pos
> - if loop_first < first, set first = loop_first
> - update last
>
> On use as source:
> - if first_dominating_write > cur_pos and undef_read > cur_pos, set
> undef_read = cur_pos
>
> On use as dest:
> - if first_dominating_write > cur_pos, set first_dominating_write = cur_pos
>
> These are updates of all temporaries on scope change:
>
> On ENDLOOP, for all temps:
> - if loop_first > start of loop, set loop_first = start of loop
> - first < start of loop, update last to end of loop
This should of course read:
- if first < start of loop and last is inside the loop, then update last
to the end of the loop
Cheers,
Nicolai
> - if undef_read between start and end of loop: set first = MIN(first,
> start of loop) and last = end of loop
> - if first_dominating_write < end of loop, set undef_read = ~0
>
> On ELSE and ENDIF, for all temps:
> - if first_dominating_write > start of scope, set first_dominating_write
> = ~0
>
> I'm not sure right now whether BREAK / CONT need special treatment at
> all. I think what you need is:
>
> On ENDLOOP:
> - if first_dominating_write is between the first BREAK in the loop and
> the end of the loop, set first_dominating_write = ~0
>
> And for CONT, you probably don't really need anything, because CONTs
> cannot make you skip code forever.
>
> What this state machine doesn't yet cover is
>
> IF ..
> MOV TEMP[0], ...
> ELSE
> MOV TEMP[0], ...
> ENDIF
>
> Still, I'd start with it and see whether you need to cover that case.
>
> And even that case can probably be dealt with in a fairly efficient and
> pragmatic way. The idea is to keep track of the nesting level of IFs,
> plus an
>
> uint32 dominating_write_in_true_block[4];
>
> per temp. Then:
>
> On ELSE:
> - if first_dominating_write < cur_pos, set the bit corresponding to the
> current nesting level in dominating_write_in_true_block
>
> On ENDIF:
> - don't reset first_dominating_write if the bit corresponding to the
> current nesting level in dominating_write_in_true_block is set
> - unconditionally clear that bit
>
> It won't cover cases with nesting level > 32, but do we really care?
>
> I hope I didn't miss anything, because after all this is admittedly
> subtle stuff. Still, I think this kind of state-machine approach should
> work and allow you to avoid *lots* of allocations and pointer-chasing.
>
> Cheers,
> Nicolai
>
>
>>
>> Best,
>> Gert
>>
>> PS: Regarding style, I am fully aware that I have to iterate over this
>> code a few more times. I tried to adhere to the way the existing code
>> represents itself to me, but I'm happy to listen to any advise I can
>> get.
>>
>> In any case, I though it might be best to send this patch out now for
>> discussion. Now, with the unit tests in place I will rework it and
>> focus also more on style questions. One thing that comes up immediately
>> up is that I will try to reduce the use of dynamically allocated
>> memory, since 60% of the run time of my code is with memory allocation
>> and de-allocation.
>>
>>
>
>
--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
More information about the mesa-dev
mailing list