[Mesa-dev] [PATCH 0/3] [RFC] mesa/st: glsl_to_tgsi: improved temp-reg lifetime estimation

Nicolai Hähnle nhaehnle at gmail.com
Mon Jun 12 19:00:38 UTC 2017


On 12.06.2017 20:52, Nicolai Hähnle wrote:
> On 12.06.2017 14:34, Gert Wollny wrote:
>> Am Montag, den 12.06.2017, 12:28 +0200 schrieb Nicolai Hähnle:
>>> On 10.06.2017 01:15, Gert Wollny wrote:
>>>>
>>> Plenty of style issues aside, can you explain where and why you get
>>> tighter lifetimes?
>>
>> In the original code if a temporary is used within a loop it gets the
>> whole life time of the loop assigned.
>>
>> With this patch I track in more detail where a temporary is accesses
>> and base the lifetime on this: For instance, if a variable is first
>> unconditionally written and later read for the last time in the same
>> scope (loop, if, or switch branch), then the lifetime can be restricted
>> to that first-written - last-read range.
>>
>> The code gets more complex because it tries to resolve this also for
>> nested scopes, and one also has to take care about whether a variable
>> is written only conditionally within a loop, or conditionally read
>> before it is written (in the source code sense, but not in the program
>> flow sense).
>>
>> Shaders that profit from this better lifetime estimation are the ones
>> that have many short living values within long loops, think
>>
>> for () {
>>    float4 t[1] = f(in2);
>>    float4 t[2] = g(in1);
>>    float4 t[3] = op(t[1], t[2]);
>>     ...
>>    sum += t[200];
>> }
>> Here the old code would keep all of the t[i] alive for the whole loop,
>> and in fact with the GpuTest Piano benchmark I have seen a shader with
>>> 2000 temporaries where many were used in a loop but only required for
>> two or three lines so that my code could merge them to less then 100
>> temporaries, and this made it possible for the tgsi to bytecode layer
>> in r600g to actually translate the shader.
> 
> Okay. I think you should seriously re-think your algorithm in a way that 
> makes it a more natural evolution from the algorithm that's already there.
> 
> Basically, the current algorithm tracks (first_write, last_read), so 
> think about what you need to track in order to obtain a single-pass 
> algorithm that computes lifetime (first, last) for every temporary. I 
> think the following should do it for a first cut:
> 
>     struct st_calculate_lifetime_state {
>        /* First and last instruction that has this temporary */
>        unsigned first;
>        unsigned last;
> 
>        /* First instruction of the outer-most in-active loop that
>         * contains this temporary. (A loop is active if we're
>         * currently processing instructions in it.)
>        unsigned loop_first;
> 
>        /* Position of a read without preceding dominating write. */
>        unsigned undef_read[4];
> 
>        /* First write in the program that is dominating our
>         * current position, per channel.
>         */
>        unsigned first_dominating_write[4];
>     };
> 
> In addition, you need to keep a stack of active scopes (loops and ifs), 
> but you really only need to remember the start of the scope (and for 
> loops, probably the position of the first BREAK).
> 
> Here's a sketch of the "state machine" that you need to run while 
> traversing the program, assuming no BRK and CONT:
> 
> Init: last = 0, everything else = ~0
> 
> These are updates on individual variables on use:
> 
> On any use (source or dest):
> - if first > cur_pos, set first = loop_first = cur_pos
> - if loop_first < first, set first = loop_first
> - update last
> 
> On use as source:
> - if first_dominating_write > cur_pos and undef_read > cur_pos, set 
> undef_read = cur_pos
> 
> On use as dest:
> - if first_dominating_write > cur_pos, set first_dominating_write = cur_pos
> 
> These are updates of all temporaries on scope change:
> 
> On ENDLOOP, for all temps:
> - if loop_first > start of loop, set loop_first = start of loop
> - first < start of loop, update last to end of loop

This should of course read:

- if first < start of loop and last is inside the loop, then update last 
to the end of the loop

Cheers,
Nicolai


> - if undef_read between start and end of loop: set first = MIN(first, 
> start of loop) and last = end of loop
> - if first_dominating_write < end of loop, set undef_read = ~0
> 
> On ELSE and ENDIF, for all temps:
> - if first_dominating_write > start of scope, set first_dominating_write 
> = ~0
> 
> I'm not sure right now whether BREAK / CONT need special treatment at 
> all. I think what you need is:
> 
> On ENDLOOP:
> - if first_dominating_write is between the first BREAK in the loop and 
> the end of the loop, set first_dominating_write = ~0
> 
> And for CONT, you probably don't really need anything, because CONTs 
> cannot make you skip code forever.
> 
> What this state machine doesn't yet cover is
> 
> IF ..
>    MOV TEMP[0], ...
> ELSE
>    MOV TEMP[0], ...
> ENDIF
> 
> Still, I'd start with it and see whether you need to cover that case.
> 
> And even that case can probably be dealt with in a fairly efficient and 
> pragmatic way. The idea is to keep track of the nesting level of IFs, 
> plus an
> 
>     uint32 dominating_write_in_true_block[4];
> 
> per temp. Then:
> 
> On ELSE:
> - if first_dominating_write < cur_pos, set the bit corresponding to the 
> current nesting level in dominating_write_in_true_block
> 
> On ENDIF:
> - don't reset first_dominating_write if the bit corresponding to the 
> current nesting level in dominating_write_in_true_block is set
> - unconditionally clear that bit
> 
> It won't cover cases with nesting level > 32, but do we really care?
> 
> I hope I didn't miss anything, because after all this is admittedly 
> subtle stuff. Still, I think this kind of state-machine approach should 
> work and allow you to avoid *lots* of allocations and pointer-chasing.
> 
> Cheers,
> Nicolai
> 
> 
>>
>> Best,
>> Gert
>>
>> PS: Regarding style, I am fully aware that I have to iterate over this
>> code a few more times. I tried to adhere to the way the existing code
>> represents itself to me, but I'm happy to listen to any advise I can
>> get.
>>
>> In any case, I though it might be best to send this patch out now for
>> discussion. Now, with the unit tests in place I will rework it and
>> focus also more on style questions. One thing that comes up immediately
>> up is that I will try to reduce the use of dynamically allocated
>> memory, since 60% of the run time of my code is with memory allocation
>> and de-allocation.
>>
>>
> 
> 


-- 
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.


More information about the mesa-dev mailing list