[Mesa-dev] [PATCH 0/3] [RFC] mesa/st: glsl_to_tgsi: improved temp-reg lifetime estimation

Tue Jun 13 09:07:47 UTC 2017

On 13.06.2017 10:01, Gert Wollny wrote:
> Am Montag, den 12.06.2017, 21:00 +0200 schrieb Nicolai Hähnle:
> 
> Thanks for you comments, although I do not agree with most of them.
> 
>>
>>> Okay. I think you should seriously re-think your algorithm in a way
>>> that  makes it a more natural evolution from the algorithm that's
>>> already there.
> 
> Well, when I look at the current algorithm than I don't really see much
> to evolve from: The tracking of loops is minimal and it actually only
> uses the outer loop to assign life times.
> In any case, 80% of this code I already re-used (i.e. the loop over the
> instructions and iterating over the registers).
> 
> 
>>> Basically, the current algorithm tracks (first_write, last_read),
>>> so  think about what you need to track in order to obtain a single-
>>> pass  algorithm that computes lifetime (first, last) for every
>>> temporary. I
> 
> IMHO a single pass algorithm is not better then what I do now.
> Especially with nested loops a single pass algorithm will be more
> complicated.
> 
> Think e.g.
> 
> ...
> BEGIN_LOOP
>    ...
>    BEGIN_LOOP
>       a = ...
>       b = ...
>       c = a OP b;
>    END_LOOP
>    ... /* more processing that doesn't access a, b, or c */
> END_LOOP
> 
> out =   f(a, c, ...)
> END
> 
> 
> Here, a and c must be kept alive for both loops, but b only is
> needed for a few instructions. However, in a single pass state tracker
> I have to keep all the information for a, b, and c until the end of the
> program, because only then I can discard the loop information for b,
> and resolve everything else for a and c.

In terms of memory use, your approach also keeps the information for all 
variables all the time, because of the multi-pass.

In terms of running time, it's true that what I sketched will loop over 
all temporaries for every end-of-scope.

However, that's fairly simple to fix by keeping track of which 
temporaries occurred per-scope, and then only looping over those 
temporaries.

You could even do the tracking with a single stack-like array, so you 
end up with only 3 memory allocations:

- stack of currently active scopes
- array of temporaries
- stack of temporary-occurences per scope

> With the approach I am using, i.e. only collecting all the access
> information in the pass over the program, and resolving the life-times
> at the end by re-playing the temp-access timeline, the above can be
> achieved with less hassle, because I don't need to track per temporary
> information for all the temporaries all the time, instead, I only need
> to resolve loops and scopes when it is really needed.
> 
> [snip]
> 
> To me the algorithm you lined out looks quite complicated, but not too
> different from what I'm doing when I replay the access time-line of a
> register. However, with your approach one has to track the state of
> each variable all the time, information that could be shared otherwise
> and might not even needed.
> 
> [snip]
> 
>>>
>>> I hope I didn't miss anything, because after all this is
>>> admittedly subtle stuff.
> It is, and this is why I think it is better to separate the steps into
> manageable chunks that can be put under test. (I admit, I'm a fan of
> test driven development, and for that I also think it is more important
> to write test cases instead of sketching out algorithms).
> 
>>> Still, I think this kind of state-machine approach should
>>> work and allow you to avoid *lots* of allocations and pointer-
>>> chasing.
> The allocations I can (and will) get rid of, but I don't see some
> pointer-chasing as a problem, since it is encapsulated within class
> methods.

How will you get rid of those allocations? I find that it's useful to be 
able to sketch the data structures first.

At the very least, you need a vector per temporary. You're also not 
handling the case where a temporary is set in both branches of an 
if/else, and I'd argue that the approach is more annoying to adapt to 
handling it than what I sketched.

By the way, that's related to another conceptual advantage of having a 
state machine that acts on end-of-scope, which is that this is basically 
like having an action for phi-nodes.

> I thank you for your comments, but given that my code is working I
> don't see that re-doing it from scratch is such a good idea. I think
> refactoring it to eliminate the allocations and covering additional
> test cases is a better approach. If this makes it possible to move the
> implementation to be single pass, then I might consider it, but I think
> tracking all the information for all temporaries all the time is not
> such a good idea, especially for large shaders that might have 2000+
> temporaries before register merging.

Well, as I said, you're tracking all the information anyway, and as for 
when you *update* the information, there's an easy way to make that lean.

I'm curious what you'd suggest for getting rid of allocations anyway.

Cheers,
Nicolai
-- 
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.