[Mesa-dev] [PATCH v5 2/6] mesa/st: glsl_to_tgsi: implement new temporary register lifetime tracker
Nicolai Hähnle
nhaehnle at gmail.com
Wed Jun 28 08:37:28 UTC 2017
On 27.06.2017 11:32, Gert Wollny wrote:
[snip]
>>> +enum e_scope_type {
>>
>> Please drop the "e_" prefix here and below, we don't usually do that.
>
> done. Do you also mean below? I usually do this to avoid name clashes
> ...
Yes, I also mean below.
[snip]
>> + case TGSI_OPCODE_CONT: {
>>> + cur_scope->set_continue_line(line);
>>
>> I'm still frankly confused about the way you choose to handle
>> BRK/CONT in loops, and suspect you're doing it wrong. At the very
>> least, having a function called "set_continue_line" be called for a
>> BRK is bad naming.
>
> Well, I'll also changed this. In any case, handling continue like break
> only means that the required lifetime of some temporaries would be
> overestimated, which is not a big problem (as compared to
> underestimating it).
True, but it can also be confusing, so I think it's better to change.
>>> + e_acc_type write_type = inst->op == TGSI_OPCODE_UCMP ?
>>
>> Despite the opcode being called "Integer Conditional Move", it does
>> write to dst unconditionally. It should probably have been called
>> "select" or something like that.
>
> I'm aware of that, the reason why I'm tracking this explicitly is
> because considering this line with TEMP[5] never written before:
>
> UCMP TEMP[5], IN[0], TEMP[5], In[1]
>
> TEMP[5] can be well defined after the write, so I have to take the
> write into account as a dominant write. On the other hand
>
> MOV TEMP[5], TEMP[5]
>
> means that since the read from TEMP[1] is always undefined, the write
> to TEMP[1] also is and I only have to make sure that TEMP[5] is not
> merged with another register that would then be overwritten. Actually,
> I would hope that the dead code elimination removes such statements.
Hmm. I guess I'd have to look more carefully at how you do the lifetime
calculation at the end. I wouldn't be surprised if what you're doing is
correct, but I still think this is one of those instances where you're
making your own life too difficult by not having your concepts clear.
In both examples, you have two accesses: First, a read from the
temporary. Then, an unconditional write to the same temporary.
That is all that should matter.
For a different angle of attack:
AND TEMP[5], IN[0], TEMP[5]
Conceptually, this is *exactly* the same as your UCMP example! Even if
TEMP[5] is undefined before the instruction, it may or may not be
undefined afterwards (because IN[0] can be 0). A similar thing happens
with texturing instructions btw (because textures can be solid colors).
Really, asking whether the values in the temp registers are defined or
not is simply asking the wrong question. You can't really answer that
anyway, because even the inputs may be undefined. The right question to
ask is: Is the temp register guaranteed to have been written previously
in program order (i.e. above the read in the program text). And for that
question, there is simply no distinction between UCMP and everything else.
>>> + last_write = line;
>>> +
>>> + /* If no first write is assigned check whether we deal with a
>>> case where the temp is read and written in the same instructions,
>>> because then it is not a dominant write, it may even be undefined.
>>> Hence postpone the assignment if the first write, only mark that
>>> the register was written at all by remembering a scope */
>>> +
>>
>
>> Also, I think the comment is wrong. It should count as a dominating
>> write even if there's a read on the same line. So the special
>> handling here is wrong.
> This is exactly the case that I commented above, i.e. in the cases when
> it is undefined behavior should I keep the register alive? I opted for
> no.
So my comment above applies as well in turn :)
BTW, as an additional thought: When I started using "dominating" here, I
meant it in a sense that is derived from dominators in control flow
graphs. Write A dominates read B if every path through the program that
reaches B must go through A. Whether the value written by A is defined
or not is irrelevant.
>>> + if (first_dominant_write < 0) {
>>> +
>>> + if (line != last_read || (rw ==
>>> acc_write_cond_from_self))
>>> + first_dominant_write = line;
>>> +
>>> + first_write_scope = scope;
>>
>> Should this be renamed to first_dominant_write_scope?
> okay.
>
>>> + }
>>> +
>>> + if (scope->is_conditional() && scope->in_loop())
>>> + keep_for_full_loop = true;
>>
>> This is only necessary as long as we don't have a dominant write yet,
>> right?
>
> I have a test case for this, if you have the dominant write in a loop
> within conditional within loop then the propagation must be done when
> moving up the scopes.
Maybe you're referring to this example:
BGNLOOP
IF ...
BGNLOOP
MOV TEMP[0], ...
ENDLOOP
ENDIF
ENDLOOP
In this case, the lifetime must span the entire outer loop.
However, consider this example, where you have an earlier dominating write:
BGNLOOP
MOV TEMP[0], ...
IF ...
BGNLOOP
MOV TEMP[0], ...
ENDLOOP
ENDIF
ENDLOOP
Here, the lifetime must span only from the first MOV to the last use
inside the outer loop.
I know, you can always argue that a conservative approximation of
lifetimes is okay, and I mostly agree.
But what *isn't* okay IMNSHO is getting a conservative approximation
because of code that looks like there was confusion about *why* the code
is doing what it does.
The above is a perfect example of that. There is a check for whether
there has been a dominating write *right there*, and setting the
keep_for_full_loop is only necessary if there hasn't been a dominating
write before, so why is that not inside the dominating write check? It
just reeks of coding by trial-and-error, and that's frankly not good enough.
>>> +#include <iostream>
>>> +using std::cerr;
>>
>> Always put includes and usings at the top of a source file.
> Sorry, that was supposed to be just a quick check that I wanted to
> remove later, but forgot to do so.
>
>
>
>>> +
>>> +
>>> + /* Undefined behaviour: read and write in the same instruction
>>> + * but never written elsewhere. Since it is written, we need to
>>> + * keep it nevertheless.
>>
>> This doesn't actually need to be undefined behavior, depending on
>> the instruction. It's likely to be dead code though.
>
> CMIIW, but with the exception of UCMP I don't see a case where this is
> not undefined behavior, and UCMP is handled differently.
Well, there *are* other examples, like the AND I mentioned above, or
dually an OR, or UMUL/IMUL, or USLT and friends. Basically, like I've
argued before, UCMP *shouldn't* be handled differently.
>> Also, the actual code below doesn't reflect the comment.
> I'll try to improve the comment.
>
>> Don't you have to expand this to the extend of the outermost loop?
> Why? if it is undefined I don't need to keep the register around for
> more that the instructions where it is written (hence the last_write
> +1).
>
>
>>> + /* Evaluate the scope that is shared by all three, first write,
>>> and
>>> + * first (conditional) read before write and last read. */
>>
>> What's a conditional read, and why does it matter?
>
> An unconditional read before a first dominant write is undefined. A
> conditional read before the dominant write (in a loop) is very likely
> well defined and for that reason we have to take it into account. e.g.
>
> Y=0
> I=0
> BGNLOOP
> IF i > 1
> ADD Y, X, Y
> ENDIF
> X = 1
> I = I + 1
> IF I > 5
> BRK
> ENDIF
> ENDLOOP
> OUT = Y
>
> Here access to X is always well defined. With
>
> Y=0
> I=0
> BGNLOOP
> ADD Y, X, Y
> X = 1
> I = I +1
> IF I > 5
> BRK
> ENDIF
> ENDLOOP
> OUT = Y
>
> The read access to X in line 4 is undefined in the first round,
> and hence Y and OUT are undefined. Hence, there is no point in keeping
> X alive for the full loop like in the above case.
>
> If, on the other hand we have
>
> Y=0
> I=0
> BGNLOOP
> UCMP Y, I, IN0, X
> X = 1
> I = I +1
> IF I > 5
> BRK
> ENDIF
> ENDLOOP
> OUT = Y
>
> then X must be kept alive for the whole loop.
Yeah, since there are more examples like this than UCMP, I think this
again falls under the category of making your own life unnecessarily
difficult.
BTW, in some sense perhaps you could think of what you're trying to do
here as taking a very aggressive stance on undefs like modern C/C++
compilers are doing. An argument can be made in favor of that, but as I
hope I've shown you, it's a minefield. Should (x & 0) be undefined if x
is undefined? Perhaps there are some languages that would like that and
would benefit from it, but I'd rather play it safe here and assume that
examples like what you've shown can be meaningful for any kind of
instruction (whether UCMP or otherwise).
If, at some later point, you could show that there are big benefits to
be had from making such undef-based optimizations, then we could
reconsider. But I doubt it, and in the meantime trying to do this just
muddles the concepts that this lifetime estimation is dealing with.
Also, just a reminder of the issue of writemasks and tracking components
separately.
Cheers,
Nicolai
>>> +
>>> + /* Here we are at the same scope, all is resolved */
>>> + return make_lifetime(first_dominant_write, last_read);
>>
>> I suspect that there are a lot of logical cleanups and
>> simplifications that you can achieve in this function but sticking to
>> a straight story of what every variable really means.
>>
>> But please, first address the issue of multiple components and all
>> the style issues, then we can see what to about this.
>
> Okay. Although I think that there are not many simplifications possible
> that don't sacrifice a close-to-minimal estimated life time.
>
> Best,
> Gert
>
--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
More information about the mesa-dev
mailing list