[Mesa-dev] [PATCH v5 2/6] mesa/st: glsl_to_tgsi: implement new temporary register lifetime tracker

Wed Jun 28 08:37:28 UTC 2017

On 27.06.2017 11:32, Gert Wollny wrote:
[snip]
>>> +enum e_scope_type {
>>
>> Please drop the "e_" prefix here and below, we don't usually do that.
> 
> done. Do you also mean below? I usually do this to avoid name clashes
> ...

Yes, I also mean below.

[snip]
>> +      case TGSI_OPCODE_CONT: {
>>> +         cur_scope->set_continue_line(line);
>>
>> I'm still frankly confused about the way you choose to handle
>> BRK/CONT in loops, and suspect you're doing it wrong. At the very
>> least, having a function called "set_continue_line" be called for a
>> BRK is bad naming.
> 
> Well, I'll also changed this. In any case, handling continue like break
> only means that the required lifetime of some temporaries would be
> overestimated, which is not a big problem (as compared to
> underestimating it).

True, but it can also be confusing, so I think it's better to change.

>>> +         e_acc_type write_type = inst->op == TGSI_OPCODE_UCMP ?
>>
>> Despite the opcode being called "Integer Conditional Move", it does
>> write to dst unconditionally. It should probably have been called
>> "select" or something like that.
> 
> I'm aware of that, the reason why I'm tracking this explicitly is
> because considering this line with TEMP[5] never written before:
> 
>    UCMP TEMP[5], IN[0], TEMP[5], In[1]
> 
> TEMP[5] can be well defined after the write, so I have to take the
> write into account as a dominant write. On the other hand
> 
>   MOV TEMP[5], TEMP[5]
> 
> means that since the read from TEMP[1] is always undefined, the write
> to TEMP[1] also is and I only have to make sure that TEMP[5] is not
> merged with another register that would then be overwritten. Actually,
> I would hope that the dead code elimination removes such statements.

Hmm. I guess I'd have to look more carefully at how you do the lifetime 
calculation at the end. I wouldn't be surprised if what you're doing is 
correct, but I still think this is one of those instances where you're 
making your own life too difficult by not having your concepts clear.

In both examples, you have two accesses: First, a read from the 
temporary. Then, an unconditional write to the same temporary.

That is all that should matter.

For a different angle of attack:

   AND TEMP[5], IN[0], TEMP[5]

Conceptually, this is *exactly* the same as your UCMP example! Even if 
TEMP[5] is undefined before the instruction, it may or may not be 
undefined afterwards (because IN[0] can be 0). A similar thing happens 
with texturing instructions btw (because textures can be solid colors).

Really, asking whether the values in the temp registers are defined or 
not is simply asking the wrong question. You can't really answer that 
anyway, because even the inputs may be undefined. The right question to 
ask is: Is the temp register guaranteed to have been written previously 
in program order (i.e. above the read in the program text). And for that 
question, there is simply no distinction between UCMP and everything else.

>>> +      last_write = line;
>>> +
>>> + /* If no first write is assigned check whether we deal with a
>>> case where the temp is read and written in the same instructions,
>>> because then it is not a dominant write, it may even be undefined.
>>> Hence postpone the assignment if the first write, only mark that
>>> the register was  written at all by remembering a scope */
>>> +
>>
> 
>> Also, I think the comment is wrong. It should count as a dominating
>> write even if there's a read on the same line. So the special
>> handling here is wrong.
> This is exactly the case that I commented above, i.e. in the cases when
> it is undefined behavior should I keep the register alive? I opted for
> no.

So my comment above applies as well in turn :)

BTW, as an additional thought: When I started using "dominating" here, I 
meant it in a sense that is derived from dominators in control flow 
graphs. Write A dominates read B if every path through the program that 
reaches B must go through A. Whether the value written by A is defined 
or not is irrelevant.

>>> +      if (first_dominant_write < 0) {
>>> +
>>> +          if (line != last_read || (rw ==
>>> acc_write_cond_from_self))
>>> +             first_dominant_write = line;
>>> +
>>> +          first_write_scope = scope;
>>
>> Should this be renamed to first_dominant_write_scope?
> okay.
> 
>>> +      }
>>> +
>>> +      if (scope->is_conditional() && scope->in_loop())
>>> +         keep_for_full_loop = true;
>>
>> This is only necessary as long as we don't have a dominant write yet,
>> right?
> 
> I have a test case for this, if you have the dominant write in a loop
> within conditional within loop then the propagation must be done when
> moving up the scopes.

Maybe you're referring to this example:

BGNLOOP
   IF ...
     BGNLOOP
       MOV TEMP[0], ...
     ENDLOOP
   ENDIF
ENDLOOP

In this case, the lifetime must span the entire outer loop.

However, consider this example, where you have an earlier dominating write:

BGNLOOP
   MOV TEMP[0], ...
   IF ...
     BGNLOOP
       MOV TEMP[0], ...
     ENDLOOP
   ENDIF
ENDLOOP

Here, the lifetime must span only from the first MOV to the last use 
inside the outer loop.

I know, you can always argue that a conservative approximation of 
lifetimes is okay, and I mostly agree.

But what *isn't* okay IMNSHO is getting a conservative approximation 
because of code that looks like there was confusion about *why* the code 
is doing what it does.

The above is a perfect example of that. There is a check for whether 
there has been a dominating write *right there*, and setting the 
keep_for_full_loop is only necessary if there hasn't been a dominating 
write before, so why is that not inside the dominating write check? It 
just reeks of coding by trial-and-error, and that's frankly not good enough.

>>> +#include <iostream>
>>> +using std::cerr;
>>
>> Always put includes and usings at the top of a source file.
> Sorry, that was supposed to be just a quick check that I wanted to
> remove later, but forgot to do so.
> 
> 
> 
>>> +
>>> +
>>> +   /* Undefined behaviour: read and write in the same instruction
>>> +    * but never written elsewhere. Since it is written, we need to
>>> +    * keep it nevertheless.
>>
>> This doesn't actually need to be undefined behavior, depending on
>> the  instruction. It's likely to be dead code though.
> 
> CMIIW, but with the exception of UCMP I don't see a case where this is
> not undefined behavior, and UCMP is handled differently.

Well, there *are* other examples, like the AND I mentioned above, or 
dually an OR, or UMUL/IMUL, or USLT and friends. Basically, like I've 
argued before, UCMP *shouldn't* be handled differently.

>> Also, the actual code below doesn't reflect the comment.
> I'll try to improve the comment.
> 
>> Don't you have to expand this to the extend of the outermost loop?
> Why? if it is undefined I don't need to keep the register around for
> more that the instructions where it is written (hence the last_write
> +1).
> 
> 
>>> +   /* Evaluate the scope that is shared by all three, first write,
>>> and
>>> +    * first (conditional) read before write and last read. */
>>
>> What's a conditional read, and why does it matter?
> 
> An unconditional read before a first dominant write is undefined. A
> conditional read before the dominant write (in a loop) is very likely
> well defined and for that reason we have to take it into account. e.g.
> 
> Y=0
> I=0
> BGNLOOP
>    IF i > 1
>      ADD Y, X, Y
>    ENDIF
>    X = 1
>    I = I + 1
>    IF I > 5
>      BRK
>    ENDIF
> ENDLOOP
> OUT = Y
> 
> Here access to X is always well defined. With
> 
> Y=0
> I=0
> BGNLOOP
>    ADD Y, X, Y
>    X = 1
>    I = I +1
>    IF I > 5
>      BRK
>    ENDIF
> ENDLOOP
> OUT = Y
> 
> The read access to X in line 4 is undefined in the first round,
> and hence Y and OUT are undefined. Hence, there is no point in keeping
> X alive for the full loop like in the above case.
> 
> If, on the other hand we have
> 
> Y=0
> I=0
> BGNLOOP
>    UCMP Y, I, IN0, X
>    X = 1
>    I = I +1
>    IF I > 5
>      BRK
>   ENDIF
> ENDLOOP
> OUT = Y
> 
> then X must be kept alive for the whole loop.

Yeah, since there are more examples like this than UCMP, I think this 
again falls under the category of making your own life unnecessarily 
difficult.

BTW, in some sense perhaps you could think of what you're trying to do 
here as taking a very aggressive stance on undefs like modern C/C++ 
compilers are doing. An argument can be made in favor of that, but as I 
hope I've shown you, it's a minefield. Should (x & 0) be undefined if x 
is undefined? Perhaps there are some languages that would like that and 
would benefit from it, but I'd rather play it safe here and assume that 
examples like what you've shown can be meaningful for any kind of 
instruction (whether UCMP or otherwise).

If, at some later point, you could show that there are big benefits to 
be had from making such undef-based optimizations, then we could 
reconsider. But I doubt it, and in the meantime trying to do this just 
muddles the concepts that this lifetime estimation is dealing with.

Also, just a reminder of the issue of writemasks and tracking components 
separately.

Cheers,
Nicolai

>>> +
>>> +   /* Here we are at the same scope, all is resolved */
>>> +   return make_lifetime(first_dominant_write, last_read);
>>
>> I suspect that there are a lot of logical cleanups and
>> simplifications that you can achieve in this function but sticking to
>> a straight story of what every variable really means.
>>
>> But please, first address the issue of multiple components and all
>> the style issues, then we can see what to about this.
> 
> Okay. Although I think that there are not many simplifications possible
> that don't sacrifice a close-to-minimal estimated life time.
> 
> Best,
> Gert
> 

-- 
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.