[Mesa-dev] [PATCH] nir: add helper to get # of src/dest components

Thu Jun 18 11:19:35 PDT 2015

On Thu, Jun 18, 2015 at 1:27 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:
> On Thu, Jun 18, 2015 at 9:42 AM, Rob Clark <robdclark at gmail.com> wrote:
>> On Thu, Jun 18, 2015 at 11:01 AM, Connor Abbott <cwabbott0 at gmail.com> wrote:
>>>>>> (really I want phi's for variables too..  the way I turn arrays into
>>>>>> fanin/collect fanout/split works on the backend for dealing with
>>>>>> arrays in ssa form (other than making instruction graph large) but the
>>>>>> way I go from nir to ir3 currently assumes I get nir phi's everywhere
>>>>>> I need an ir3 phi)
>>>>>
>>>>> Right... we explicitly decided not to support SSA form for arrays in
>>>>> NIR, since it seemed like a pretty bad idea. SSA form assumes that
>>>>> inserting copies for the things you're SSA-ifying is relatively
>>>>> inexpensive, and both SSA-based register allocation and algorithms for
>>>>> converting out of SSA make no guarantees about not inserting
>>>>> potentially unnecessary copies. This is a good compromise for smaller
>>>>> things, but not for your array of (say) 64 things where inserting an
>>>>> extra copy is rather disastrous. You can do it if you want and shoot
>>>>> yourself in the foot, but NIR isn't going to help you there -- we
>>>>> won't write a to-SSA pass for something which doesn't make much sense
>>>>> in the first place.
>>>>>
>>>>
>>>> in ir3 my solution is to add sufficient dependencies between
>>>> instructions so the array accesses don't get re-ordered and they all
>>>> collapse down to a single name per array element/slot
>>>
>>> It's not about getting reordered, it's about interference. The problem
>>> is that as soon as you do basically any optimization at all (even copy
>>> propagation), you can wind up with a situation where the sources and
>>> destination of a phi node interfere with each other and you have to
>>> insert extra mov's. And even if you keep everything exactly the same,
>>> with an SSA-based register allocator, there's always the chance that
>>> it'll screw up anyways and allocate something over your array and
>>> force you to insert a mov. You could force the array to be allocated
>>> to a single hardware register, but then it's not really an SSA value
>>> anymore -- it's a hardware register, and you can't treat it like an
>>> SSA value anymore in your allocator, and so adding phi nodes and
>>> whatnot for it in your IR doesn't make much sense.
>>
>>
>> But the point I'm trying to make is that I need the links from src to
>> dest that I get in SSA form for *scheduling* (for example, to know how
>> many delay slots are needed between two instructions).  For things
>> like if/else, I would need to consider number of cycles since either
>> possible assigner.  For everything else, the phi nodes (in ir3, not in
>> nir) give me this.  Arrays are not special (since they are allocated
>> in registers) when it comes to cycle counts.
>>
>> BR,
>> -R
>
> Except that they still are special, and you need to account for that
> when you set up scheduling dependencies for them. For example, imagine
> that you have an array A accessed in a loop:
>
> while (...) {
>     ... = A[i];
>     A[i] = ...;
> }
>
> if you lower the array to SSA, this will give you something like:
>
> while (...) {
>     A_1 = phi(A_0, A_2);
>     ... = A_1[i];
>     A_2 = Update(A_1, i, ...); /* makes a copy with the i'th element changed */
> }
>
> and when you set up scheduling dependencies, you'll miss the false
> write-after-read dependency between the access and the update, meaning
> you could end up with:
>
> while (...) {
>     A_1 = phi(A_0, A_2);
>     A_2 = Update(A_1, i, ...);
>     ... = A_1[i];
> }
>
> and now the number of instructions in your shader has exploded since
> you have to insert a copy somewhere. You could add all the false
> dependencies by yourself, and force it into the same register, but by
> that point you've already given up all the advantages that SSA has to
> offer and inserting phi nodes is a rather pointless exercise.

Except, like I said, for the purposes of scheduling: the phi is what
tells me that there are two dependency paths to consider when figuring
out the required number of delay slots.
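
To make the two-path point concrete, here is a minimal sketch of the
calculation, under stated assumptions. The struct and function names
are hypothetical (this is not actual ir3 code), and the model is
simplified: each possible assigner has a fixed latency, and "distance"
is the number of instructions already issued between that assigner and
the consumer along its path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of one phi source; not the real ir3 type. */
struct phi_src {
    int latency;   /* cycles before the assigner's result can be read */
    int distance;  /* instructions issued since the assigner on this path */
};

/* Delay slots still needed to satisfy a single assigner. */
static int delay_for_src(const struct phi_src *src)
{
    int remaining = src->latency - src->distance;
    return remaining > 0 ? remaining : 0;
}

/*
 * A value merged by a phi (e.g. after an if/else) may come from any of
 * its sources, so the consumer has to wait for the worst case.
 */
static int delay_slots_needed(const struct phi_src *srcs, size_t count)
{
    int delay = 0;

    assert(srcs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        int d = delay_for_src(&srcs[i]);
        if (d > delay)
            delay = d;
    }
    return delay;
}
```

Without the phi linking the consumer to both assigners, only one of
the two paths would be visible to this calculation.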

That said, having to re-invent the to-SSA pass in my backend is
something I was hoping to avoid, so I'm thinking about alternatives.
But currently the depth calculations (used for scheduling), dead code
elimination (used, if nothing else, to clean things up when generating
the binning-pass shader), scheduling, and even to some degree RA all
depend on this use-def graph between instructions.
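
As a rough illustration of why those passes want the use-def links,
here is a minimal sketch of a depth calculation and a liveness mark
that both just walk from an instruction to its sources. All names are
hypothetical and this is not how ir3 actually implements either pass.
The depth walk also assumes an acyclic graph (e.g. a single basic
block), so loop-carried phis would need extra handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SRCS 4

/* Hypothetical instruction node; srcs[] are the use-def links. */
struct instr {
    struct instr *srcs[MAX_SRCS];  /* NULL entries are unused slots */
    int latency;                   /* cycles until the result is ready */
    int depth;                     /* -1 until computed */
    bool live;
};

/*
 * Depth = earliest cycle at which the instruction could issue, given
 * the latencies of everything it depends on. Assumes no cycles.
 */
static int compute_depth(struct instr *instr)
{
    if (instr->depth >= 0)
        return instr->depth;

    int depth = 0;
    for (size_t i = 0; i < MAX_SRCS; i++) {
        struct instr *src = instr->srcs[i];
        if (!src)
            continue;
        int d = compute_depth(src) + src->latency;
        if (d > depth)
            depth = d;
    }
    instr->depth = depth;
    return depth;
}

/*
 * Dead code elimination, mark phase: everything reachable from an
 * output through use-def links is live; the rest can be dropped.
 */
static void mark_live(struct instr *instr)
{
    if (instr->live)
        return;
    instr->live = true;
    for (size_t i = 0; i < MAX_SRCS; i++) {
        if (instr->srcs[i])
            mark_live(instr->srcs[i]);
    }
}
```

If array accesses have no such links, neither walk can see through
them, which is the dependency described above.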

BR,
-R