[Mesa-dev] [PATCH] nir: add helper to get # of src/dest components

Thu Jun 18 11:34:17 PDT 2015

On Thu, Jun 18, 2015 at 11:19 AM, Rob Clark <robdclark at gmail.com> wrote:
> On Thu, Jun 18, 2015 at 1:27 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:
>> On Thu, Jun 18, 2015 at 9:42 AM, Rob Clark <robdclark at gmail.com> wrote:
>>> On Thu, Jun 18, 2015 at 11:01 AM, Connor Abbott <cwabbott0 at gmail.com> wrote:
>>>>>>> (really I want phi's for variables too..  the way I turn arrays into
>>>>>>> fanin/collect fanout/split works on the backend for dealing with
>>>>>>> arrays in ssa form (other than making instruction graph large) but the
>>>>>>> way I go from nir to ir3 currently assumes I get nir phi's everywhere
>>>>>>> I need an ir3 phi)
>>>>>>
>>>>>> Right... we explicitly decided not to support SSA form for arrays in
>>>>>> NIR, since it seemed like a pretty bad idea. SSA form assumes that
>>>>>> inserting copies for the things you're SSA-ifying is relatively
>>>>>> inexpensive, and both SSA-based register allocation and algorithms for
>>>>>> converting out of SSA make no guarantees about not inserting
>>>>>> potentially unnecessary copies. This is a good compromise for smaller
>>>>>> things, but not for your array of (say) 64 things where inserting an
>>>>>> extra copy is rather disastrous. You can do it if you want and shoot
>>>>>> yourself in the foot, but NIR isn't going to help you there -- we
>>>>>> won't write a to-SSA pass for something which doesn't make much sense
>>>>>> in the first place.
>>>>>>
>>>>>
>>>>> in ir3 my solution is to add sufficient dependencies between
>>>>> instructions so the array accesses don't get re-ordered and they all
>>>>> collapse down to a single name per array element/slot
>>>>
>>>> It's not about getting reordered, it's about interference. The problem
>>>> is that as soon as you do basically any optimization at all (even copy
>>>> propagation), you can wind up with a situation where the sources and
>>>> destination of a phi node interfere with each other and you have to
>>>> insert extra mov's. And even if you keep everything exactly the same,
>>>> with an SSA-based register allocator, there's always the chance that
>>>> it'll screw up anyways and allocate something over your array and
>>>> force you to insert a mov. You could force the array to be allocated
>>>> to a single hardware register, but then it's not really an SSA value
>>>> anymore -- it's a hardware register, and you can't treat it like an
>>>> SSA value anymore in your allocator, and so adding phi nodes and
>>>> whatnot for it in your IR doesn't make much sense.
>>>
>>>
>>> But the point I'm trying to make is that I need the links from src to
>>> dest that I get in SSA form for *scheduling* (for example, to know how
>>> many delay slots are needed between two instructions).  For things
>>> like if/else, I would need to consider number of cycles since either
>>> possible assigner.  For everything else, the phi nodes (in ir3, not in
>>> nir) give me this.  Arrays are not special (since they are allocated
>>> in registers) when it comes to cycle counts.
>>>
>>> BR,
>>> -R
>>
>> Except that they still are special, and you need to account for that
>> when you set up scheduling dependencies for them. For example, imagine
>> that you have an array A accessed in a loop:
>>
>> while (...) {
>>     ... = A[i];
>>     A[i] = ...;
>> }
>>
>> If you lower the array to SSA, this will give you something like:
>>
>> while (...) {
>>     A_1 = phi(A_0, A_2);
>>     ... = A_1[i];
>>     A_2 = Update(A_1, i, ...); /* makes a copy with the i'th element changed */
>> }
>>
>> and when you set up scheduling dependencies, you'll miss the false
>> write-after-read dependency between the access and the update, meaning
>> you could end up with:
>>
>> while (...) {
>>     A_1 = phi(A_0, A_2);
>>     A_2 = Update(A_1, i, ...);
>>     ... = A_1[i];
>> }
>>
>> and now the number of instructions in your shader has exploded since
>> you have to insert a copy somewhere. You could add all the false
>> dependencies by yourself, and force it into the same register, but by
>> that point you've already given up all the advantages that SSA has to
>> offer and inserting phi nodes is a rather pointless exercise.
>
> Except, like I said, for the purpose of scheduling: realizing that
> there are two dependency paths to consider when figuring out the
> required number of delay slots.

No, there aren't. There's the dependency between the read and the
write (in my example), which serializes things and makes it one path.
In other words, everything that reads the array before the write must
happen before the write, even if in SSA the old and new arrays are two
different values.
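
A minimal sketch of that serialization, in Python rather than anything
resembling real NIR or ir3 code. The instruction tuples, array_deps() and
is_legal_order() are hypothetical names made up for illustration. With
false_deps=False only read-after-write edges are tracked, which roughly
approximates what a pure use-def view would see; with false_deps=True the
write-after-read edge is added and the read is pinned before the write.

```python
from collections import defaultdict

# Hypothetical instruction records: (name, arrays_read, arrays_written).
# The array is NOT renamed into SSA; both accesses refer to the same "A".
LOOP_BODY = [
    ("load",  {"A"}, set()),   # ... = A[i];
    ("store", set(), {"A"}),   # A[i] = ...;
]

def array_deps(instrs, false_deps=True):
    """Return {instr: set of instrs that must be scheduled before it}."""
    deps = defaultdict(set)
    last_write = {}                   # array -> most recent writer
    reads_since = defaultdict(list)   # array -> readers since that write
    for name, reads, writes in instrs:
        for arr in reads:
            if arr in last_write:
                deps[name].add(last_write[arr])       # read-after-write
            reads_since[arr].append(name)
        for arr in writes:
            if false_deps:
                # write-after-read: earlier readers must finish first
                deps[name].update(r for r in reads_since[arr] if r != name)
                if arr in last_write:
                    deps[name].add(last_write[arr])   # write-after-write
            last_write[arr] = name
            reads_since[arr] = []
    return dict(deps)

def is_legal_order(order, deps):
    """True if every instruction comes after all of its dependencies."""
    pos = {n: i for i, n in enumerate(order)}
    return all(pos[d] < pos[n] for n, ds in deps.items() for d in ds)

use_def_only = array_deps(LOOP_BODY, false_deps=False)
with_false_deps = array_deps(LOOP_BODY, false_deps=True)

# Without the false dependency the scheduler is free to hoist the store.
print(is_legal_order(["store", "load"], use_def_only))
# With it, only the original read-then-write order is accepted.
print(is_legal_order(["store", "load"], with_false_deps))
```

Once the write-after-read edge is there, the load and the store sit on a
single chain, so there is only one path to count delay slots along.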

>
> That said, having to re-invent the to-ssa pass in my backend is
> something I was hoping to avoid.. so I'm thinking of other
> alternatives.  But currently the depth calculations (used for
> scheduling), dead code elimination (used if nothing else to clean
> things up when generating binning-pass shader), scheduling, and even
> to some degree RA, depend on this use-def graph between instructions.

What I'm saying is that for arrays, the use-def graph isn't enough. At
least the scheduler has to consider additional dependencies when doing
the depth calculations and when actually scheduling, and the RA has to
split the live ranges of other values and deal with arrays specially too
to avoid introducing extra array copies. SSA does help a little
with dead code elimination, but even there having arrays in SSA
doesn't win you much compared to the non-SSA approach (when there are
no reads left for the array, delete the array). The point is that SSA
doesn't gain you nearly as much as you think, and it would actually be
easier to rewrite things so that arrays aren't in SSA form than to
support this awkward half-SSA thing.
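
To make the non-SSA dead code elimination concrete, here is a rough
sketch in Python (again not real NIR or ir3 code). The tuple layout and
remove_dead_arrays() are hypothetical, and it assumes that an instruction
which writes no array is kept alive by something outside this pass.

```python
# Hypothetical instruction records: (name, arrays_read, arrays_written).
PROGRAM = [
    ("store_a", set(), {"A"}),   # A[i] = ...
    ("copy",    {"A"}, {"B"}),   # B[j] = A[i]
    ("store_c", set(), {"C"}),   # C[k] = ...
    ("load_c",  {"C"}, set()),   # ... = C[k]
]

def remove_dead_arrays(instrs):
    """Drop writes to arrays that no remaining instruction reads.

    Repeats until nothing changes, because deleting one dead write can
    remove the last read of another array.
    """
    instrs = list(instrs)
    while True:
        # An array is live if any remaining instruction reads it.
        live = set().union(*(reads for _, reads, _ in instrs))
        # Keep non-array-writing instructions (assumed live elsewhere)
        # and any instruction that writes at least one live array.
        kept = [ins for ins in instrs if not ins[2] or ins[2] & live]
        if len(kept) == len(instrs):
            return kept
        instrs = kept

remaining = [name for name, _, _ in remove_dead_arrays(PROGRAM)]
print(remaining)
```

Here B is never read, so the copy goes away first; that removes the only
read of A, so the store to A goes away on the next pass, and only the
accesses to C are left.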

>
> BR,
> -R