[Mesa-dev] [RFC 1/2] nir: Add a new lowering pass nir_lower_vec_and_coalesce

Thu Sep 10 23:03:19 PDT 2015

On 09/10/2015 08:45 PM, Jason Ekstrand wrote:
> On Wed, Sep 9, 2015 at 11:51 PM, Eduardo Lima Mitev <elima at igalia.com> wrote:
>> On 09/09/2015 07:10 PM, Jason Ekstrand wrote:
>>>
>>> On Sep 8, 2015 23:27, "Eduardo Lima Mitev" <elima at igalia.com
>>> <mailto:elima at igalia.com>> wrote:
>>>>
>>>> This pass will propagate the destination components of a vecN
>>> instructions,
>>>> as destination of the instructions that define its sources; if certain
>>>> conditions are met.
>>>>
>>>> If all the components of the destination register in the vecN instruction
>>>> can be propagated, the instruction is removed. Otherwise, a new, reduced
>>>> vecN instruction is emitted with the channels that remained.
>>>>
>>>> This effectively coalesces registers and reduces indirection.
>>>>
>>>> By now, this pass will only propagate to ALU instructions, but it could
>>>> be extended to include other instructions like load_input intrinsic.
>>>>
>>>> It also propagates to instructions within the same block as the vecN
>>>> instruction. But it could be made to work cross-block in the future,
>>>> though there are non-trivial issues with this like considering
>>>> registers that are written in different branches of a conditional.
>>>> More analysis is needed to correctly cover these cases.
>>>>
>>>> This pass works on a NIR shader in final form (after SSA), and is
>>>> expected to run before nir_lower_vec_to_movs().
>>>> ---
>>>> [...]
>>>> +
>>>> +            /* We only override dest registers that are only used
>>> once, and in
>>>> +             * this vecX instruction.
>>>> +             * @TODO: In the future we might consider registers used
>>> more than
>>>> +             * once as sources of the same vecX instruction.
>>>> +             */
>>>
>>> Yeah, we definitely want to handle that case and we want to handle it by
>>> emitting a single instruction that does multiple channels.
>>>
>>
>> I started implementing that as part of of this patch but quickly
>> realized it was not trivial and basically postponed it until I got some
>> feedback for the basics.
>>
>>>> +            if (list_length(&parent_dest_reg->uses) != 1)
>>>> +               continue;
>>>> +
>>>> +            nir_alu_instr *new_alu_instr =
>>>> +               clone_alu_instr_and_override_dest(parent_alu_instr,
>>> &vec->dest,
>>>> +                                                 i, mem_ctx);
>>>
>>> This mess would be a lot easier if we were still in SSA form.  I've been
>>> thinking for a while that we should make lower_vec_to_movs work in SSA
>>> form (it would replace SSA destinations with register destinations).
>>> Then use partial SSA lowering like we do in the FS backend.  That would
>>> make this pass much easier together correct.
>>>
>>
>> I didn't know about the partial SSA pass in FS backend, I could have got
>> some inspiration there. But did tried hard to implement this while in
>> SSA form at the beginning, without much success because the combination
>> of nir_from_ssa and lower_vec_to_movs would screw things after any SSA pass.
> 
> Yeah, partial SSA was something that's actually fairly recent and
> happened in parallel with the vec4 stuff.
> 
>>> In general, I'm kind of thinking that this might be better done as an
>>> extension to the lower_vec_to_moves pass or as a new version of it.  You
>>> would loop through the sources of a vecN and, if you can rewrite a
>>> destination, you do and if you can't, you insert moves.  We may also
>>> want to  it handle SSA while we're at it.  Note: it will get run after
>>> nir_from_ssa so it will have to handle partial SSA where some
>>> destinations are SSA and some are registers.
>>>
>>
>> Actually, the first version of this patch, I wrote it against
>> lower_vec_to_movs, but then Matt suggested that likely it would be
>> easier for the opt_register_coalsece pass to avoid lower_vec_to_movs and
>> deal with vecN instructions in the backend. Then I rewrote this pass as
>> a separate thing that would re-emit reduced vecN instructions, to
>> essentially make it compatible with Matt's suggestion, thus future proof :).
> 
> Hrm... I'm not sure why matt suggested it be it's own pass but I can
> kind of see it.  At least in my implementation, it seemed to work
> fairly naturally to just put it in lower_vec_to_movs.  Is there some
> place where I can see your original pass that did it there?  If you
> don't have it handy, don't do a lot of work to dig it up, I'm mostly
> curious.
> 

Well, what Matt suggested was that the nir_lower_vec_to_movs pass itself
(without my changes) could be avoided, because dealing with vecN
instructions on backend side would simplify things for
opt_register_coalesce.

In any case, this is a version of the patch I had for lower_vec_to_movs,
but it was incomplete then and had regressions:

https://github.com/Igalia/mesa/commit/e82a732cac896b4ec7f6e58d95ff89cf684207df

>>> Does this seem reasonable?
>>>
>>
>> Absolutely. The first thing I noticed is that lowering stuff in NIR out
>> of SSA form is not very elegant and also not very well supported (lack
>> of utility APIs).
>>
>> I see you sent a series to the ML implementing some of the ideas in this
>> patch. The shader-db results are impressive!
>> I will use that as training material :) and follow the review process
>> closely. This also means I stop working on this pass.
> 
> I'm sorry if I stole your thunder; I didn't mean to.  It was just one
> of those things where it takes fewer lines of code to implement it
> than it takes lines of prose to explain it to someone.
>

Not at all :), I'm happy you took over. Your fix is way more complete. I
would have never produced a similar patch any time soon. The important
thing is that we have regressions under control now.

> Most of the
> work (once I saw what you were doing) was just in fixing core NIR bugs
> to make it all work.
> --Jason
> 
>> Thanks a lot for the review!
>>
>> Eduardo
>>