[Mesa-dev] [RFC] i965: alternative to memctx for cleaning up nir variants

Thu Dec 31 09:37:50 PST 2015

On Thu, Dec 31, 2015 at 12:05 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:
> On Thu, Dec 31, 2015 at 10:16 AM, Rob Clark <robdclark at gmail.com> wrote:
>> On Tue, Dec 29, 2015 at 10:32 AM, Rob Clark <robdclark at gmail.com> wrote:
>>>>>> If you do this, you'll be back to always needing a mutable copy.  Most
>>>>>> lowering and optimization passes die the moment they see a register.  You'll
>>>>>> either have to go fix a bunch of stuff up to no-op properly or run
>>>>>> vars_to_regs after doing your NIR lowering but before going into your
>>>>>> backend IR.  This means that your "gold copy" still has variables and you
>>>>>> always need to lower them to registers before you go into the backend.
>>>>>
>>>>> ugg.. but good point, thanks for pointing that out before I wasted
>>>>> another afternoon on yet another dead-end for handling derefs..
>>>>>
>>>>> Ok, I guess I need to think of a better name than load/store_var2 for
>>>>> the new intrinsics ;-)
>>>>
>>>> I don't think that "you should throw away registers and use your own
>>>> thing" is what Jason wanted you to get out of that.
>>>
>>> perhaps..  I was considering switching to registers for arrays.
>>> Although it would end up forcing an extra clone in the common case
>>> where there would otherwise not be one... a bit of a tough pill to
>>> swallow..
>>
>>
>> I've been thinking through this a bit, and the whole load/store
>> intrinsic for var access (vs. potentially being the src and/or dst of
>> any other instruction, with registers) is pretty damn convenient for
>> me..
>>
>> Not all instructions support indirect dst and/or src, and some support
>> indirect in certain src positions, but not others.  I have similar
>> constraints with const (uniform), fwiw.
>>
>> In addition to avoiding a lot of churn in nir->ir3, I think it would be
>> easier to deal with these kinds of constraints by always starting out
>> with a mov, and then letting an ir3 backend pass collapse that into the
>> instruction(s) that consume the mov when possible, similar to what we
>> do already with uniforms.
>>
>> So I'm thinking of introducing load/store_global and load/store_local
>> intrinsics, and lowering to them in lower_io.
>>
>> BR,
>> -R
>
> The thing about that is, starting out with a separate instruction is
> much harder than splitting it out. Splitting out an indirect access
> can be done easily, and in your case, on the fly as you convert from
> NIR, whereas inlining accesses is a lot more painful, since you're
> essentially back to not having the full SSA information. We want to
> (eventually) solve that problem in NIR, and not in everyone's backend.
> For that reason, you're likely going to be the only user of these new
> intrinsics/derefs/whatever. Both Jason and I are confused as to why
> you don't want to make the minor changes needed to adopt the thing
> that is designed to do exactly what you want to do, rather than
> rewrite core NIR for something that would save maybe <50 lines of code
> in your backend (and really, that's what we're talking about).

Folding back an unneeded mov is actually pretty trivial, especially since
I still have it in SSA form at that point.  (I guess you are basing your
claim on an assumption that I wasn't in SSA at that point??)  Not to
mention that I already have to do this for consts anyway, so I might
as well be consistent.  It would be a pretty simple extension of the
same logic to handle it for indirects.
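To illustrate why folding a mov back is cheap once the code is in SSA form, here is a minimal sketch in C over a hypothetical toy IR. The names enum opcode, struct src, struct instr, src_allows_indirect and fold_indirect_movs are made up for illustration and are not ir3 code. Because every value has exactly one definition, each read of the mov's result can be replaced by the mov's own indirect source whenever the consuming source slot accepts an indirect operand; the mov is deleted only if every use was folded. The sketch assumes the array is not written between the mov and its uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy SSA IR; none of these names come from ir3. */
enum opcode { OP_MOV, OP_ADD, OP_MAD, OP_COUNT };

/* Hypothetical per-opcode, per-source-slot capability table: can this
 * slot read an indirect (array[base + addr]) operand directly? */
static const bool src_allows_indirect[OP_COUNT][3] = {
    [OP_MOV] = { true,  false, false },
    [OP_ADD] = { true,  false, false },
    [OP_MAD] = { false, false, true  },
};

struct src {
    bool indirect;  /* true: reads array[base + addr] */
    int base;       /* array base, when indirect */
    int addr;       /* SSA value holding the offset, when indirect */
    int ssa;        /* SSA value read, when !indirect */
};

struct instr {
    enum opcode op;
    int dst;        /* SSA value defined here (exactly once) */
    int nsrc;
    struct src src[3];
    bool dead;
};

/* Fold each "mov dst, array[base + addr]" into the instructions that
 * read dst.  Assumes no store to the array sits between the mov and its
 * uses.  Returns the number of movs that became dead. */
static int fold_indirect_movs(struct instr *instrs, int n)
{
    int removed = 0;
    for (int i = 0; i < n; i++) {
        struct instr *mov = &instrs[i];
        if (mov->dead || mov->op != OP_MOV || !mov->src[0].indirect)
            continue;

        bool all_folded = true;
        for (int j = 0; j < n; j++) {
            struct instr *use = &instrs[j];
            if (j == i || use->dead)
                continue;
            assert(use->nsrc <= 3);
            for (int s = 0; s < use->nsrc; s++) {
                struct src *src = &use->src[s];
                if (src->indirect || src->ssa != mov->dst)
                    continue;           /* not a read of the mov's result */
                if (src_allows_indirect[use->op][s])
                    *src = mov->src[0]; /* read the array directly */
                else
                    all_folded = false; /* this use still needs the mov */
            }
        }
        if (all_folded) {
            mov->dead = true;
            removed++;
        }
    }
    return removed;
}
```

If a consumer's slot cannot take an indirect operand, the mov simply stays, which is the same outcome as never having tried to fold it.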

Basically, the way uniforms (and inputs and outputs) work for
direct/indirect works pretty well for me.  And handling arrays in the
same way is attractive.

Fwiw, with the tgsi f/e I handled all these special cases in the
frontend, inserting extra movs.  I'm happier with the approach I took
w/ the nir f/e, where I don't handle that in the frontend at all.
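The frontend-side alternative mentioned above (inserting extra movs while translating) can be sketched like this, again over a hypothetical toy IR. The names struct builder, emit_split_indirects and src_allows_indirect are illustrative assumptions, not tgsi or ir3 code. Any indirect source sitting in a slot the opcode cannot read indirectly is first copied into a fresh SSA temporary with a mov, and the instruction then reads the temporary.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy IR, redefined so this sketch stands alone. */
enum opcode { OP_MOV, OP_ADD, OP_MAD, OP_COUNT };

/* Hypothetical table: which source slots may read array[base + addr]. */
static const bool src_allows_indirect[OP_COUNT][3] = {
    [OP_MOV] = { true,  false, false },
    [OP_ADD] = { true,  false, false },
    [OP_MAD] = { false, false, true  },
};

struct src { bool indirect; int base; int addr; int ssa; };
struct instr { enum opcode op; int dst; int nsrc; struct src src[3]; };

struct builder {
    struct instr *out;  /* caller-provided output buffer */
    int count;          /* instructions emitted so far */
    int cap;            /* capacity of out */
    int next_ssa;       /* next free SSA value number */
};

static void push(struct builder *b, struct instr in)
{
    assert(b->count < b->cap);
    b->out[b->count++] = in;
}

/* Emit `in`, splitting out every indirect source the opcode cannot
 * consume in that slot into a separate mov to a fresh temporary. */
static void emit_split_indirects(struct builder *b, struct instr in)
{
    for (int s = 0; s < in.nsrc; s++) {
        if (in.src[s].indirect && !src_allows_indirect[in.op][s]) {
            int tmp = b->next_ssa++;
            struct instr mov = { .op = OP_MOV, .dst = tmp, .nsrc = 1,
                                 .src = { in.src[s] } };
            push(b, mov);
            in.src[s] = (struct src){ .ssa = tmp }; /* read the temp */
        }
    }
    push(b, in);
}
```

The trade-off being argued in the thread is visible here: this version bakes the per-slot constraints into the translation step, while the fold sketch earned its simplicity from always emitting the mov and cleaning up later.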

BR,
-R