[Mesa-dev] [RFC] i965: alternative to memctx for cleaning up nir variants

Tue Dec 29 07:32:33 PST 2015

On Mon, Dec 28, 2015 at 4:23 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:
> On Mon, Dec 28, 2015 at 3:25 PM, Rob Clark <robdclark at gmail.com> wrote:
>> On Mon, Dec 28, 2015 at 2:05 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>>>
>>>
>>> On Mon, Dec 28, 2015 at 10:33 AM, Rob Clark <robdclark at gmail.com> wrote:
>>>>
>>>> On Mon, Dec 28, 2015 at 1:20 PM, Jason Ekstrand <jason at jlekstrand.net>
>>>> wrote:
>>>> >
>>>> >
>>>> > On Mon, Dec 28, 2015 at 9:37 AM, Connor Abbott <cwabbott0 at gmail.com>
>>>> > wrote:
>>>> >>
>>>> >> On Mon, Dec 28, 2015 at 10:13 AM, Rob Clark <robdclark at gmail.com>
>>>> >> wrote:
>>>> >> > On Tue, Dec 22, 2015 at 10:11 PM, Connor Abbott <cwabbott0 at gmail.com>
>>>> >> > wrote:
>>>> >> >> On Tue, Dec 22, 2015 at 9:55 PM, Rob Clark <robdclark at gmail.com>
>>>> >> >> wrote:
>>>> >> >>> On Tue, Dec 22, 2015 at 9:47 PM, Connor Abbott
>>>> >> >>> <cwabbott0 at gmail.com>
>>>> >> >>> wrote:
>>>> >> >>>> On Tue, Dec 22, 2015 at 9:02 PM, Rob Clark <robdclark at gmail.com>
>>>> >> >>>> wrote:
>>>> >> >>>>> On Mon, Dec 21, 2015 at 1:48 PM, Jason Ekstrand
>>>> >> >>>>> <jason at jlekstrand.net> wrote:
>>>> >> >>>>>>
>>>> >> >>>>>> I think two different concepts of ownership are getting
>>>> >> >>>>>> conflated
>>>> >> >>>>>> here:
>>>> >> >>>>>> Right/responsibility to delete and right to modify.
>>>> >> >>>>>>
>>>> >> >>>>>> The way I understand it, gallium, as it stands, gives neither to
>>>> >> >>>>>> the driver.
>>>> >> >>>>>> A back-end using NIR requires the right to modify but who
>>>> >> >>>>>> deletes
>>>> >> >>>>>> it doesn't
>>>> >> >>>>>> ultimately matter.  I think it's dangerous to pass one of these
>>>> >> >>>>>> rights to
>>>> >> >>>>>> the driver and not the other but we need to think about both.
>>>> >> >>>>>
>>>> >> >>>>> yeah, uneasy about driver modifying the IR if the state tracker
>>>> >> >>>>> is
>>>> >> >>>>> still going to potentially spin off variants of the IR.. that
>>>> >> >>>>> sounds
>>>> >> >>>>> like madness.
>>>> >> >>>>>
>>>> >> >>>>> The refcnt'ing I proposed does deal w/ right to modify vs delete
>>>> >> >>>>> via
>>>> >> >>>>> nir_shader(_is)_mutable() which returns something that is
>>>> >> >>>>> guaranteed
>>>> >> >>>>> to be safe to modify (ie. has only a single reference)
>>>> >> >>>>>
>>>> >> >>>>>> What I'm trying to say is that we have two options here:
>>>> >> >>>>>>
>>>> >> >>>>>> 1) gallium passes IR to the back-end that it is free to modify
>>>> >> >>>>>> and
>>>> >> >>>>>> is
>>>> >> >>>>>> required to delete when it's done.
>>>> >> >>>>>
>>>> >> >>>>> with refcnt'ing, s/delete/unref/
>>>> >> >>>>>
>>>> >> >>>>> The idea is, the st transfers ownership of the reference it
>>>> >> >>>>> passes
>>>> >> >>>>> to
>>>> >> >>>>> the driver.  If the st wants to hang on to a reference itself, it
>>>> >> >>>>> must
>>>> >> >>>>> increment the refcnt before passing to the driver (backend).
>>>> >> >>>>>
>>>> >> >>>>> Without refcnt'ing, I suppose we could (since we don't have to
>>>> >> >>>>> follow
>>>> >> >>>>> TGSI semantics), just decree that the driver always takes
>>>> >> >>>>> ownership
>>>> >> >>>>> of
>>>> >> >>>>> the copy passed in, and if the st wants to hang on to a copy too,
>>>> >> >>>>> then
>>>> >> >>>>> it must clone.  I suppose this would work well enough for
>>>> >> >>>>> freedreno/vc4, which both end up generating variants later.  It
>>>> >> >>>>> does
>>>> >> >>>>> force an extra clone for drivers that immediately translate into
>>>> >> >>>>> their
>>>> >> >>>>> own backend IR and don't need to keep the NIR around, for
>>>> >> >>>>> example.
>>>> >> >>>>> Maybe that is not worth caring about (since at this point it is
>>>> >> >>>>> hypothetical).
>>>> >> >>>>
>>>> >> >>>> While always cloning does have this disadvantage, I don't think
>>>> >> >>>> it's
>>>> >> >>>> really relevant here. Even if the driver throws away the NIR
>>>> >> >>>> immediately after consuming it, almost invariably it's going to
>>>> >> >>>> want
>>>> >> >>>> to modify  it. The generic NIR passed in by the state tracker
>>>> >> >>>> (other
>>>> >> >>>> IR -> NIR + some optimizations) is almost never going to be the
>>>> >> >>>> same
>>>> >> >>>> as the NIR after going through driver-specific lowering passes,
>>>> >> >>>> which
>>>> >> >>>> means that drivers are never going to want a read-only version of
>>>> >> >>>> the
>>>> >> >>>> IR. In light of that, I think making the driver own the IR passed
>>>> >> >>>> in
>>>> >> >>>> seems like the most sensible thing.
>>>> >> >>>
>>>> >> >>> well, unless the driver is already doing it's own lowering in it's
>>>> >> >>> own
>>>> >> >>> native IR..
>>>> >> >>
>>>> >> >> Well, if you're not doing any lowering in NIR, then you aren't
>>>> >> >> really
>>>> >> >> taking any advantage of it. I can't see a plausible scenario where
>>>> >> >> all
>>>> >> >> the lowering is done in the driver's own IR -- and as soon as you do
>>>> >> >> anything in NIR, you need the driver-owns-IR semantics.
>>>> >> >
>>>> >> > When it comes to shader variants, I have a mix, with some things
>>>> >> > lowered in nir and others just handled in backend..
>>>> >> >
>>>> >> > The re-work / cleanup that I have had on a branch for a while now
>>>> >> > (since it is currently blocked on refcnt'ing) does a first round of
>>>> >> > variant-key independent NIR lowering/opt passes.  And then at draw
>>>> >> > time, if the variant key has anything that is lowered in nir, I do a
>>>> >> > second round.
>>>> >
>>>> >
>>>> > Just to be clear, your key-dependent lowering happens after all of your
>>>> > other lowering?  If this is the case, then I guarantee you that you're
>>>> > unique in this since i965 and vc4 need to at least run out-of-SSA
>>>> > afterwards.  To be honest, I completely forgot that a driver could use
>>>> > fully
>>>> > ssa NIR.
>>>>
>>>> It is a mix.. I do texcoord saturate, clip-plane, and 2-sided color
>>>> lowering in NIR.  But flat-shading, binning-pass, and half vs full
>>>> precision color output in ir3.
>>>>
>>>> I do as much lowering in NIR as I can, in an effort to do as much as
>>>> possible at compile time, vs draw time.  I do the first round of
>>>> lowering/opt w/ null shader key, which is enough for the common cases.
>>>>
>>>> Pretty much independent, I suppose, of whether I came out of SSA or
>>>> not first.  Although binning-pass variant and the instruction
>>>> scheduling I do are easier in SSA.
>>>>
>>>> Somewhat unrelated, but I may end up converting array access to
>>>> registers, but leave everything else in SSA, so I can benefit from
>>>> converting multi-dimensional offsets into a single offset..  this is
>>>> still one open issue w/ gallium glsl_to_nir.. right now I have a
>>>> hacked up version of nir_lower_io that converts global/local
>>>> load/store_var's into load/store_var2 which take an offset as a src
>>>> (like load_input/store_output) instead of deref chain.. not sure yet
>>>> whether this will be the permanent solution, but at least it fixes a
>>>> huge heap of variable-indexing piglits and lets me continue w/
>>>> implementing lowering passes for everything else that was previously
>>>> done in glsl->tgsi or tgsi->tgsi passes.
>>>
>>>
>>> If you do this, you'll be back to always needing a mutable copy.  Most
>>> lowering and optimization passes die the moment they see a register.  You'll
>>> either have to go fix a bunch of stuff up to no-op properly or run
>>> vars_to_regs after doing your NIR lowering but before going into your
>>> backend IR.  This means that your "gold copy" still has variables and you
>>> always need to lower them to registers before you go into the backend.
>>
>> ugg.. but good point, thanks for pointing that out before I wasted
>> another afternoon on yet another dead-end for handling deref's..
>>
>> Ok, I guess I need to think of a better name than load/store_var2 for
>> the new intrinsics ;-)
>
> I don't think that "you should throw away registers and use your own
> thing" is what Jason wanted you to get out of that.

perhaps..  I was considering switching to registers for arrays.
Although it would end up forcing an extra clone in the common case
where there would otherwise not be one... a bit of a tough pill to
swallow..

> Most of the
> existing optimization passes barf on registers for a reason: registers
> imply that you've gone from "consumer-agnostic NIR," i.e. what's
> produced by gtn and operated on by generic optimizations, to your own
> driver-specific thing, and any optimizations you're going to run are
> only to clean up the result of the lowering passes, so you won't need
> to run most of them. In the few cases where we do need an optimization
> after lowering to registers, we've gone and fixed it up to no-op
> things properly, but in general it's a lot easier and less confusing
> to say "new optimization passes don't have to deal with registers"
> than to make everyone go and add support for registers to their
> passes. I'm not saying that adding a "here's my driver-specific
> offset" thing to load/store_var would necessarily be a bad idea, but
> don't just dismiss registers out-of-hand.

Yeah, I'm not a big fan of making lowering/etc passes deal w/
registers unnecessarily.  Seems like coming up w/ some way to lower
load/store_var deref chains would be easier.

BR,
-R

>>
>> PS. I would really like to keep the nir_variable ptr in the intrinsic
>> instruction, rather than matching things up w/
>> var->data.driver_location, which is rather clunky..
>>
>> BR,
>> -R
>>
>>>>
>>>>
>>>> BR,
>>>> -R
>>>>
>>>> >>
>>>> >> >>>
>>>> >> >>> Maybe it is too much of a hypothetical.. I still think refcnt'ing
>>>> >> >>> gives some nice flexibility to deal with various scenarios, and
>>>> >> >>> having
>>>> >> >>> to call nir_shader_unref() isn't so much of a burden.
>>>> >> >>
>>>> >> >> Still, I can't see how this flexibility is at all useful, and it
>>>> >> >> seems
>>>> >> >> like overkill since the driver will always want a mutable version of
>>>> >> >> the IR anyways.
>>>> >> >
>>>> >> > Well, due to the structure I mentioned above, at draw time when I
>>>> >> > need
>>>> >> > to generate a variant with nothing lowered in NIR, I simply incr the
>>>> >> > refcnt on the IR which has already gone through first round of NIR
>>>> >> > passes, and pass that in to my back end compiler.  At the end, once
>>>> >> > the shader binary is generated, I can unconditionally unref the
>>>> >> > nir_shader without having to care.
>>>> >> >
>>>> >> > Without refcnt'ing I'd either have to generate a pointless clone or
>>>> >> > keep track that the nir_shader should not actually be free'd.  Not
>>>> >> > impossible, just a bit more ugly.
>>>> >>
>>>> >> Assuming you do all your variant management in your driver's IR, then
>>>> >> you don't need to do anything. If you do some variant management in
>>>> >> NIR, then in the function where you do the NIR-based lowering you can
>>>> >> check if you need to do any lowering based on the shader key, clone
>>>> >> first, and give the NIR->ir3 function the cloned IR and then free it.
>>>> >> It might be a "bit more ugly," but it's really not that much different
>>>> >> from the refcounting, and when the extra shader gets created/freed is
>>>> >> made explicit.
>>>> >>
>>>> >> >
>>>> >> > (The gallium glsl_to_nir stuff is also currently using refcnt'ing,
>>>> >> > although at least for freedreno/ir3 it isn't strictly needed.. I
>>>> >> > could
>>>> >> > just unconditionally clone in the state tracker.  That said, I'm
>>>> >> > still
>>>> >> > of the opinion that refcnt'ing could be useful to some other driver
>>>> >> > someday)
>>>> >>
>>>> >> "It could be useful to some driver someday" isn't a good argument for
>>>> >> adding stuff today. We've already had enough examples of things in NIR
>>>> >> that we added because we thought it was useful, but turned out not to
>>>> >> be.
>>>> >>
>>>> >> >
>>>> >> > BR,
>>>> >> > -R
>>>> >> >
>>>> >> >>>
>>>> >> >>> BR,
>>>> >> >>> -R
>>>> >> >>>
>>>> >> >>>>>
>>>> >> >>>>> (I guess nouveau is the one driver, that if it ever consumed NIR,
>>>> >> >>>>> would translate immediately into it's own backend IR?)
>>>> >> >>>>>
>>>> >> >>>>>> 2) gallium passes read-only IR to the back-end and it always
>>>> >> >>>>>> makes
>>>> >> >>>>>> a copy.
>>>> >> >>>>>
>>>> >> >>>>> This is how it is w/ TGSI, but I think with NIR we are free to
>>>> >> >>>>> make
>>>> >> >>>>> a
>>>> >> >>>>> clean break.  And we *definitely* want to avoid
>>>> >> >>>>> the-driver-always-copies semantics..
>>>> >> >>>>>
>>>> >> >>>>>> It sounds like, from what Marek is saying, that gallium is
>>>> >> >>>>>> currently doing
>>>> >> >>>>>> (2) and changing it to (1) would be painful.  I think reference
>>>> >> >>>>>> counting is
>>>> >> >>>>>> more like an awkward option 1.5 than option 3.  Reference
>>>> >> >>>>>> counting
>>>> >> >>>>>> would
>>>> >> >>>>>> mean that gallium passes a reference to the driver which it is
>>>> >> >>>>>> expected to
>>>> >> >>>>>> unref but may keep a second reference if it wants to keep the
>>>> >> >>>>>> driver from
>>>> >> >>>>>> modifying it.  Then the driver may or may not make a copy based
>>>> >> >>>>>> on
>>>> >> >>>>>> the
>>>> >> >>>>>> number of references.  Why don't we just make it explicit and
>>>> >> >>>>>> add a
>>>> >> >>>>>> read-only bit and call it a day.
>>>> >> >>>>>>
>>>> >> >>>>>> One of the reasons I don't like passing a reference is that it
>>>> >> >>>>>> effectively
>>>> >> >>>>>> puts allocation and freeing in different components of the
>>>> >> >>>>>> driver.
>>>> >> >>>>>
>>>> >> >>>>> With refcnt'ing you should talk in terms of ref/unref rather than
>>>> >> >>>>> allocate/free.. imho.  Although maybe that is what you meant.
>>>> >> >>>>> (In
>>>> >> >>>>> which case, yes, that was my idea, that passing in to driver
>>>> >> >>>>> transfers
>>>> >> >>>>> ownership of the passed reference.)
>>>> >> >>>>>
>>>> >> >>>>>> This
>>>> >> >>>>>> means that if and driver doesn't care at all about the shader
>>>> >> >>>>>> that
>>>> >> >>>>>> gets
>>>> >> >>>>>> passed in, it still has to under it to avoid a memory leak.  You
>>>> >> >>>>>> can't have
>>>> >> >>>>>> the driver take the reference because then, either it comes in
>>>> >> >>>>>> with
>>>> >> >>>>>> a
>>>> >> >>>>>> recount of 0 and should have been deleted, or the "can I modify
>>>> >> >>>>>> this" check
>>>> >> >>>>>> becomes "recount <= 2" which makes no sense.
>>>> >> >>>>>
>>>> >> >>>>> hmm, no, if ownership of the reference is transferred to the
>>>> >> >>>>> driver,
>>>> >> >>>>> then it becomes "refcount == 1" (and refcount == 0 should be an
>>>> >> >>>>> assert, because something has gone horribly wrong)
>>>> >> >>>>>
>>>> >> >>>>> BR,
>>>> >> >>>>> -R
>>>> >
>>>> >
>>>
>>>