[Mesa-dev] [RFC] i965: alternative to memctx for cleaning up nir variants

Jason Ekstrand jason at jlekstrand.net
Tue Dec 29 09:36:34 PST 2015


On Tue, Dec 29, 2015 at 7:32 AM, Rob Clark <robdclark at gmail.com> wrote:

> On Mon, Dec 28, 2015 at 4:23 PM, Connor Abbott <cwabbott0 at gmail.com>
> wrote:
> > On Mon, Dec 28, 2015 at 3:25 PM, Rob Clark <robdclark at gmail.com> wrote:
> >> On Mon, Dec 28, 2015 at 2:05 PM, Jason Ekstrand <jason at jlekstrand.net>
> wrote:
> >>>
> >>>
> >>> On Mon, Dec 28, 2015 at 10:33 AM, Rob Clark <robdclark at gmail.com>
> wrote:
> >>>>
> >>>> On Mon, Dec 28, 2015 at 1:20 PM, Jason Ekstrand <jason at jlekstrand.net
> >
> >>>> wrote:
> >>>> >
> >>>> >
> >>>> > On Mon, Dec 28, 2015 at 9:37 AM, Connor Abbott <cwabbott0 at gmail.com
> >
> >>>> > wrote:
> >>>> >>
> >>>> >> On Mon, Dec 28, 2015 at 10:13 AM, Rob Clark <robdclark at gmail.com>
> >>>> >> wrote:
> >>>> >> > On Tue, Dec 22, 2015 at 10:11 PM, Connor Abbott <
> cwabbott0 at gmail.com>
> >>>> >> > wrote:
> >>>> >> >> On Tue, Dec 22, 2015 at 9:55 PM, Rob Clark <robdclark at gmail.com
> >
> >>>> >> >> wrote:
> >>>> >> >>> On Tue, Dec 22, 2015 at 9:47 PM, Connor Abbott
> >>>> >> >>> <cwabbott0 at gmail.com>
> >>>> >> >>> wrote:
> >>>> >> >>>> On Tue, Dec 22, 2015 at 9:02 PM, Rob Clark <
> robdclark at gmail.com>
> >>>> >> >>>> wrote:
> >>>> >> >>>>> On Mon, Dec 21, 2015 at 1:48 PM, Jason Ekstrand
> >>>> >> >>>>> <jason at jlekstrand.net> wrote:
> >>>> >> >>>>>>
> >>>> >> >>>>>> I think two different concepts of ownership are getting
> >>>> >> >>>>>> conflated
> >>>> >> >>>>>> here:
> >>>> >> >>>>>> Right/responsibility to delete and right to modify.
> >>>> >> >>>>>>
> >>>> >> >>>>>> The way I understand it, gallium, as it stands, gives
> neither to
> >>>> >> >>>>>> the driver.
> >>>> >> >>>>>> A back-end using NIR requires the right to modify but who
> >>>> >> >>>>>> deletes
> >>>> >> >>>>>> it doesn't
> >>>> >> >>>>>> ultimately matter.  I think it's dangerous to pass one of
> these
> >>>> >> >>>>>> rights to
> >>>> >> >>>>>> the driver and not the other but we need to think about
> both.
> >>>> >> >>>>>
> >>>> >> >>>>> yeah, uneasy about driver modifying the IR if the state
> tracker
> >>>> >> >>>>> is
> >>>> >> >>>>> still going to potentially spin off variants of the IR.. that
> >>>> >> >>>>> sounds
> >>>> >> >>>>> like madness.
> >>>> >> >>>>>
> >>>> >> >>>>> The refcnt'ing I proposed does deal w/ right to modify vs
> delete
> >>>> >> >>>>> via
> >>>> >> >>>>> nir_shader(_is)_mutable() which returns something that is
> >>>> >> >>>>> guaranteed
> >>>> >> >>>>> to be safe to modify (ie. has only a single reference)
> >>>> >> >>>>>
> >>>> >> >>>>>> What I'm trying to say is that we have two options here:
> >>>> >> >>>>>>
> >>>> >> >>>>>> 1) gallium passes IR to the back-end that it is free to
> modify
> >>>> >> >>>>>> and
> >>>> >> >>>>>> is
> >>>> >> >>>>>> required to delete when it's done.
> >>>> >> >>>>>
> >>>> >> >>>>> with refcnt'ing, s/delete/unref/
> >>>> >> >>>>>
> >>>> >> >>>>> The idea is, the st transfers ownership of the reference it
> >>>> >> >>>>> passes
> >>>> >> >>>>> to
> >>>> >> >>>>> the driver.  If the st wants to hang on to a reference
> itself, it
> >>>> >> >>>>> must
> >>>> >> >>>>> increment the refcnt before passing to the driver (backend).
> >>>> >> >>>>>
> >>>> >> >>>>> Without refcnt'ing, I suppose we could (since we don't have
> to
> >>>> >> >>>>> follow
> >>>> >> >>>>> TGSI semantics), just decree that the driver always takes
> >>>> >> >>>>> ownership
> >>>> >> >>>>> of
> >>>> >> >>>>> the copy passed in, and if the st wants to hang on to a copy
> too,
> >>>> >> >>>>> then
> >>>> >> >>>>> it must clone.  I suppose this would work well enough for
> >>>> >> >>>>> freedreno/vc4, which both end up generating variants later.
> It
> >>>> >> >>>>> does
> >>>> >> >>>>> force an extra clone for drivers that immediately translate
> into
> >>>> >> >>>>> their
> >>>> >> >>>>> own backend IR and don't need to keep the NIR around, for
> >>>> >> >>>>> example.
> >>>> >> >>>>> Maybe that is not worth caring about (since at this point it
> is
> >>>> >> >>>>> hypothetical).
> >>>> >> >>>>
> >>>> >> >>>> While always cloning does have this disadvantage, I don't
> think
> >>>> >> >>>> it's
> >>>> >> >>>> really relevant here. Even if the driver throws away the NIR
> >>>> >> >>>> immediately after consuming it, almost invariably it's going
> to
> >>>> >> >>>> want
> >>>> >> >>>> to modify  it. The generic NIR passed in by the state tracker
> >>>> >> >>>> (other
> >>>> >> >>>> IR -> NIR + some optimizations) is almost never going to be
> the
> >>>> >> >>>> same
> >>>> >> >>>> as the NIR after going through driver-specific lowering
> passes,
> >>>> >> >>>> which
> >>>> >> >>>> means that drivers are never going to want a read-only
> version of
> >>>> >> >>>> the
> >>>> >> >>>> IR. In light of that, I think making the driver own the IR
> passed
> >>>> >> >>>> in
> >>>> >> >>>> seems like the most sensible thing.
> >>>> >> >>>
> >>>> >> >>> well, unless the driver is already doing it's own lowering in
> it's
> >>>> >> >>> own
> >>>> >> >>> native IR..
> >>>> >> >>
> >>>> >> >> Well, if you're not doing any lowering in NIR, then you aren't
> >>>> >> >> really
> >>>> >> >> taking any advantage of it. I can't see a plausible scenario
> where
> >>>> >> >> all
> >>>> >> >> the lowering is done in the driver's own IR -- and as soon as
> you do
> >>>> >> >> anything in NIR, you need the driver-owns-IR semantics.
> >>>> >> >
> >>>> >> > When it comes to shader variants, I have a mix, with some things
> >>>> >> > lowered in nir and others just handled in backend..
> >>>> >> >
> >>>> >> > The re-work / cleanup that I have had on a branch for a while now
> >>>> >> > (since it is currently blocked on refcnt'ing) does a first round
> of
> >>>> >> > variant-key independent NIR lowering/opt passes.  And then at
> draw
> >>>> >> > time, if the variant key has anything that is lowered in nir, I
> do a
> >>>> >> > second round.
> >>>> >
> >>>> >
> >>>> > Just to be clear, your key-dependent lowering happens after all of
> your
> >>>> > other lowering?  If this is the case, then I guarantee you that
> you're
> >>>> > unique in this since i965 and vc4 need to at least run out-of-SSA
> >>>> > afterwards.  To be honest, I completely forgot that a driver could
> use
> >>>> > fully
> >>>> > ssa NIR.
> >>>>
> >>>> It is a mix.. I do texcoord saturate, clip-plane, and 2-sided color
> >>>> lowering in NIR.  But flat-shading, binning-pass, and half vs full
> >>>> precision color output in ir3.
> >>>>
> >>>> I do as much lowering in NIR as I can, in an effort to do as much as
> >>>> possible at compile time, vs draw time.  I do the first round of
> >>>> lowering/opt w/ null shader key, which is enough for the common cases.
> >>>>
> >>>> Pretty much independent, I suppose, of whether I came out of SSA or
> >>>> not first.  Although binning-pass variant and the instruction
> >>>> scheduling I do are easier in SSA.
> >>>>
> >>>> Somewhat unrelated, but I may end up converting array access to
> >>>> registers, but leave everything else in SSA, so I can benefit from
> >>>> converting multi-dimensional offsets into a single offset..  this is
> >>>> still one open issue w/ gallium glsl_to_nir.. right now I have a
> >>>> hacked up version of nir_lower_io that converts global/local
> >>>> load/store_var's into load/store_var2 which take an offset as a src
> >>>> (like load_input/store_output) instead of deref chain.. not sure yet
> >>>> whether this will be the permanent solution, but at least it fixes a
> >>>> huge heap of variable-indexing piglits and lets me continue w/
> >>>> implementing lowering passes for everything else that was previously
> >>>> done in glsl->tgsi or tgsi->tgsi passes.
> >>>
> >>>
> >>> If you do this, you'll be back to always needing a mutable copy.  Most
> >>> lowering and optimization passes die the moment they see a register.
> You'll
> >>> either have to go fix a bunch of stuff up to no-op properly or run
> >>> vars_to_regs after doing your NIR lowering but before going into your
> >>> backend IR.  This means that your "gold copy" still has variables and
> you
> >>> always need to lower them to registers before you go into the backend.
> >>
> >> ugg.. but good point, thanks for pointing that out before I wasted
> >> another afternoon on yet another dead-end for handling deref's..
> >>
> >> Ok, I guess I need to think of a better name than load/store_var2 for
> >> the new intrinsics ;-)
> >
> > I don't think that "you should throw away registers and use your own
> > thing" is what Jason wanted you to get out of that.
>

Correct.  Registers are designed explicitly to do exactly what you want:
Provide an easy-to-work-with linear view of complex variables.  I still
don't understand why you're trying so hard not to use them.  The code is
already written for you, you just have to turn it on.  The only thing you
might have to do is make it take a type_size function like nir_lower_io
does so that you can configure offset units to your backend's liking.

What i was trying to get across is that the situation in which you can
avoid cloning by clever use of reference counting is specific to your
driver and exact way that you have your lowering passes set up.  If you
perturb it even a little, you have to do a copy all the time and reference
counting isn't helping anymore.  You're free to design your entire compiler
stack around avoiding that one copy if you wish, but I wouldn't recommend
it.

perhaps..  I was considering switching to registers for arrays.
> Although it would end up forcing an extra clone in the common case
> where there would otherwise not be one... a bit of a tough pill to
> swallow..
>

I don't see why that is such a tough pill.  Copying is cheap.  When we were
writing the cloning code, we both ran shader-db runs where we were cloning
after *every* optimization or lowering pass and it still only hurt runtime
by something like 10 or 20%.  A single clone won't even get noticed.

Also, you've already said that you pre-compile for the "common case" of a
zero shader key so that extra clone gets eaten at compile time where you're
already doing piles of optimization and lowering.  The case you really care
about is when that key is non-zero and you have to stop everything and
recompile in the middle of a draw.  In that case, you have a non-zero key
so you have to do a clone anyway.

At the end of the day, I think we're getting nowhere here.  We have two
different memory management models that are in conflict.  The ralloc model
saves us typing and provides some nice safety and refcounting saves you
some typing and privdes you some nice safety.  It's becoming fairly obvious
that neither side is going to convince the other that their model is better
any time soon.  I'm open to suggestions on how to proceed.  One option
would be to have Anholt come in and break the tie.  I'd be ok with that.
In any case, we need to solve this one way or another and either commit the
patch or not.
--Jason

> Most of the
> > existing optimization passes barf on registers for a reason: registers
> > imply that you've gone from "consumer-agnostic NIR," i.e. what's
> > produced by gtn and operated on by generic optimizations, to your own
> > driver-specific thing, and any optimizations you're going to run are
> > only to clean up the result of the lowering passes, so you won't need
> > to run most of them. In the few cases where we do need an optimization
> > after lowering to registers, we've gone and fixed it up to no-op
> > things properly, but in general it's a lot easier and less confusing
> > to say "new optimization passes don't have to deal with registers"
> > than to make everyone go and add support for registers to their
> > passes. I'm not saying that adding a "here's my driver-specific
> > offset" thing to load/store_var would necessarily be a bad idea, but
> > don't just dismiss registers out-of-hand.
>
> Yeah, I'm not a big fan of making lowering/etc passes deal w/
> registers unnecessarily.  Seems like coming up w/ some way to lower
> load/store_var deref chains would be easier.
>
> BR,
> -R
>
> >>
> >> PS. I would really like to keep the nir_variable ptr in the intrinsic
> >> instruction, rather than matching things up w/
> >> var->data.driver_location, which is rather clunky..
> >>
> >> BR,
> >> -R
> >>
> >>>>
> >>>>
> >>>> BR,
> >>>> -R
> >>>>
> >>>> >>
> >>>> >> >>>
> >>>> >> >>> Maybe it is too much of a hypothetical.. I still think
> refcnt'ing
> >>>> >> >>> gives some nice flexibility to deal with various scenarios, and
> >>>> >> >>> having
> >>>> >> >>> to call nir_shader_unref() isn't so much of a burden.
> >>>> >> >>
> >>>> >> >> Still, I can't see how this flexibility is at all useful, and it
> >>>> >> >> seems
> >>>> >> >> like overkill since the driver will always want a mutable
> version of
> >>>> >> >> the IR anyways.
> >>>> >> >
> >>>> >> > Well, due to the structure I mentioned above, at draw time when I
> >>>> >> > need
> >>>> >> > to generate a variant with nothing lowered in NIR, I simply incr
> the
> >>>> >> > refcnt on the IR which has already gone through first round of
> NIR
> >>>> >> > passes, and pass that in to my back end compiler.  At the end,
> once
> >>>> >> > the shader binary is generated, I can unconditionally unref the
> >>>> >> > nir_shader without having to care.
> >>>> >> >
> >>>> >> > Without refcnt'ing I'd either have to generate a pointless clone
> or
> >>>> >> > keep track that the nir_shader should not actually be free'd.
> Not
> >>>> >> > impossible, just a bit more ugly.
> >>>> >>
> >>>> >> Assuming you do all your variant management in your driver's IR,
> then
> >>>> >> you don't need to do anything. If you do some variant management in
> >>>> >> NIR, then in the function where you do the NIR-based lowering you
> can
> >>>> >> check if you need to do any lowering based on the shader key, clone
> >>>> >> first, and give the NIR->ir3 function the cloned IR and then free
> it.
> >>>> >> It might be a "bit more ugly," but it's really not that much
> different
> >>>> >> from the refcounting, and when the extra shader gets created/freed
> is
> >>>> >> made explicit.
> >>>> >>
> >>>> >> >
> >>>> >> > (The gallium glsl_to_nir stuff is also currently using
> refcnt'ing,
> >>>> >> > although at least for freedreno/ir3 it isn't strictly needed.. I
> >>>> >> > could
> >>>> >> > just unconditionally clone in the state tracker.  That said, I'm
> >>>> >> > still
> >>>> >> > of the opinion that refcnt'ing could be useful to some other
> driver
> >>>> >> > someday)
> >>>> >>
> >>>> >> "It could be useful to some driver someday" isn't a good argument
> for
> >>>> >> adding stuff today. We've already had enough examples of things in
> NIR
> >>>> >> that we added because we thought it was useful, but turned out not
> to
> >>>> >> be.
> >>>> >>
> >>>> >> >
> >>>> >> > BR,
> >>>> >> > -R
> >>>> >> >
> >>>> >> >>>
> >>>> >> >>> BR,
> >>>> >> >>> -R
> >>>> >> >>>
> >>>> >> >>>>>
> >>>> >> >>>>> (I guess nouveau is the one driver, that if it ever consumed
> NIR,
> >>>> >> >>>>> would translate immediately into it's own backend IR?)
> >>>> >> >>>>>
> >>>> >> >>>>>> 2) gallium passes read-only IR to the back-end and it always
> >>>> >> >>>>>> makes
> >>>> >> >>>>>> a copy.
> >>>> >> >>>>>
> >>>> >> >>>>> This is how it is w/ TGSI, but I think with NIR we are free
> to
> >>>> >> >>>>> make
> >>>> >> >>>>> a
> >>>> >> >>>>> clean break.  And we *definitely* want to avoid
> >>>> >> >>>>> the-driver-always-copies semantics..
> >>>> >> >>>>>
> >>>> >> >>>>>> It sounds like, from what Marek is saying, that gallium is
> >>>> >> >>>>>> currently doing
> >>>> >> >>>>>> (2) and changing it to (1) would be painful.  I think
> reference
> >>>> >> >>>>>> counting is
> >>>> >> >>>>>> more like an awkward option 1.5 than option 3.  Reference
> >>>> >> >>>>>> counting
> >>>> >> >>>>>> would
> >>>> >> >>>>>> mean that gallium passes a reference to the driver which it
> is
> >>>> >> >>>>>> expected to
> >>>> >> >>>>>> unref but may keep a second reference if it wants to keep
> the
> >>>> >> >>>>>> driver from
> >>>> >> >>>>>> modifying it.  Then the driver may or may not make a copy
> based
> >>>> >> >>>>>> on
> >>>> >> >>>>>> the
> >>>> >> >>>>>> number of references.  Why don't we just make it explicit
> and
> >>>> >> >>>>>> add a
> >>>> >> >>>>>> read-only bit and call it a day.
> >>>> >> >>>>>>
> >>>> >> >>>>>> One of the reasons I don't like passing a reference is that
> it
> >>>> >> >>>>>> effectively
> >>>> >> >>>>>> puts allocation and freeing in different components of the
> >>>> >> >>>>>> driver.
> >>>> >> >>>>>
> >>>> >> >>>>> With refcnt'ing you should talk in terms of ref/unref rather
> than
> >>>> >> >>>>> allocate/free.. imho.  Although maybe that is what you meant.
> >>>> >> >>>>> (In
> >>>> >> >>>>> which case, yes, that was my idea, that passing in to driver
> >>>> >> >>>>> transfers
> >>>> >> >>>>> ownership of the passed reference.)
> >>>> >> >>>>>
> >>>> >> >>>>>> This
> >>>> >> >>>>>> means that if and driver doesn't care at all about the
> shader
> >>>> >> >>>>>> that
> >>>> >> >>>>>> gets
> >>>> >> >>>>>> passed in, it still has to under it to avoid a memory
> leak.  You
> >>>> >> >>>>>> can't have
> >>>> >> >>>>>> the driver take the reference because then, either it comes
> in
> >>>> >> >>>>>> with
> >>>> >> >>>>>> a
> >>>> >> >>>>>> recount of 0 and should have been deleted, or the "can I
> modify
> >>>> >> >>>>>> this" check
> >>>> >> >>>>>> becomes "recount <= 2" which makes no sense.
> >>>> >> >>>>>
> >>>> >> >>>>> hmm, no, if ownership of the reference is transferred to the
> >>>> >> >>>>> driver,
> >>>> >> >>>>> then it becomes "refcount == 1" (and refcount == 0 should be
> an
> >>>> >> >>>>> assert, because something has gone horribly wrong)
> >>>> >> >>>>>
> >>>> >> >>>>> BR,
> >>>> >> >>>>> -R
> >>>> >
> >>>> >
> >>>
> >>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20151229/c5aa2777/attachment-0001.html>


More information about the mesa-dev mailing list