[Mesa-dev] [RFC] i965: alternative to memctx for cleaning up nir variants

Mon Dec 21 18:20:49 PST 2015

On Mon, Dec 21, 2015 at 2:24 PM, Marek Olšák <maraeo at gmail.com> wrote:

> On Mon, Dec 21, 2015 at 11:20 PM, Jason Ekstrand <jason at jlekstrand.net>
> wrote:
> >
> > On Dec 21, 2015 1:37 PM, "Marek Olšák" <maraeo at gmail.com> wrote:
> >>
> >> On Mon, Dec 21, 2015 at 7:48 PM, Jason Ekstrand <jason at jlekstrand.net>
> >> wrote:
> >> >
> >> > On Dec 21, 2015 9:09 AM, "Connor Abbott" <cwabbott0 at gmail.com> wrote:
> >> >>
> >> >> On Mon, Dec 21, 2015 at 11:45 AM, Marek Olšák <maraeo at gmail.com>
> wrote:
> >> >> > On Mon, Dec 21, 2015 at 4:38 PM, Connor Abbott <
> cwabbott0 at gmail.com>
> >> >> > wrote:
> >> >> >> On Mon, Dec 21, 2015 at 6:39 AM, Marek Olšák <maraeo at gmail.com>
> >> >> >> wrote:
> >> >> >>> On Mon, Dec 21, 2015 at 6:48 AM, Jason Ekstrand
> >> >> >>> <jason at jlekstrand.net>
> >> >> >>> wrote:
> >> >> >>>>
> >> >> >>>> On Dec 20, 2015 7:43 PM, "Rob Clark" <robdclark at gmail.com>
> wrote:
> >> >> >>>>>
> >> >> >>>>> On Sun, Dec 20, 2015 at 10:29 PM, Connor Abbott
> >> >> >>>>> <cwabbott0 at gmail.com>
> >> >> >>>>> wrote:
> >> >> >>>>> > On Sun, Dec 20, 2015 at 10:04 PM, Rob Clark
> >> >> >>>>> > <robdclark at gmail.com>
> >> >> >>>>> > wrote:
> >> >> >>>>> >> On Sun, Dec 20, 2015 at 9:12 PM, Jason Ekstrand
> >> >> >>>>> >> <jason at jlekstrand.net>
> >> >> >>>>> >> wrote:
> >> >> >>>>> >>>
> >> >> >>>>> >>> On Dec 19, 2015 5:55 PM, "Rob Clark" <robdclark at gmail.com>
> >> >> >>>>> >>> wrote:
> >> >> >>>>> >>>>
> >> >> >>>>> >>>> From: Rob Clark <robclark at freedesktop.org>
> >> >> >>>>> >>>>
> >> >> >>>>> >>>> Jason,
> >> >> >>>>> >>>>
> >> >> >>>>> >>>> How much do you hate this idea?  Seems like an easy
> >> >> >>>>> >>>> alternative
> >> >> >>>>> >>>> to
> >> >> >>>>> >>>> using ralloc ctx's to clean up nir variants/clones, which
> >> >> >>>>> >>>> would
> >> >> >>>>> >>>> let
> >> >> >>>>> >>>> us drop the parent memctx for nir_shader_create()/clone(),
> >> >> >>>>> >>>> making
> >> >> >>>>> >>>> it easier to introduce reference counting.
> >> >> >>>>> >>>
> >> >> >>>>> >>> I think "hate" is a but strong. I don't like it but it
> works.
> >> >> >>>>> >>> If we
> >> >> >>>>> >>> really
> >> >> >>>>> >>> want nir_shader refcounted, we'll have to do something.
> >> >> >>>>> >>
> >> >> >>>>> >> I suppose the alternate idea of moving the
> nir_shader_clone()
> >> >> >>>>> >> out
> >> >> >>>>> >> of
> >> >> >>>>> >> brw_compile_xyz(), and always passing in the clone would be
> a
> >> >> >>>>> >> cleaner
> >> >> >>>>> >> way.  It looks like each of the brw_compile_xyz() has
> exactly
> >> >> >>>>> >> one
> >> >> >>>>> >> call-site, so doing the nir_shader_clone() inside doesn't
> >> >> >>>>> >> really
> >> >> >>>>> >> buy
> >> >> >>>>> >> anything.
> >> >> >>>>
> >> >> >>>> Your forgetting that there may be *cough* other users of this
> >> >> >>>> API...
> >> >> >>>> We can
> >> >> >>>> change those too but I would like the needs of the compiler
> users
> >> >> >>>> to
> >> >> >>>> drive
> >> >> >>>> the API, not the cloning.  I still have some details to work out
> >> >> >>>> there. In
> >> >> >>>> any case, it doesn't really matter; we can figure something out.
> >> >> >>>>
> >> >> >>>>> >>> About refcounting... The more I think about it the more I'm
> >> >> >>>>> >>> not
> >> >> >>>>> >>> convinced
> >> >> >>>>> >>> it's useful.  As it stands, we have no use for it an I'm
> not
> >> >> >>>>> >>> convinced
> >> >> >>>>> >>> you
> >> >> >>>>> >>> do either.  We'll see if I can convince you. :-)
> >> >> >>>>> >>>
> >> >> >>>>> >>> In the history of i965 using NIR, we've had about three
> >> >> >>>>> >>> different ways
> >> >> >>>>> >>> of
> >> >> >>>>> >>> doing things:
> >> >> >>>>> >>>
> >> >> >>>>> >>> 1) GLSL is the gold copy and we run glsl_to_nir for every
> >> >> >>>>> >>> shader/variant
> >> >> >>>>> >>> compile.  This is what we did when we first stated using
> NIR
> >> >> >>>>> >>> because
> >> >> >>>>> >>> it was
> >> >> >>>>> >>> easy and didn't involve reworking any plumbing.
> >> >> >>>>> >>>
> >> >> >>>>> >>> 2) Lowered NIR is the gold copy; variants are done entirely
> >> >> >>>>> >>> in
> >> >> >>>>> >>> the
> >> >> >>>>> >>> back-end
> >> >> >>>>> >>> IR.  This is what we did up until about a month ago.
> Because
> >> >> >>>>> >>> variants
> >> >> >>>>> >>> are
> >> >> >>>>> >>> done in the back-end, we can run gksl_to_nir and do all of
> >> >> >>>>> >>> our
> >> >> >>>>> >>> optimizing
> >> >> >>>>> >>> and lowering at link time.  Going from NIR to the final
> >> >> >>>>> >>> shader
> >> >> >>>>> >>> binary
> >> >> >>>>> >>> is
> >> >> >>>>> >>> then a read-only operation as far as NIR is concerned.
> >> >> >>>>> >>>
> >> >> >>>>> >>> 3) Optimized but not lowered NIR is the gold copy; variants
> >> >> >>>>> >>> are
> >> >> >>>>> >>> sometimes
> >> >> >>>>> >>> done in NIR.  This is the scheme we use now.  We call
> >> >> >>>>> >>> glsl_to_nir and
> >> >> >>>>> >>> do
> >> >> >>>>> >>> some of the optimization and lowering at link time but
> leave
> >> >> >>>>> >>> it
> >> >> >>>>> >>> in SSA
> >> >> >>>>> >>> form.
> >> >> >>>>> >>> When we go to compile the final shader, we make a copy,
> apply
> >> >> >>>>> >>> variants, do
> >> >> >>>>> >>> the final lowering and then go into the back-end IR.
> >> >> >>>>> >>>
> >> >> >>>>> >>> In each of these cases, we know exactly where we need to
> make
> >> >> >>>>> >>> a
> >> >> >>>>> >>> copy
> >> >> >>>>> >>> without
> >> >> >>>>> >>> the help of reference counting.  In the first, we get a
> fresh
> >> >> >>>>> >>> copy
> >> >> >>>>> >>> each time
> >> >> >>>>> >>> so we are free to destroy the copy.  In the second, we
> never
> >> >> >>>>> >>> have to
> >> >> >>>>> >>> modify
> >> >> >>>>> >>> the NIR so no copy.  In the third scheme, we always have to
> >> >> >>>>> >>> make
> >> >> >>>>> >>> a
> >> >> >>>>> >>> copy
> >> >> >>>>> >>> because, even if variants are a no-op, we still have to go
> >> >> >>>>> >>> out
> >> >> >>>>> >>> of SSA
> >> >> >>>>> >>> form
> >> >> >>>>> >>> and do final lowering.  You could say that we could avoid
> >> >> >>>>> >>> making
> >> >> >>>>> >>> that
> >> >> >>>>> >>> copy.
> >> >> >>>>> >>> However, the work to determine when we don't need variants
> >> >> >>>>> >>> and
> >> >> >>>>> >>> can do
> >> >> >>>>> >>> all
> >> >> >>>>> >>> our lowering up-front is far more than the work saved by
> >> >> >>>>> >>> reference
> >> >> >>>>> >>> counting.
> >> >> >>>>> >>>
> >> >> >>>>> >>> How about gallium?  Here's how I imagine it would work
> >> >> >>>>> >>> (please
> >> >> >>>>> >>> correct
> >> >> >>>>> >>> me of
> >> >> >>>>> >>> I'm wrong):
> >> >> >>>>> >>>
> >> >> >>>>> >>> 1) In the TGSI case, tgsi_to_nir gets called for each
> compile
> >> >> >>>>> >>> so
> >> >> >>>>> >>> you
> >> >> >>>>> >>> get a
> >> >> >>>>> >>> fresh mutable shader each time.  In this case, you are free
> >> >> >>>>> >>> to
> >> >> >>>>> >>> do
> >> >> >>>>> >>> whatever
> >> >> >>>>> >>> you want with the shader without making a copy.
> >> >> >>>>> >>>
> >> >> >>>>> >>> 2) In the GLSL case, you run glsl_to_nir and do some basic
> >> >> >>>>> >>> optimizations at
> >> >> >>>>> >>> link time and hold onto the NIR shader. (Hold a reference
> of
> >> >> >>>>> >>> you'd
> >> >> >>>>> >>> like.)
> >> >> >>>>> >>> When you go to compile it in the back-end, it needs to do
> >> >> >>>>> >>> it's
> >> >> >>>>> >>> own
> >> >> >>>>> >>> lowering
> >> >> >>>>> >>> so it takes a reference and ends up making a copy.
> >> >> >>>>> >>>
> >> >> >>>>> >>> If this description is anywhere close to correct, then I
> >> >> >>>>> >>> don't
> >> >> >>>>> >>> think
> >> >> >>>>> >>> you
> >> >> >>>>> >>> really need it either.  Determining whether or not you need
> >> >> >>>>> >>> to
> >> >> >>>>> >>> copy is
> >> >> >>>>> >>> simply "if (comes_from_tgsi)”.  Maybe there's something
> >> >> >>>>> >>> subtle
> >> >> >>>>> >>> about
> >> >> >>>>> >>> the
> >> >> >>>>> >>> gallium layer that I don't know that makes refcounting the
> >> >> >>>>> >>> best
> >> >> >>>>> >>> solution.
> >> >> >>>>> >>> Please enlighten me of there is.
> >> >> >>>>> >>
> >> >> >>>>> >> This issue is that we *potentially* have both the state
> >> >> >>>>> >> tracker
> >> >> >>>>> >> and
> >> >> >>>>> >> the driver both doing some of there own variant management.
> >> >> >>>>> >> (Which
> >> >> >>>>> >> tbh, isn't awesome, it would have been nice if someone
> >> >> >>>>> >> realized
> >> >> >>>>> >> earlier on that nearly every driver is going to have to do
> >> >> >>>>> >> some
> >> >> >>>>> >> sort
> >> >> >>>>> >> of variant mgmt and figured out a way just to push it all
> down
> >> >> >>>>> >> to
> >> >> >>>>> >> the
> >> >> >>>>> >> driver.. but I can't see a good way to get there from here.)
> >> >> >>>>> >>
> >> >> >>>>> >> With TGSI as the IR, driver just unconditionally does
> >> >> >>>>> >> tgsi_dup_tokens().. because of the whole thing where st does
> >> >> >>>>> >> variants
> >> >> >>>>> >> in some cases, things are defined that driver doesn't own
> the
> >> >> >>>>> >> copy of
> >> >> >>>>> >> the TGSI IR passed in after the fxn call to driver returns.
> >> >> >>>>> >>
> >> >> >>>>> >> With NIR I was hoping to fix this, esp. since
> >> >> >>>>> >> nir_shader_clone()
> >> >> >>>>> >> is
> >> >> >>>>> >> more heavyweight than tgsi_dup_tokens() (memcpy()).
> >> >> >>>>> >>
> >> >> >>>>> >> Refcnt'ing is a nice solution so that we can pass the
> driver a
> >> >> >>>>> >> reference that it owns.  In cases where state tracker isn't
> >> >> >>>>> >> doing
> >> >> >>>>> >> variant mgmt, we pass it the one-and-only ref (avoiding
> >> >> >>>>> >> clone).
> >> >> >>>>> >>
> >> >> >>>>> >> I'd suggested that in cases where st does variant mgmt, that
> >> >> >>>>> >> st
> >> >> >>>>> >> should
> >> >> >>>>> >> do the clone/dup.  But that was shot down:
> >> >> >>>>> >>
> >> >> >>>>> >>
> >> >> >>>>> >>
> >> >> >>>>> >>
> http://lists.freedesktop.org/archives/mesa-dev/2015-October/097748.html
> >> >> >>>>
> >> >> >>>> It sounds like Marek's argument there is more about lifetime
> >> >> >>>> management than
> >> >> >>>> anything.  He wants gallium modules to be able to create IR,
> call
> >> >> >>>> into the
> >> >> >>>> driver, and then throw it away.  In particular, he doesn't want
> >> >> >>>> them
> >> >> >>>> to have
> >> >> >>>> to think about cloning. In a lot of ways it sounds a lot like
> what
> >> >> >>>> i965
> >> >> >>>> wants too.  I really like having brw_compile_foo take a const
> >> >> >>>> nir_shader.
> >> >> >>>> The difference is that i965 basically always wants to clone
> >> >> >>>> whereas a
> >> >> >>>> gallium driver may not have to if gallium doesn't care what
> >> >> >>>> happens
> >> >> >>>> to the
> >> >> >>>> shader when it's done.  How common is this case? How important
> is
> >> >> >>>> it
> >> >> >>>> to
> >> >> >>>> optimize for?  I don't know.
> >> >> >>>>
> >> >> >>>> One other thing that bothers me a bit:  From Marek's comment, it
> >> >> >>>> sounds like
> >> >> >>>> the components want to just pass in IR and be agnostic about
> >> >> >>>> whether
> >> >> >>>> the
> >> >> >>>> driver wants its own copy or wants to change it or whatever.
> This
> >> >> >>>> seems
> >> >> >>>> like an argument for always cloning to me.  From the perspective
> >> >> >>>> of a
> >> >> >>>> gallium module, "I want to hang in to this, I'll keep a
> reference"
> >> >> >>>> seems
> >> >> >>>> exactly the same as "I want to hang onto this, I'll give the
> >> >> >>>> driver a
> >> >> >>>> copy".
> >> >> >>>> How are they actually different given that the driver basically
> >> >> >>>> has
> >> >> >>>> to
> >> >> >>>> modify what you give it in order to do lowering?
> >> >> >>>>
> >> >> >>>>> > Ugh... I didn't read this at the time, but I don't like
> Marek's
> >> >> >>>>> > response. My understanding of the situation, based on this
> >> >> >>>>> > thread,
> >> >> >>>>> > is
> >> >> >>>>> > that there are some cases where the st knows that there's
> only
> >> >> >>>>> > going
> >> >> >>>>> > to be one variant and can throw away the (NIR or TGSI) shader
> >> >> >>>>> > after it
> >> >> >>>>> > hands it to the driver, while at other times it has to hold
> >> >> >>>>> > onto
> >> >> >>>>> > all
> >> >> >>>>> > the variants and only give the driver a read-only copy (or
> >> >> >>>>> > duplicate
> >> >> >>>>
> >> >> >>>> As per above, my interpretation of Marek's comment is that he
> >> >> >>>> doesn't
> >> >> >>>> want
> >> >> >>>> the st to have to think about cloning ever.  He wants it to
> assume
> >> >> >>>> that
> >> >> >>>> compilation never modifies the IR so the driver should always
> >> >> >>>> clone.
> >> >> >>>> You
> >> >> >>>> have to keep in mind that Marek is most likely thinking about
> >> >> >>>> caching
> >> >> >>>> the
> >> >> >>>> TGSI rather than doing in-place lowering in it.
> >> >> >>>>
> >> >> >>>> If I'm understanding Marek correctly, then it sounds like shader
> >> >> >>>> compilation
> >> >> >>>> should never touch the IR that's passed in.  If this is the
> case,
> >> >> >>>> it
> >> >> >>>> sounds
> >> >> >>>> like always cloning is the way to go.  At least its not *that*
> >> >> >>>> expensive.
> >> >> >>>
> >> >> >>> Note that st/mesa needs to keep the FS IR because of glDrawPixels
> >> >> >>> and
> >> >> >>> glBitmap, and the VS IR because of edge flags, glRasterPos
> >> >> >>> evaluation,
> >> >> >>> selection and feedback modes. The last three are done with
> >> >> >>> Draw/LLVM
> >> >> >>> and only support TGSI.
> >> >> >>>
> >> >> >>> Therefore, st/mesa always hangs onto the IR and drivers can't
> >> >> >>> modify
> >> >> >>> it. It also needs VS in TGSI to be able to do everything
> correctly.
> >> >> >>>
> >> >> >>> What other Gallium modules want or not want is not that
> important,
> >> >> >>> but
> >> >> >>> changing the current semantics will require fixing a lot of
> places.
> >> >> >>> (state trackers - mesa, nine, xa; modules - blitter, draw, hud,
> >> >> >>> postprocess, tests, vl)
> >> >> >>>
> >> >> >>> You really better think about whether changing all those and the
> >> >> >>> risk
> >> >> >>> of breaking them is worth it.
> >> >> >>>
> >> >> >>> Marek
> >> >> >>
> >> >> >> Well, we're talking about passing NIR here, not TGSI, so none of
> >> >> >> those
> >> >> >> places will need to be updated. NIR is a much more heavyweight IR,
> >> >> >> and
> >> >> >> copying is much more expensive, so it makes more sense there for
> the
> >> >> >> st to duplicate it and let the driver own the IR it passes in, to
> >> >> >> reduce copying when there's no variant management necessary.
> >> >> >
> >> >> > My main concern was about TGSI, not so much about NIR.
> >> >>
> >> >> Yes, exactly -- we don't plan on changing the semantics of passing in
> >> >> TGSI, so this only matters when the user passes in a NIR shader.
> >> >
> >> > I think two different concepts of ownership are getting conflated
> here:
> >> > Right/responsibility to delete and right to modify.
> >> >
> >> > The way I understand it, gallium, as it stands, gives neither to the
> >> > driver.
> >> > A back-end using NIR requires the right to modify but who deletes it
> >> > doesn't
> >> > ultimately matter.  I think it's dangerous to pass one of these rights
> >> > to
> >> > the driver and not the other but we need to think about both.
> >> >
> >> > What I'm trying to say is that we have two options here:
> >> >
> >> > 1) gallium passes IR to the back-end that it is free to modify and is
> >> > required to delete when it's done.
> >> >
> >> > 2) gallium passes read-only IR to the back-end and it always makes a
> >> > copy.
> >>
> >> Not always. The copy is optional. Drivers are encouraged not to make a
> >> copy if they don't need it. Or they can keep a copy in a different IR,
> >> which is the same scenario for the original IR.
> >
> > Let's suppose that the driver always has to modify the IR before it can
> be
> > used. Does it have to make a copy then?  I know that may not be the case
> for
> > TGSI but it is currently the case for NIR.
>
> If the driver wants to modify TGSI, it does have to make a copy
> (tgsi_transform_shader automatically makes a copy). Regarding NIR,
> define and use whatever makes sense for it.
>

Ok, that's what I thought.  I'm just trying to make sure I get it all
clear.  I'm not 100% sure what the right thing to do for NIR is, but
knowing what's required for TGSI helps.  Thanks!
--Jason
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20151221/e5ca129e/attachment-0001.html>