[Mesa-dev] [RFC] i965: alternative to memctx for cleaning up nir variants

Mon Dec 21 10:48:22 PST 2015

On Dec 21, 2015 9:09 AM, "Connor Abbott" <cwabbott0 at gmail.com> wrote:
>
> On Mon, Dec 21, 2015 at 11:45 AM, Marek Olšák <maraeo at gmail.com> wrote:
> > On Mon, Dec 21, 2015 at 4:38 PM, Connor Abbott <cwabbott0 at gmail.com>
wrote:
> >> On Mon, Dec 21, 2015 at 6:39 AM, Marek Olšák <maraeo at gmail.com> wrote:
> >>> On Mon, Dec 21, 2015 at 6:48 AM, Jason Ekstrand <jason at jlekstrand.net>
wrote:
> >>>>
> >>>> On Dec 20, 2015 7:43 PM, "Rob Clark" <robdclark at gmail.com> wrote:
> >>>>>
> >>>>> On Sun, Dec 20, 2015 at 10:29 PM, Connor Abbott <cwabbott0 at gmail.com
>
> >>>>> wrote:
> >>>>> > On Sun, Dec 20, 2015 at 10:04 PM, Rob Clark <robdclark at gmail.com>
wrote:
> >>>>> >> On Sun, Dec 20, 2015 at 9:12 PM, Jason Ekstrand <
jason at jlekstrand.net>
> >>>>> >> wrote:
> >>>>> >>>
> >>>>> >>> On Dec 19, 2015 5:55 PM, "Rob Clark" <robdclark at gmail.com>
wrote:
> >>>>> >>>>
> >>>>> >>>> From: Rob Clark <robclark at freedesktop.org>
> >>>>> >>>>
> >>>>> >>>> Jason,
> >>>>> >>>>
> >>>>> >>>> How much do you hate this idea?  Seems like an easy
alternative to
> >>>>> >>>> using ralloc ctx's to clean up nir variants/clones, which
would let
> >>>>> >>>> us drop the parent memctx for nir_shader_create()/clone(),
making
> >>>>> >>>> it easier to introduce reference counting.
> >>>>> >>>
> >>>>> >>> I think "hate" is a but strong. I don't like it but it works.
If we
> >>>>> >>> really
> >>>>> >>> want nir_shader refcounted, we'll have to do something.
> >>>>> >>
> >>>>> >> I suppose the alternate idea of moving the nir_shader_clone()
out of
> >>>>> >> brw_compile_xyz(), and always passing in the clone would be a
cleaner
> >>>>> >> way.  It looks like each of the brw_compile_xyz() has exactly one
> >>>>> >> call-site, so doing the nir_shader_clone() inside doesn't really
buy
> >>>>> >> anything.
> >>>>
> >>>> Your forgetting that there may be *cough* other users of this API...
We can
> >>>> change those too but I would like the needs of the compiler users to
drive
> >>>> the API, not the cloning.  I still have some details to work out
there. In
> >>>> any case, it doesn't really matter; we can figure something out.
> >>>>
> >>>>> >>> About refcounting... The more I think about it the more I'm not
> >>>>> >>> convinced
> >>>>> >>> it's useful.  As it stands, we have no use for it an I'm not
convinced
> >>>>> >>> you
> >>>>> >>> do either.  We'll see if I can convince you. :-)
> >>>>> >>>
> >>>>> >>> In the history of i965 using NIR, we've had about three
different ways
> >>>>> >>> of
> >>>>> >>> doing things:
> >>>>> >>>
> >>>>> >>> 1) GLSL is the gold copy and we run glsl_to_nir for every
> >>>>> >>> shader/variant
> >>>>> >>> compile.  This is what we did when we first stated using NIR
because
> >>>>> >>> it was
> >>>>> >>> easy and didn't involve reworking any plumbing.
> >>>>> >>>
> >>>>> >>> 2) Lowered NIR is the gold copy; variants are done entirely in
the
> >>>>> >>> back-end
> >>>>> >>> IR.  This is what we did up until about a month ago.  Because
variants
> >>>>> >>> are
> >>>>> >>> done in the back-end, we can run gksl_to_nir and do all of our
> >>>>> >>> optimizing
> >>>>> >>> and lowering at link time.  Going from NIR to the final shader
binary
> >>>>> >>> is
> >>>>> >>> then a read-only operation as far as NIR is concerned.
> >>>>> >>>
> >>>>> >>> 3) Optimized but not lowered NIR is the gold copy; variants are
> >>>>> >>> sometimes
> >>>>> >>> done in NIR.  This is the scheme we use now.  We call
glsl_to_nir and
> >>>>> >>> do
> >>>>> >>> some of the optimization and lowering at link time but leave it
in SSA
> >>>>> >>> form.
> >>>>> >>> When we go to compile the final shader, we make a copy, apply
> >>>>> >>> variants, do
> >>>>> >>> the final lowering and then go into the back-end IR.
> >>>>> >>>
> >>>>> >>> In each of these cases, we know exactly where we need to make a
copy
> >>>>> >>> without
> >>>>> >>> the help of reference counting.  In the first, we get a fresh
copy
> >>>>> >>> each time
> >>>>> >>> so we are free to destroy the copy.  In the second, we never
have to
> >>>>> >>> modify
> >>>>> >>> the NIR so no copy.  In the third scheme, we always have to
make a
> >>>>> >>> copy
> >>>>> >>> because, even if variants are a no-op, we still have to go out
of SSA
> >>>>> >>> form
> >>>>> >>> and do final lowering.  You could say that we could avoid
making that
> >>>>> >>> copy.
> >>>>> >>> However, the work to determine when we don't need variants and
can do
> >>>>> >>> all
> >>>>> >>> our lowering up-front is far more than the work saved by
reference
> >>>>> >>> counting.
> >>>>> >>>
> >>>>> >>> How about gallium?  Here's how I imagine it would work (please
correct
> >>>>> >>> me of
> >>>>> >>> I'm wrong):
> >>>>> >>>
> >>>>> >>> 1) In the TGSI case, tgsi_to_nir gets called for each compile
so you
> >>>>> >>> get a
> >>>>> >>> fresh mutable shader each time.  In this case, you are free to
do
> >>>>> >>> whatever
> >>>>> >>> you want with the shader without making a copy.
> >>>>> >>>
> >>>>> >>> 2) In the GLSL case, you run glsl_to_nir and do some basic
> >>>>> >>> optimizations at
> >>>>> >>> link time and hold onto the NIR shader. (Hold a reference of
you'd
> >>>>> >>> like.)
> >>>>> >>> When you go to compile it in the back-end, it needs to do it's
own
> >>>>> >>> lowering
> >>>>> >>> so it takes a reference and ends up making a copy.
> >>>>> >>>
> >>>>> >>> If this description is anywhere close to correct, then I don't
think
> >>>>> >>> you
> >>>>> >>> really need it either.  Determining whether or not you need to
copy is
> >>>>> >>> simply "if (comes_from_tgsi)”.  Maybe there's something subtle
about
> >>>>> >>> the
> >>>>> >>> gallium layer that I don't know that makes refcounting the best
> >>>>> >>> solution.
> >>>>> >>> Please enlighten me of there is.
> >>>>> >>
> >>>>> >> This issue is that we *potentially* have both the state tracker
and
> >>>>> >> the driver both doing some of there own variant management.
(Which
> >>>>> >> tbh, isn't awesome, it would have been nice if someone realized
> >>>>> >> earlier on that nearly every driver is going to have to do some
sort
> >>>>> >> of variant mgmt and figured out a way just to push it all down
to the
> >>>>> >> driver.. but I can't see a good way to get there from here.)
> >>>>> >>
> >>>>> >> With TGSI as the IR, driver just unconditionally does
> >>>>> >> tgsi_dup_tokens().. because of the whole thing where st does
variants
> >>>>> >> in some cases, things are defined that driver doesn't own the
copy of
> >>>>> >> the TGSI IR passed in after the fxn call to driver returns.
> >>>>> >>
> >>>>> >> With NIR I was hoping to fix this, esp. since nir_shader_clone()
is
> >>>>> >> more heavyweight than tgsi_dup_tokens() (memcpy()).
> >>>>> >>
> >>>>> >> Refcnt'ing is a nice solution so that we can pass the driver a
> >>>>> >> reference that it owns.  In cases where state tracker isn't doing
> >>>>> >> variant mgmt, we pass it the one-and-only ref (avoiding clone).
> >>>>> >>
> >>>>> >> I'd suggested that in cases where st does variant mgmt, that st
should
> >>>>> >> do the clone/dup.  But that was shot down:
> >>>>> >>
> >>>>> >>
http://lists.freedesktop.org/archives/mesa-dev/2015-October/097748.html
> >>>>
> >>>> It sounds like Marek's argument there is more about lifetime
management than
> >>>> anything.  He wants gallium modules to be able to create IR, call
into the
> >>>> driver, and then throw it away.  In particular, he doesn't want them
to have
> >>>> to think about cloning. In a lot of ways it sounds a lot like what
i965
> >>>> wants too.  I really like having brw_compile_foo take a const
nir_shader.
> >>>> The difference is that i965 basically always wants to clone whereas a
> >>>> gallium driver may not have to if gallium doesn't care what happens
to the
> >>>> shader when it's done.  How common is this case? How important is it
to
> >>>> optimize for?  I don't know.
> >>>>
> >>>> One other thing that bothers me a bit:  From Marek's comment, it
sounds like
> >>>> the components want to just pass in IR and be agnostic about whether
the
> >>>> driver wants its own copy or wants to change it or whatever.  This
seems
> >>>> like an argument for always cloning to me.  From the perspective of a
> >>>> gallium module, "I want to hang in to this, I'll keep a reference"
seems
> >>>> exactly the same as "I want to hang onto this, I'll give the driver
a copy".
> >>>> How are they actually different given that the driver basically has
to
> >>>> modify what you give it in order to do lowering?
> >>>>
> >>>>> > Ugh... I didn't read this at the time, but I don't like Marek's
> >>>>> > response. My understanding of the situation, based on this
thread, is
> >>>>> > that there are some cases where the st knows that there's only
going
> >>>>> > to be one variant and can throw away the (NIR or TGSI) shader
after it
> >>>>> > hands it to the driver, while at other times it has to hold onto
all
> >>>>> > the variants and only give the driver a read-only copy (or
duplicate
> >>>>
> >>>> As per above, my interpretation of Marek's comment is that he
doesn't want
> >>>> the st to have to think about cloning ever.  He wants it to assume
that
> >>>> compilation never modifies the IR so the driver should always
clone.  You
> >>>> have to keep in mind that Marek is most likely thinking about
caching the
> >>>> TGSI rather than doing in-place lowering in it.
> >>>>
> >>>> If I'm understanding Marek correctly, then it sounds like shader
compilation
> >>>> should never touch the IR that's passed in.  If this is the case, it
sounds
> >>>> like always cloning is the way to go.  At least its not *that*
expensive.
> >>>
> >>> Note that st/mesa needs to keep the FS IR because of glDrawPixels and
> >>> glBitmap, and the VS IR because of edge flags, glRasterPos evaluation,
> >>> selection and feedback modes. The last three are done with Draw/LLVM
> >>> and only support TGSI.
> >>>
> >>> Therefore, st/mesa always hangs onto the IR and drivers can't modify
> >>> it. It also needs VS in TGSI to be able to do everything correctly.
> >>>
> >>> What other Gallium modules want or not want is not that important, but
> >>> changing the current semantics will require fixing a lot of places.
> >>> (state trackers - mesa, nine, xa; modules - blitter, draw, hud,
> >>> postprocess, tests, vl)
> >>>
> >>> You really better think about whether changing all those and the risk
> >>> of breaking them is worth it.
> >>>
> >>> Marek
> >>
> >> Well, we're talking about passing NIR here, not TGSI, so none of those
> >> places will need to be updated. NIR is a much more heavyweight IR, and
> >> copying is much more expensive, so it makes more sense there for the
> >> st to duplicate it and let the driver own the IR it passes in, to
> >> reduce copying when there's no variant management necessary.
> >
> > My main concern was about TGSI, not so much about NIR.
>
> Yes, exactly -- we don't plan on changing the semantics of passing in
> TGSI, so this only matters when the user passes in a NIR shader.

I think two different concepts of ownership are getting conflated here:
Right/responsibility to delete and right to modify.

The way I understand it, gallium, as it stands, gives neither to the
driver.  A back-end using NIR requires the right to modify but who deletes
it doesn't ultimately matter.  I think it's dangerous to pass one of these
rights to the driver and not the other but we need to think about both.

What I'm trying to say is that we have two options here:

1) gallium passes IR to the back-end that it is free to modify and is
required to delete when it's done.

2) gallium passes read-only IR to the back-end and it always makes a copy.

It sounds like, from what Marek is saying, that gallium is currently doing
(2) and changing it to (1) would be painful.  I think reference counting is
more like an awkward option 1.5 than option 3.  Reference counting would
mean that gallium passes a reference to the driver which it is expected to
unref but may keep a second reference if it wants to keep the driver from
modifying it.  Then the driver may or may not make a copy based on the
number of references.  Why don't we just make it explicit and add a
read-only bit and call it a day.

One of the reasons I don't like passing a reference is that it effectively
puts allocation and freeing in different components of the driver.  This
means that if and driver doesn't care at all about the shader that gets
passed in, it still has to under it to avoid a memory leak.  You can't have
the driver take the reference because then, either it comes in with a
recount of 0 and should have been deleted, or the "can I modify this" check
becomes "recount <= 2" which makes no sense.

> >
> > If gallium drivers want to take ownership of the IR in
> > create_xx_state, so be it, but they must be allowed to release it
> > immediately if they don't want to keep it.
> >
> > Marek
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20151221/d0208b4c/attachment-0001.html>