[Mesa-dev] [RFC] i965: alternative to memctx for cleaning up nir variants

Connor Abbott cwabbott0 at gmail.com
Mon Dec 21 09:09:58 PST 2015

On Mon, Dec 21, 2015 at 11:45 AM, Marek Olšák <maraeo at gmail.com> wrote:
> On Mon, Dec 21, 2015 at 4:38 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:
>> On Mon, Dec 21, 2015 at 6:39 AM, Marek Olšák <maraeo at gmail.com> wrote:
>>> On Mon, Dec 21, 2015 at 6:48 AM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>>>> On Dec 20, 2015 7:43 PM, "Rob Clark" <robdclark at gmail.com> wrote:
>>>>> On Sun, Dec 20, 2015 at 10:29 PM, Connor Abbott <cwabbott0 at gmail.com>
>>>>> wrote:
>>>>> > On Sun, Dec 20, 2015 at 10:04 PM, Rob Clark <robdclark at gmail.com> wrote:
>>>>> >> On Sun, Dec 20, 2015 at 9:12 PM, Jason Ekstrand <jason at jlekstrand.net>
>>>>> >> wrote:
>>>>> >>>
>>>>> >>> On Dec 19, 2015 5:55 PM, "Rob Clark" <robdclark at gmail.com> wrote:
>>>>> >>>>
>>>>> >>>> From: Rob Clark <robclark at freedesktop.org>
>>>>> >>>>
>>>>> >>>> Jason,
>>>>> >>>>
>>>>> >>>> How much do you hate this idea?  Seems like an easy alternative to
>>>>> >>>> using ralloc ctx's to clean up nir variants/clones, which would let
>>>>> >>>> us drop the parent memctx for nir_shader_create()/clone(), making
>>>>> >>>> it easier to introduce reference counting.
>>>>> >>>
>>>>> >>> I think "hate" is a but strong. I don't like it but it works.  If we
>>>>> >>> really
>>>>> >>> want nir_shader refcounted, we'll have to do something.
>>>>> >>
>>>>> >> I suppose the alternate idea of moving the nir_shader_clone() out of
>>>>> >> brw_compile_xyz(), and always passing in the clone would be a cleaner
>>>>> >> way.  It looks like each of the brw_compile_xyz() has exactly one
>>>>> >> call-site, so doing the nir_shader_clone() inside doesn't really buy
>>>>> >> anything.
>>>> Your forgetting that there may be *cough* other users of this API... We can
>>>> change those too but I would like the needs of the compiler users to drive
>>>> the API, not the cloning.  I still have some details to work out there. In
>>>> any case, it doesn't really matter; we can figure something out.
>>>>> >>> About refcounting... The more I think about it the more I'm not
>>>>> >>> convinced
>>>>> >>> it's useful.  As it stands, we have no use for it an I'm not convinced
>>>>> >>> you
>>>>> >>> do either.  We'll see if I can convince you. :-)
>>>>> >>>
>>>>> >>> In the history of i965 using NIR, we've had about three different ways
>>>>> >>> of
>>>>> >>> doing things:
>>>>> >>>
>>>>> >>> 1) GLSL is the gold copy and we run glsl_to_nir for every
>>>>> >>> shader/variant
>>>>> >>> compile.  This is what we did when we first stated using NIR because
>>>>> >>> it was
>>>>> >>> easy and didn't involve reworking any plumbing.
>>>>> >>>
>>>>> >>> 2) Lowered NIR is the gold copy; variants are done entirely in the
>>>>> >>> back-end
>>>>> >>> IR.  This is what we did up until about a month ago.  Because variants
>>>>> >>> are
>>>>> >>> done in the back-end, we can run gksl_to_nir and do all of our
>>>>> >>> optimizing
>>>>> >>> and lowering at link time.  Going from NIR to the final shader binary
>>>>> >>> is
>>>>> >>> then a read-only operation as far as NIR is concerned.
>>>>> >>>
>>>>> >>> 3) Optimized but not lowered NIR is the gold copy; variants are
>>>>> >>> sometimes
>>>>> >>> done in NIR.  This is the scheme we use now.  We call glsl_to_nir and
>>>>> >>> do
>>>>> >>> some of the optimization and lowering at link time but leave it in SSA
>>>>> >>> form.
>>>>> >>> When we go to compile the final shader, we make a copy, apply
>>>>> >>> variants, do
>>>>> >>> the final lowering and then go into the back-end IR.
>>>>> >>>
>>>>> >>> In each of these cases, we know exactly where we need to make a copy
>>>>> >>> without
>>>>> >>> the help of reference counting.  In the first, we get a fresh copy
>>>>> >>> each time
>>>>> >>> so we are free to destroy the copy.  In the second, we never have to
>>>>> >>> modify
>>>>> >>> the NIR so no copy.  In the third scheme, we always have to make a
>>>>> >>> copy
>>>>> >>> because, even if variants are a no-op, we still have to go out of SSA
>>>>> >>> form
>>>>> >>> and do final lowering.  You could say that we could avoid making that
>>>>> >>> copy.
>>>>> >>> However, the work to determine when we don't need variants and can do
>>>>> >>> all
>>>>> >>> our lowering up-front is far more than the work saved by reference
>>>>> >>> counting.
>>>>> >>>
>>>>> >>> How about gallium?  Here's how I imagine it would work (please correct
>>>>> >>> me of
>>>>> >>> I'm wrong):
>>>>> >>>
>>>>> >>> 1) In the TGSI case, tgsi_to_nir gets called for each compile so you
>>>>> >>> get a
>>>>> >>> fresh mutable shader each time.  In this case, you are free to do
>>>>> >>> whatever
>>>>> >>> you want with the shader without making a copy.
>>>>> >>>
>>>>> >>> 2) In the GLSL case, you run glsl_to_nir and do some basic
>>>>> >>> optimizations at
>>>>> >>> link time and hold onto the NIR shader. (Hold a reference of you'd
>>>>> >>> like.)
>>>>> >>> When you go to compile it in the back-end, it needs to do it's own
>>>>> >>> lowering
>>>>> >>> so it takes a reference and ends up making a copy.
>>>>> >>>
>>>>> >>> If this description is anywhere close to correct, then I don't think
>>>>> >>> you
>>>>> >>> really need it either.  Determining whether or not you need to copy is
>>>>> >>> simply "if (comes_from_tgsi)”.  Maybe there's something subtle about
>>>>> >>> the
>>>>> >>> gallium layer that I don't know that makes refcounting the best
>>>>> >>> solution.
>>>>> >>> Please enlighten me of there is.
>>>>> >>
>>>>> >> This issue is that we *potentially* have both the state tracker and
>>>>> >> the driver both doing some of there own variant management.  (Which
>>>>> >> tbh, isn't awesome, it would have been nice if someone realized
>>>>> >> earlier on that nearly every driver is going to have to do some sort
>>>>> >> of variant mgmt and figured out a way just to push it all down to the
>>>>> >> driver.. but I can't see a good way to get there from here.)
>>>>> >>
>>>>> >> With TGSI as the IR, driver just unconditionally does
>>>>> >> tgsi_dup_tokens().. because of the whole thing where st does variants
>>>>> >> in some cases, things are defined that driver doesn't own the copy of
>>>>> >> the TGSI IR passed in after the fxn call to driver returns.
>>>>> >>
>>>>> >> With NIR I was hoping to fix this, esp. since nir_shader_clone() is
>>>>> >> more heavyweight than tgsi_dup_tokens() (memcpy()).
>>>>> >>
>>>>> >> Refcnt'ing is a nice solution so that we can pass the driver a
>>>>> >> reference that it owns.  In cases where state tracker isn't doing
>>>>> >> variant mgmt, we pass it the one-and-only ref (avoiding clone).
>>>>> >>
>>>>> >> I'd suggested that in cases where st does variant mgmt, that st should
>>>>> >> do the clone/dup.  But that was shot down:
>>>>> >>
>>>>> >> http://lists.freedesktop.org/archives/mesa-dev/2015-October/097748.html
>>>> It sounds like Marek's argument there is more about lifetime management than
>>>> anything.  He wants gallium modules to be able to create IR, call into the
>>>> driver, and then throw it away.  In particular, he doesn't want them to have
>>>> to think about cloning. In a lot of ways it sounds a lot like what i965
>>>> wants too.  I really like having brw_compile_foo take a const nir_shader.
>>>> The difference is that i965 basically always wants to clone whereas a
>>>> gallium driver may not have to if gallium doesn't care what happens to the
>>>> shader when it's done.  How common is this case? How important is it to
>>>> optimize for?  I don't know.
>>>> One other thing that bothers me a bit:  From Marek's comment, it sounds like
>>>> the components want to just pass in IR and be agnostic about whether the
>>>> driver wants its own copy or wants to change it or whatever.  This seems
>>>> like an argument for always cloning to me.  From the perspective of a
>>>> gallium module, "I want to hang in to this, I'll keep a reference" seems
>>>> exactly the same as "I want to hang onto this, I'll give the driver a copy".
>>>> How are they actually different given that the driver basically has to
>>>> modify what you give it in order to do lowering?
>>>>> > Ugh... I didn't read this at the time, but I don't like Marek's
>>>>> > response. My understanding of the situation, based on this thread, is
>>>>> > that there are some cases where the st knows that there's only going
>>>>> > to be one variant and can throw away the (NIR or TGSI) shader after it
>>>>> > hands it to the driver, while at other times it has to hold onto all
>>>>> > the variants and only give the driver a read-only copy (or duplicate
>>>> As per above, my interpretation of Marek's comment is that he doesn't want
>>>> the st to have to think about cloning ever.  He wants it to assume that
>>>> compilation never modifies the IR so the driver should always clone.  You
>>>> have to keep in mind that Marek is most likely thinking about caching the
>>>> TGSI rather than doing in-place lowering in it.
>>>> If I'm understanding Marek correctly, then it sounds like shader compilation
>>>> should never touch the IR that's passed in.  If this is the case, it sounds
>>>> like always cloning is the way to go.  At least its not *that* expensive.
>>> Note that st/mesa needs to keep the FS IR because of glDrawPixels and
>>> glBitmap, and the VS IR because of edge flags, glRasterPos evaluation,
>>> selection and feedback modes. The last three are done with Draw/LLVM
>>> and only support TGSI.
>>> Therefore, st/mesa always hangs onto the IR and drivers can't modify
>>> it. It also needs VS in TGSI to be able to do everything correctly.
>>> What other Gallium modules want or not want is not that important, but
>>> changing the current semantics will require fixing a lot of places.
>>> (state trackers - mesa, nine, xa; modules - blitter, draw, hud,
>>> postprocess, tests, vl)
>>> You really better think about whether changing all those and the risk
>>> of breaking them is worth it.
>>> Marek
>> Well, we're talking about passing NIR here, not TGSI, so none of those
>> places will need to be updated. NIR is a much more heavyweight IR, and
>> copying is much more expensive, so it makes more sense there for the
>> st to duplicate it and let the driver own the IR it passes in, to
>> reduce copying when there's no variant management necessary.
> My main concern was about TGSI, not so much about NIR.

Yes, exactly -- we don't plan on changing the semantics of passing in
TGSI, so this only matters when the user passes in a NIR shader.

> If gallium drivers want to take ownership of the IR in
> create_xx_state, so be it, but they must be allowed to release it
> immediately if they don't want to keep it.
> Marek

More information about the mesa-dev mailing list