[Mesa-dev] [RFC] i965: alternative to memctx for cleaning up nir variants

Jason Ekstrand jason at jlekstrand.net
Sun Dec 20 21:48:28 PST 2015


On Dec 20, 2015 7:43 PM, "Rob Clark" <robdclark at gmail.com> wrote:
>
> On Sun, Dec 20, 2015 at 10:29 PM, Connor Abbott <cwabbott0 at gmail.com>
wrote:
> > On Sun, Dec 20, 2015 at 10:04 PM, Rob Clark <robdclark at gmail.com> wrote:
> >> On Sun, Dec 20, 2015 at 9:12 PM, Jason Ekstrand <jason at jlekstrand.net>
wrote:
> >>>
> >>> On Dec 19, 2015 5:55 PM, "Rob Clark" <robdclark at gmail.com> wrote:
> >>>>
> >>>> From: Rob Clark <robclark at freedesktop.org>
> >>>>
> >>>> Jason,
> >>>>
> >>>> How much do you hate this idea?  Seems like an easy alternative to
> >>>> using ralloc ctx's to clean up nir variants/clones, which would let
> >>>> us drop the parent memctx for nir_shader_create()/clone(), making
> >>>> it easier to introduce reference counting.
> >>>
> >>> I think "hate" is a but strong. I don't like it but it works.  If we
really
> >>> want nir_shader refcounted, we'll have to do something.
> >>
> >> I suppose the alternate idea of moving the nir_shader_clone() out of
> >> brw_compile_xyz(), and always passing in the clone would be a cleaner
> >> way.  It looks like each of the brw_compile_xyz() has exactly one
> >> call-site, so doing the nir_shader_clone() inside doesn't really buy
> >> anything.

Your forgetting that there may be *cough* other users of this API... We can
change those too but I would like the needs of the compiler users to drive
the API, not the cloning.  I still have some details to work out there. In
any case, it doesn't really matter; we can figure something out.

> >>> About refcounting... The more I think about it the more I'm not
convinced
> >>> it's useful.  As it stands, we have no use for it an I'm not
convinced you
> >>> do either.  We'll see if I can convince you. :-)
> >>>
> >>> In the history of i965 using NIR, we've had about three different
ways of
> >>> doing things:
> >>>
> >>> 1) GLSL is the gold copy and we run glsl_to_nir for every
shader/variant
> >>> compile.  This is what we did when we first stated using NIR because
it was
> >>> easy and didn't involve reworking any plumbing.
> >>>
> >>> 2) Lowered NIR is the gold copy; variants are done entirely in the
back-end
> >>> IR.  This is what we did up until about a month ago.  Because
variants are
> >>> done in the back-end, we can run gksl_to_nir and do all of our
optimizing
> >>> and lowering at link time.  Going from NIR to the final shader binary
is
> >>> then a read-only operation as far as NIR is concerned.
> >>>
> >>> 3) Optimized but not lowered NIR is the gold copy; variants are
sometimes
> >>> done in NIR.  This is the scheme we use now.  We call glsl_to_nir and
do
> >>> some of the optimization and lowering at link time but leave it in
SSA form.
> >>> When we go to compile the final shader, we make a copy, apply
variants, do
> >>> the final lowering and then go into the back-end IR.
> >>>
> >>> In each of these cases, we know exactly where we need to make a copy
without
> >>> the help of reference counting.  In the first, we get a fresh copy
each time
> >>> so we are free to destroy the copy.  In the second, we never have to
modify
> >>> the NIR so no copy.  In the third scheme, we always have to make a
copy
> >>> because, even if variants are a no-op, we still have to go out of SSA
form
> >>> and do final lowering.  You could say that we could avoid making that
copy.
> >>> However, the work to determine when we don't need variants and can do
all
> >>> our lowering up-front is far more than the work saved by reference
counting.
> >>>
> >>> How about gallium?  Here's how I imagine it would work (please
correct me of
> >>> I'm wrong):
> >>>
> >>> 1) In the TGSI case, tgsi_to_nir gets called for each compile so you
get a
> >>> fresh mutable shader each time.  In this case, you are free to do
whatever
> >>> you want with the shader without making a copy.
> >>>
> >>> 2) In the GLSL case, you run glsl_to_nir and do some basic
optimizations at
> >>> link time and hold onto the NIR shader. (Hold a reference of you'd
like.)
> >>> When you go to compile it in the back-end, it needs to do it's own
lowering
> >>> so it takes a reference and ends up making a copy.
> >>>
> >>> If this description is anywhere close to correct, then I don't think
you
> >>> really need it either.  Determining whether or not you need to copy is
> >>> simply "if (comes_from_tgsi)”.  Maybe there's something subtle about
the
> >>> gallium layer that I don't know that makes refcounting the best
solution.
> >>> Please enlighten me of there is.
> >>
> >> This issue is that we *potentially* have both the state tracker and
> >> the driver both doing some of there own variant management.  (Which
> >> tbh, isn't awesome, it would have been nice if someone realized
> >> earlier on that nearly every driver is going to have to do some sort
> >> of variant mgmt and figured out a way just to push it all down to the
> >> driver.. but I can't see a good way to get there from here.)
> >>
> >> With TGSI as the IR, driver just unconditionally does
> >> tgsi_dup_tokens().. because of the whole thing where st does variants
> >> in some cases, things are defined that driver doesn't own the copy of
> >> the TGSI IR passed in after the fxn call to driver returns.
> >>
> >> With NIR I was hoping to fix this, esp. since nir_shader_clone() is
> >> more heavyweight than tgsi_dup_tokens() (memcpy()).
> >>
> >> Refcnt'ing is a nice solution so that we can pass the driver a
> >> reference that it owns.  In cases where state tracker isn't doing
> >> variant mgmt, we pass it the one-and-only ref (avoiding clone).
> >>
> >> I'd suggested that in cases where st does variant mgmt, that st should
> >> do the clone/dup.  But that was shot down:
> >>
> >> http://lists.freedesktop.org/archives/mesa-dev/2015-October/097748.html

It sounds like Marek's argument there is more about lifetime management
than anything.  He wants gallium modules to be able to create IR, call into
the driver, and then throw it away.  In particular, he doesn't want them to
have to think about cloning. In a lot of ways it sounds a lot like what
i965 wants too.  I really like having brw_compile_foo take a const
nir_shader.  The difference is that i965 basically always wants to clone
whereas a gallium driver may not have to if gallium doesn't care what
happens to the shader when it's done.  How common is this case? How
important is it to optimize for?  I don't know.

One other thing that bothers me a bit:  From Marek's comment, it sounds
like the components want to just pass in IR and be agnostic about whether
the driver wants its own copy or wants to change it or whatever.  This
seems like an argument for always cloning to me.  From the perspective of a
gallium module, "I want to hang in to this, I'll keep a reference" seems
exactly the same as "I want to hang onto this, I'll give the driver a
copy".  How are they actually different given that the driver basically has
to modify what you give it in order to do lowering?

> > Ugh... I didn't read this at the time, but I don't like Marek's
> > response. My understanding of the situation, based on this thread, is
> > that there are some cases where the st knows that there's only going
> > to be one variant and can throw away the (NIR or TGSI) shader after it
> > hands it to the driver, while at other times it has to hold onto all
> > the variants and only give the driver a read-only copy (or duplicate

As per above, my interpretation of Marek's comment is that he doesn't want
the st to have to think about cloning ever.  He wants it to assume that
compilation never modifies the IR so the driver should always clone.  You
have to keep in mind that Marek is most likely thinking about caching the
TGSI rather than doing in-place lowering in it.

If I'm understanding Marek correctly, then it sounds like shader
compilation should never touch the IR that's passed in.  If this is the
case, it sounds like always cloning is the way to go.  At least its not
*that* expensive.

> > the shader). In that case, forcing the IR to implement reference
> > counting seems like much more "cruft" than duplicating the shader in
> > the st. What's the difference between duplicating the shader and
> > incrementing the reference count before passing it to the driver? It
> > seems like it's almost exactly the same amount of code, and reference
> > counting seems like overkill for a situation where there are only ever
> > going to be 1 or 2 references to the shader.
> >
> > I remember there was some cap which would enable the state tracker to
> > stop making variants, in which case this whole thing is moot, but
> > until then it doesn't seem necessary to force reference counting into
> > NIR.
> >
>
> So, I'm actually just digging thru this now, but I think it is mostly
> about glBitmap/glDrawPixels variants for fragment shaders.
>
> I am a bit curious how that can be handled *without* creating variants
> in the st.
>
> Also a bit curious how that is handled in i965 (ie. if I wrote nir
> lowering passes for these, would they be useful for i965?)

TBH, I don't know.  I think we do some meta thing for DrawPixels.  For
glBitmap, I think we bring out ye olde blitter and use some of those crazy
2D drawing features we still have in HW.

> BR,
> -R
>
> >>
> >> It seems like in the end, since memctx isn't really buying us that
> >> much, that refcnt'ing is the more useful thing.
> >>
> >> BR,
> >> -R
> >>
> >>> --Jason
> >>>
> >>>> ---
> >>>>  src/mesa/drivers/dri/i965/brw_compiler.h          | 22
> >>>> ++++++++++++++++++++++
> >>>>  src/mesa/drivers/dri/i965/brw_fs.cpp              |  6 ++++--
> >>>>  src/mesa/drivers/dri/i965/brw_vec4.cpp            |  3 ++-
> >>>>  src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp |  3 ++-
> >>>>  4 files changed, 30 insertions(+), 4 deletions(-)
> >>>>
> >>>> diff --git a/src/mesa/drivers/dri/i965/brw_compiler.h
> >>>> b/src/mesa/drivers/dri/i965/brw_compiler.h
> >>>> index c9e0317..eb1087a 100644
> >>>> --- a/src/mesa/drivers/dri/i965/brw_compiler.h
> >>>> +++ b/src/mesa/drivers/dri/i965/brw_compiler.h
> >>>> @@ -26,6 +26,7 @@
> >>>>  #include <stdio.h>
> >>>>  #include "brw_device_info.h"
> >>>>  #include "main/mtypes.h"
> >>>> +#include "nir.h"
> >>>>
> >>>>  #ifdef __cplusplus
> >>>>  extern "C" {
> >>>> @@ -651,6 +652,27 @@ struct brw_gs_prog_data
> >>>>
> >>>>  /** @} */
> >>>>
> >>>> +#ifdef __cplusplus
> >>>> +/* A common pattern is for brw_compile_xyz to create a clone of the
> >>>> original
> >>>> + * shader, modify it (various lowering passes, etc), and then throw
it
> >>>> away.
> >>>> + * The scoped_clone helper class is just a way to use RAII to clean
up
> >>>> the
> >>>> + * shader when it goes out of scope.
> >>>> + */
> >>>> +class scoped_clone
> >>>> +{
> >>>> +public:
> >>>> +   nir_shader *shader;
> >>>> +   scoped_clone(const nir_shader *s)
> >>>> +   {
> >>>> +      shader = nir_shader_clone(NULL, s);
> >>>> +   }
> >>>> +   ~scoped_clone()
> >>>> +   {
> >>>> +      ralloc_free(shader);
> >>>> +   }
> >>>> +};
> >>>> +#endif
> >>>> +
> >>>>  /**
> >>>>   * Compile a vertex shader.
> >>>>   *
> >>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
> >>>> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> >>>> index c833ef0..44ea156 100644
> >>>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> >>>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> >>>> @@ -5490,7 +5490,8 @@ brw_compile_fs(const struct brw_compiler
*compiler,
> >>>> void *log_data,
> >>>>                 unsigned *final_assembly_size,
> >>>>                 char **error_str)
> >>>>  {
> >>>> -   nir_shader *shader = nir_shader_clone(mem_ctx, src_shader);
> >>>> +   scoped_clone sc(src_shader);
> >>>> +   nir_shader *shader = sc.shader;
> >>>>     shader = brw_nir_apply_sampler_key(shader, compiler->devinfo,
> >>>> &key->tex,
> >>>>                                        true);
> >>>>     shader = brw_postprocess_nir(shader, compiler->devinfo, true);
> >>>> @@ -5616,7 +5617,8 @@ brw_compile_cs(const struct brw_compiler
*compiler,
> >>>> void *log_data,
> >>>>                 unsigned *final_assembly_size,
> >>>>                 char **error_str)
> >>>>  {
> >>>> -   nir_shader *shader = nir_shader_clone(mem_ctx, src_shader);
> >>>> +   scoped_clone sc(src_shader);
> >>>> +   nir_shader *shader = sc.shader;
> >>>>     shader = brw_nir_apply_sampler_key(shader, compiler->devinfo,
> >>>> &key->tex,
> >>>>                                        true);
> >>>>     shader = brw_postprocess_nir(shader, compiler->devinfo, true);
> >>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> >>>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> >>>> index a697bdf..a89e884 100644
> >>>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> >>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> >>>> @@ -1957,7 +1957,8 @@ brw_compile_vs(const struct brw_compiler
*compiler,
> >>>> void *log_data,
> >>>>                 unsigned *final_assembly_size,
> >>>>                 char **error_str)
> >>>>  {
> >>>> -   nir_shader *shader = nir_shader_clone(mem_ctx, src_shader);
> >>>> +   scoped_clone sc(src_shader);
> >>>> +   nir_shader *shader = sc.shader;
> >>>>     shader = brw_nir_apply_sampler_key(shader, compiler->devinfo,
> >>>> &key->tex,
> >>>>
> >>>> compiler->scalar_stage[MESA_SHADER_VERTEX]);
> >>>>     shader = brw_postprocess_nir(shader, compiler->devinfo,
> >>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
> >>>> b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
> >>>> index b13d36e..9ed7e0d 100644
> >>>> --- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
> >>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
> >>>> @@ -617,7 +617,8 @@ brw_compile_gs(const struct brw_compiler
*compiler,
> >>>> void *log_data,
> >>>>     memset(&c, 0, sizeof(c));
> >>>>     c.key = *key;
> >>>>
> >>>> -   nir_shader *shader = nir_shader_clone(mem_ctx, src_shader);
> >>>> +   scoped_clone sc(src_shader);
> >>>> +   nir_shader *shader = sc.shader;
> >>>>     shader = brw_nir_apply_sampler_key(shader, compiler->devinfo,
> >>>> &key->tex,
> >>>>
> >>>> compiler->scalar_stage[MESA_SHADER_GEOMETRY]);
> >>>>     shader = brw_postprocess_nir(shader, compiler->devinfo,
> >>>> --
> >>>> 2.5.0
> >>>>
> >> _______________________________________________
> >> mesa-dev mailing list
> >> mesa-dev at lists.freedesktop.org
> >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20151220/98c088f8/attachment-0001.html>


More information about the mesa-dev mailing list