[Mesa-dev] [RFC] i965: alternative to memctx for cleaning up nir variants

Sun Dec 20 19:43:12 PST 2015

On Sun, Dec 20, 2015 at 10:29 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:
> On Sun, Dec 20, 2015 at 10:04 PM, Rob Clark <robdclark at gmail.com> wrote:
>> On Sun, Dec 20, 2015 at 9:12 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>>>
>>> On Dec 19, 2015 5:55 PM, "Rob Clark" <robdclark at gmail.com> wrote:
>>>>
>>>> From: Rob Clark <robclark at freedesktop.org>
>>>>
>>>> Jason,
>>>>
>>>> How much do you hate this idea?  Seems like an easy alternative to
>>>> using ralloc ctx's to clean up nir variants/clones, which would let
>>>> us drop the parent memctx for nir_shader_create()/clone(), making
>>>> it easier to introduce reference counting.
>>>
>>> I think "hate" is a but strong. I don't like it but it works.  If we really
>>> want nir_shader refcounted, we'll have to do something.
>>
>> I suppose the alternate idea of moving the nir_shader_clone() out of
>> brw_compile_xyz(), and always passing in the clone would be a cleaner
>> way.  It looks like each of the brw_compile_xyz() has exactly one
>> call-site, so doing the nir_shader_clone() inside doesn't really buy
>> anything.
>>
>>> About refcounting... The more I think about it the more I'm not convinced
>>> it's useful.  As it stands, we have no use for it an I'm not convinced you
>>> do either.  We'll see if I can convince you. :-)
>>>
>>> In the history of i965 using NIR, we've had about three different ways of
>>> doing things:
>>>
>>> 1) GLSL is the gold copy and we run glsl_to_nir for every shader/variant
>>> compile.  This is what we did when we first stated using NIR because it was
>>> easy and didn't involve reworking any plumbing.
>>>
>>> 2) Lowered NIR is the gold copy; variants are done entirely in the back-end
>>> IR.  This is what we did up until about a month ago.  Because variants are
>>> done in the back-end, we can run gksl_to_nir and do all of our optimizing
>>> and lowering at link time.  Going from NIR to the final shader binary is
>>> then a read-only operation as far as NIR is concerned.
>>>
>>> 3) Optimized but not lowered NIR is the gold copy; variants are sometimes
>>> done in NIR.  This is the scheme we use now.  We call glsl_to_nir and do
>>> some of the optimization and lowering at link time but leave it in SSA form.
>>> When we go to compile the final shader, we make a copy, apply variants, do
>>> the final lowering and then go into the back-end IR.
>>>
>>> In each of these cases, we know exactly where we need to make a copy without
>>> the help of reference counting.  In the first, we get a fresh copy each time
>>> so we are free to destroy the copy.  In the second, we never have to modify
>>> the NIR so no copy.  In the third scheme, we always have to make a copy
>>> because, even if variants are a no-op, we still have to go out of SSA form
>>> and do final lowering.  You could say that we could avoid making that copy.
>>> However, the work to determine when we don't need variants and can do all
>>> our lowering up-front is far more than the work saved by reference counting.
>>>
>>> How about gallium?  Here's how I imagine it would work (please correct me of
>>> I'm wrong):
>>>
>>> 1) In the TGSI case, tgsi_to_nir gets called for each compile so you get a
>>> fresh mutable shader each time.  In this case, you are free to do whatever
>>> you want with the shader without making a copy.
>>>
>>> 2) In the GLSL case, you run glsl_to_nir and do some basic optimizations at
>>> link time and hold onto the NIR shader. (Hold a reference of you'd like.)
>>> When you go to compile it in the back-end, it needs to do it's own lowering
>>> so it takes a reference and ends up making a copy.
>>>
>>> If this description is anywhere close to correct, then I don't think you
>>> really need it either.  Determining whether or not you need to copy is
>>> simply "if (comes_from_tgsi)”.  Maybe there's something subtle about the
>>> gallium layer that I don't know that makes refcounting the best solution.
>>> Please enlighten me of there is.
>>
>> This issue is that we *potentially* have both the state tracker and
>> the driver both doing some of there own variant management.  (Which
>> tbh, isn't awesome, it would have been nice if someone realized
>> earlier on that nearly every driver is going to have to do some sort
>> of variant mgmt and figured out a way just to push it all down to the
>> driver.. but I can't see a good way to get there from here.)
>>
>> With TGSI as the IR, driver just unconditionally does
>> tgsi_dup_tokens().. because of the whole thing where st does variants
>> in some cases, things are defined that driver doesn't own the copy of
>> the TGSI IR passed in after the fxn call to driver returns.
>>
>> With NIR I was hoping to fix this, esp. since nir_shader_clone() is
>> more heavyweight than tgsi_dup_tokens() (memcpy()).
>>
>> Refcnt'ing is a nice solution so that we can pass the driver a
>> reference that it owns.  In cases where state tracker isn't doing
>> variant mgmt, we pass it the one-and-only ref (avoiding clone).
>>
>> I'd suggested that in cases where st does variant mgmt, that st should
>> do the clone/dup.  But that was shot down:
>>
>> http://lists.freedesktop.org/archives/mesa-dev/2015-October/097748.html
>
> Ugh... I didn't read this at the time, but I don't like Marek's
> response. My understanding of the situation, based on this thread, is
> that there are some cases where the st knows that there's only going
> to be one variant and can throw away the (NIR or TGSI) shader after it
> hands it to the driver, while at other times it has to hold onto all
> the variants and only give the driver a read-only copy (or duplicate
> the shader). In that case, forcing the IR to implement reference
> counting seems like much more "cruft" than duplicating the shader in
> the st. What's the difference between duplicating the shader and
> incrementing the reference count before passing it to the driver? It
> seems like it's almost exactly the same amount of code, and reference
> counting seems like overkill for a situation where there are only ever
> going to be 1 or 2 references to the shader.
>
> I remember there was some cap which would enable the state tracker to
> stop making variants, in which case this whole thing is moot, but
> until then it doesn't seem necessary to force reference counting into
> NIR.
>

So, I'm actually just digging thru this now, but I think it is mostly
about glBitmap/glDrawPixels variants for fragment shaders.

I am a bit curious how that can be handled *without* creating variants
in the st.

Also a bit curious how that is handled in i965 (ie. if I wrote nir
lowering passes for these, would they be useful for i965?)

BR,
-R

>>
>> It seems like in the end, since memctx isn't really buying us that
>> much, that refcnt'ing is the more useful thing.
>>
>> BR,
>> -R
>>
>>> --Jason
>>>
>>>> ---
>>>>  src/mesa/drivers/dri/i965/brw_compiler.h          | 22
>>>> ++++++++++++++++++++++
>>>>  src/mesa/drivers/dri/i965/brw_fs.cpp              |  6 ++++--
>>>>  src/mesa/drivers/dri/i965/brw_vec4.cpp            |  3 ++-
>>>>  src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp |  3 ++-
>>>>  4 files changed, 30 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/src/mesa/drivers/dri/i965/brw_compiler.h
>>>> b/src/mesa/drivers/dri/i965/brw_compiler.h
>>>> index c9e0317..eb1087a 100644
>>>> --- a/src/mesa/drivers/dri/i965/brw_compiler.h
>>>> +++ b/src/mesa/drivers/dri/i965/brw_compiler.h
>>>> @@ -26,6 +26,7 @@
>>>>  #include <stdio.h>
>>>>  #include "brw_device_info.h"
>>>>  #include "main/mtypes.h"
>>>> +#include "nir.h"
>>>>
>>>>  #ifdef __cplusplus
>>>>  extern "C" {
>>>> @@ -651,6 +652,27 @@ struct brw_gs_prog_data
>>>>
>>>>  /** @} */
>>>>
>>>> +#ifdef __cplusplus
>>>> +/* A common pattern is for brw_compile_xyz to create a clone of the
>>>> original
>>>> + * shader, modify it (various lowering passes, etc), and then throw it
>>>> away.
>>>> + * The scoped_clone helper class is just a way to use RAII to clean up
>>>> the
>>>> + * shader when it goes out of scope.
>>>> + */
>>>> +class scoped_clone
>>>> +{
>>>> +public:
>>>> +   nir_shader *shader;
>>>> +   scoped_clone(const nir_shader *s)
>>>> +   {
>>>> +      shader = nir_shader_clone(NULL, s);
>>>> +   }
>>>> +   ~scoped_clone()
>>>> +   {
>>>> +      ralloc_free(shader);
>>>> +   }
>>>> +};
>>>> +#endif
>>>> +
>>>>  /**
>>>>   * Compile a vertex shader.
>>>>   *
>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
>>>> b/src/mesa/drivers/dri/i965/brw_fs.cpp
>>>> index c833ef0..44ea156 100644
>>>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>>>> @@ -5490,7 +5490,8 @@ brw_compile_fs(const struct brw_compiler *compiler,
>>>> void *log_data,
>>>>                 unsigned *final_assembly_size,
>>>>                 char **error_str)
>>>>  {
>>>> -   nir_shader *shader = nir_shader_clone(mem_ctx, src_shader);
>>>> +   scoped_clone sc(src_shader);
>>>> +   nir_shader *shader = sc.shader;
>>>>     shader = brw_nir_apply_sampler_key(shader, compiler->devinfo,
>>>> &key->tex,
>>>>                                        true);
>>>>     shader = brw_postprocess_nir(shader, compiler->devinfo, true);
>>>> @@ -5616,7 +5617,8 @@ brw_compile_cs(const struct brw_compiler *compiler,
>>>> void *log_data,
>>>>                 unsigned *final_assembly_size,
>>>>                 char **error_str)
>>>>  {
>>>> -   nir_shader *shader = nir_shader_clone(mem_ctx, src_shader);
>>>> +   scoped_clone sc(src_shader);
>>>> +   nir_shader *shader = sc.shader;
>>>>     shader = brw_nir_apply_sampler_key(shader, compiler->devinfo,
>>>> &key->tex,
>>>>                                        true);
>>>>     shader = brw_postprocess_nir(shader, compiler->devinfo, true);
>>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>>> index a697bdf..a89e884 100644
>>>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>>> @@ -1957,7 +1957,8 @@ brw_compile_vs(const struct brw_compiler *compiler,
>>>> void *log_data,
>>>>                 unsigned *final_assembly_size,
>>>>                 char **error_str)
>>>>  {
>>>> -   nir_shader *shader = nir_shader_clone(mem_ctx, src_shader);
>>>> +   scoped_clone sc(src_shader);
>>>> +   nir_shader *shader = sc.shader;
>>>>     shader = brw_nir_apply_sampler_key(shader, compiler->devinfo,
>>>> &key->tex,
>>>>
>>>> compiler->scalar_stage[MESA_SHADER_VERTEX]);
>>>>     shader = brw_postprocess_nir(shader, compiler->devinfo,
>>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
>>>> b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
>>>> index b13d36e..9ed7e0d 100644
>>>> --- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
>>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
>>>> @@ -617,7 +617,8 @@ brw_compile_gs(const struct brw_compiler *compiler,
>>>> void *log_data,
>>>>     memset(&c, 0, sizeof(c));
>>>>     c.key = *key;
>>>>
>>>> -   nir_shader *shader = nir_shader_clone(mem_ctx, src_shader);
>>>> +   scoped_clone sc(src_shader);
>>>> +   nir_shader *shader = sc.shader;
>>>>     shader = brw_nir_apply_sampler_key(shader, compiler->devinfo,
>>>> &key->tex,
>>>>
>>>> compiler->scalar_stage[MESA_SHADER_GEOMETRY]);
>>>>     shader = brw_postprocess_nir(shader, compiler->devinfo,
>>>> --
>>>> 2.5.0
>>>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev