[Mesa-dev] [RFC] i965: alternative to memctx for cleaning up nir variants
Rob Clark
robdclark at gmail.com
Sun Dec 20 19:43:12 PST 2015
On Sun, Dec 20, 2015 at 10:29 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:
> On Sun, Dec 20, 2015 at 10:04 PM, Rob Clark <robdclark at gmail.com> wrote:
>> On Sun, Dec 20, 2015 at 9:12 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>>>
>>> On Dec 19, 2015 5:55 PM, "Rob Clark" <robdclark at gmail.com> wrote:
>>>>
>>>> From: Rob Clark <robclark at freedesktop.org>
>>>>
>>>> Jason,
>>>>
>>>> How much do you hate this idea? Seems like an easy alternative to
>>>> using ralloc ctx's to clean up nir variants/clones, which would let
>>>> us drop the parent memctx for nir_shader_create()/clone(), making
>>>> it easier to introduce reference counting.
>>>
>>> I think "hate" is a but strong. I don't like it but it works. If we really
>>> want nir_shader refcounted, we'll have to do something.
>>
>> I suppose the alternate idea of moving the nir_shader_clone() out of
>> brw_compile_xyz(), and always passing in the clone would be a cleaner
>> way. It looks like each of the brw_compile_xyz() has exactly one
>> call-site, so doing the nir_shader_clone() inside doesn't really buy
>> anything.
>>
>>> About refcounting... The more I think about it the more I'm not convinced
>>> it's useful. As it stands, we have no use for it an I'm not convinced you
>>> do either. We'll see if I can convince you. :-)
>>>
>>> In the history of i965 using NIR, we've had about three different ways of
>>> doing things:
>>>
>>> 1) GLSL is the gold copy and we run glsl_to_nir for every shader/variant
>>> compile. This is what we did when we first stated using NIR because it was
>>> easy and didn't involve reworking any plumbing.
>>>
>>> 2) Lowered NIR is the gold copy; variants are done entirely in the back-end
>>> IR. This is what we did up until about a month ago. Because variants are
>>> done in the back-end, we can run gksl_to_nir and do all of our optimizing
>>> and lowering at link time. Going from NIR to the final shader binary is
>>> then a read-only operation as far as NIR is concerned.
>>>
>>> 3) Optimized but not lowered NIR is the gold copy; variants are sometimes
>>> done in NIR. This is the scheme we use now. We call glsl_to_nir and do
>>> some of the optimization and lowering at link time but leave it in SSA form.
>>> When we go to compile the final shader, we make a copy, apply variants, do
>>> the final lowering and then go into the back-end IR.
>>>
>>> In each of these cases, we know exactly where we need to make a copy without
>>> the help of reference counting. In the first, we get a fresh copy each time
>>> so we are free to destroy the copy. In the second, we never have to modify
>>> the NIR so no copy. In the third scheme, we always have to make a copy
>>> because, even if variants are a no-op, we still have to go out of SSA form
>>> and do final lowering. You could say that we could avoid making that copy.
>>> However, the work to determine when we don't need variants and can do all
>>> our lowering up-front is far more than the work saved by reference counting.
>>>
>>> How about gallium? Here's how I imagine it would work (please correct me of
>>> I'm wrong):
>>>
>>> 1) In the TGSI case, tgsi_to_nir gets called for each compile so you get a
>>> fresh mutable shader each time. In this case, you are free to do whatever
>>> you want with the shader without making a copy.
>>>
>>> 2) In the GLSL case, you run glsl_to_nir and do some basic optimizations at
>>> link time and hold onto the NIR shader. (Hold a reference of you'd like.)
>>> When you go to compile it in the back-end, it needs to do it's own lowering
>>> so it takes a reference and ends up making a copy.
>>>
>>> If this description is anywhere close to correct, then I don't think you
>>> really need it either. Determining whether or not you need to copy is
>>> simply "if (comes_from_tgsi)”. Maybe there's something subtle about the
>>> gallium layer that I don't know that makes refcounting the best solution.
>>> Please enlighten me of there is.
>>
>> This issue is that we *potentially* have both the state tracker and
>> the driver both doing some of there own variant management. (Which
>> tbh, isn't awesome, it would have been nice if someone realized
>> earlier on that nearly every driver is going to have to do some sort
>> of variant mgmt and figured out a way just to push it all down to the
>> driver.. but I can't see a good way to get there from here.)
>>
>> With TGSI as the IR, driver just unconditionally does
>> tgsi_dup_tokens().. because of the whole thing where st does variants
>> in some cases, things are defined that driver doesn't own the copy of
>> the TGSI IR passed in after the fxn call to driver returns.
>>
>> With NIR I was hoping to fix this, esp. since nir_shader_clone() is
>> more heavyweight than tgsi_dup_tokens() (memcpy()).
>>
>> Refcnt'ing is a nice solution so that we can pass the driver a
>> reference that it owns. In cases where state tracker isn't doing
>> variant mgmt, we pass it the one-and-only ref (avoiding clone).
>>
>> I'd suggested that in cases where st does variant mgmt, that st should
>> do the clone/dup. But that was shot down:
>>
>> http://lists.freedesktop.org/archives/mesa-dev/2015-October/097748.html
>
> Ugh... I didn't read this at the time, but I don't like Marek's
> response. My understanding of the situation, based on this thread, is
> that there are some cases where the st knows that there's only going
> to be one variant and can throw away the (NIR or TGSI) shader after it
> hands it to the driver, while at other times it has to hold onto all
> the variants and only give the driver a read-only copy (or duplicate
> the shader). In that case, forcing the IR to implement reference
> counting seems like much more "cruft" than duplicating the shader in
> the st. What's the difference between duplicating the shader and
> incrementing the reference count before passing it to the driver? It
> seems like it's almost exactly the same amount of code, and reference
> counting seems like overkill for a situation where there are only ever
> going to be 1 or 2 references to the shader.
>
> I remember there was some cap which would enable the state tracker to
> stop making variants, in which case this whole thing is moot, but
> until then it doesn't seem necessary to force reference counting into
> NIR.
>
So, I'm actually just digging thru this now, but I think it is mostly
about glBitmap/glDrawPixels variants for fragment shaders.
I am a bit curious how that can be handled *without* creating variants
in the st.
Also a bit curious how that is handled in i965 (ie. if I wrote nir
lowering passes for these, would they be useful for i965?)
BR,
-R
>>
>> It seems like in the end, since memctx isn't really buying us that
>> much, that refcnt'ing is the more useful thing.
>>
>> BR,
>> -R
>>
>>> --Jason
>>>
>>>> ---
>>>> src/mesa/drivers/dri/i965/brw_compiler.h | 22
>>>> ++++++++++++++++++++++
>>>> src/mesa/drivers/dri/i965/brw_fs.cpp | 6 ++++--
>>>> src/mesa/drivers/dri/i965/brw_vec4.cpp | 3 ++-
>>>> src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 3 ++-
>>>> 4 files changed, 30 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/src/mesa/drivers/dri/i965/brw_compiler.h
>>>> b/src/mesa/drivers/dri/i965/brw_compiler.h
>>>> index c9e0317..eb1087a 100644
>>>> --- a/src/mesa/drivers/dri/i965/brw_compiler.h
>>>> +++ b/src/mesa/drivers/dri/i965/brw_compiler.h
>>>> @@ -26,6 +26,7 @@
>>>> #include <stdio.h>
>>>> #include "brw_device_info.h"
>>>> #include "main/mtypes.h"
>>>> +#include "nir.h"
>>>>
>>>> #ifdef __cplusplus
>>>> extern "C" {
>>>> @@ -651,6 +652,27 @@ struct brw_gs_prog_data
>>>>
>>>> /** @} */
>>>>
>>>> +#ifdef __cplusplus
>>>> +/* A common pattern is for brw_compile_xyz to create a clone of the
>>>> original
>>>> + * shader, modify it (various lowering passes, etc), and then throw it
>>>> away.
>>>> + * The scoped_clone helper class is just a way to use RAII to clean up
>>>> the
>>>> + * shader when it goes out of scope.
>>>> + */
>>>> +class scoped_clone
>>>> +{
>>>> +public:
>>>> + nir_shader *shader;
>>>> + scoped_clone(const nir_shader *s)
>>>> + {
>>>> + shader = nir_shader_clone(NULL, s);
>>>> + }
>>>> + ~scoped_clone()
>>>> + {
>>>> + ralloc_free(shader);
>>>> + }
>>>> +};
>>>> +#endif
>>>> +
>>>> /**
>>>> * Compile a vertex shader.
>>>> *
>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
>>>> b/src/mesa/drivers/dri/i965/brw_fs.cpp
>>>> index c833ef0..44ea156 100644
>>>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>>>> @@ -5490,7 +5490,8 @@ brw_compile_fs(const struct brw_compiler *compiler,
>>>> void *log_data,
>>>> unsigned *final_assembly_size,
>>>> char **error_str)
>>>> {
>>>> - nir_shader *shader = nir_shader_clone(mem_ctx, src_shader);
>>>> + scoped_clone sc(src_shader);
>>>> + nir_shader *shader = sc.shader;
>>>> shader = brw_nir_apply_sampler_key(shader, compiler->devinfo,
>>>> &key->tex,
>>>> true);
>>>> shader = brw_postprocess_nir(shader, compiler->devinfo, true);
>>>> @@ -5616,7 +5617,8 @@ brw_compile_cs(const struct brw_compiler *compiler,
>>>> void *log_data,
>>>> unsigned *final_assembly_size,
>>>> char **error_str)
>>>> {
>>>> - nir_shader *shader = nir_shader_clone(mem_ctx, src_shader);
>>>> + scoped_clone sc(src_shader);
>>>> + nir_shader *shader = sc.shader;
>>>> shader = brw_nir_apply_sampler_key(shader, compiler->devinfo,
>>>> &key->tex,
>>>> true);
>>>> shader = brw_postprocess_nir(shader, compiler->devinfo, true);
>>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>>> index a697bdf..a89e884 100644
>>>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>>> @@ -1957,7 +1957,8 @@ brw_compile_vs(const struct brw_compiler *compiler,
>>>> void *log_data,
>>>> unsigned *final_assembly_size,
>>>> char **error_str)
>>>> {
>>>> - nir_shader *shader = nir_shader_clone(mem_ctx, src_shader);
>>>> + scoped_clone sc(src_shader);
>>>> + nir_shader *shader = sc.shader;
>>>> shader = brw_nir_apply_sampler_key(shader, compiler->devinfo,
>>>> &key->tex,
>>>>
>>>> compiler->scalar_stage[MESA_SHADER_VERTEX]);
>>>> shader = brw_postprocess_nir(shader, compiler->devinfo,
>>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
>>>> b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
>>>> index b13d36e..9ed7e0d 100644
>>>> --- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
>>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
>>>> @@ -617,7 +617,8 @@ brw_compile_gs(const struct brw_compiler *compiler,
>>>> void *log_data,
>>>> memset(&c, 0, sizeof(c));
>>>> c.key = *key;
>>>>
>>>> - nir_shader *shader = nir_shader_clone(mem_ctx, src_shader);
>>>> + scoped_clone sc(src_shader);
>>>> + nir_shader *shader = sc.shader;
>>>> shader = brw_nir_apply_sampler_key(shader, compiler->devinfo,
>>>> &key->tex,
>>>>
>>>> compiler->scalar_stage[MESA_SHADER_GEOMETRY]);
>>>> shader = brw_postprocess_nir(shader, compiler->devinfo,
>>>> --
>>>> 2.5.0
>>>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
More information about the mesa-dev
mailing list