[Mesa-dev] [PATCH 00/11] nir: Add a pass management framework

Sat Oct 31 14:56:02 PDT 2015

On Sat, Oct 31, 2015 at 2:39 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:
> On Sat, Oct 31, 2015 at 3:29 PM, Rob Clark <robdclark at gmail.com> wrote:
>> On Sat, Oct 31, 2015 at 2:26 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:
>>> On Sat, Oct 31, 2015 at 1:38 PM, Rob Clark <robdclark at gmail.com> wrote:
>>>> On Sat, Oct 31, 2015 at 11:31 AM, Connor Abbott <cwabbott0 at gmail.com> wrote:
>>>>> On Sat, Oct 31, 2015 at 10:55 AM, Rob Clark <robdclark at gmail.com> wrote:
>>>>>> On Fri, Oct 30, 2015 at 6:42 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>>>>>>> On Fri, Oct 30, 2015 at 2:12 PM, Ian Romanick <idr at freedesktop.org> wrote:
>>>>>>>> On 10/28/2015 10:01 PM, Jason Ekstrand wrote:
>>>>>>>>>
>>>>>>>>> On Oct 28, 2015 9:12 PM, "Kenneth Graunke" <kenneth at whitecape.org
>>>>>>>>> <mailto:kenneth at whitecape.org>> wrote:
>>>>>>>>>>
>>>>>>>>>> On Wednesday, October 28, 2015 02:58:07 PM Kristian Høgsberg wrote:
>>>>>>>>>> > On Wed, Oct 28, 2015 at 2:34 PM, Jason Ekstrand
>>>>>>>>> <jason at jlekstrand.net <mailto:jason at jlekstrand.net>> wrote:
>>>>>>>>>> > > On Wed, Oct 28, 2015 at 2:32 PM, Jason Ekstrand
>>>>>>>>> <jason at jlekstrand.net <mailto:jason at jlekstrand.net>> wrote:
>>>>>>>>>> > >> This series adds a nir_pass datastructure and some helpers for
>>>>>>>>> managing
>>>>>>>>>> > >> optimization and lowering passes.  I've been meaning to get
>>>>>>>>> around to this
>>>>>>>>>> > >> for some time.  There are a couple of primary benifits to this:
>>>>>>>>>> > >>
>>>>>>>>>> > >> First, this gives us a central place to put things such as
>>>>>>>>> validating the
>>>>>>>>>> > >> shader, printing it if it changes, etc.  Right now, the i965
>>>>>>>>> backend calls
>>>>>>>>>> > >> nir_validate_shader after each pass.  We would also like to add
>>>>>>>>> something
>>>>>>>>>> > >> like we have in the i965 backend where it can be set to dump the
>>>>>>>>> IR to a
>>>>>>>>>> > >> file after every pass that changess it.
>>>>>>>>>> > >>
>>>>>>>>>> > >> Mor importantly, though, it moves metadata out of the passes them
>>>>>>>>> selves
>>>>>>>>>> > >> and into the runner.  In the process of putting this series
>>>>>>>>> together, I
>>>>>>>>>> > >> found at least 3 or 4 optimization passes that don't properly
>>>>>>>>> invalidate
>>>>>>>>>> > >> metadata.  By putting a metadata_preserved field in nir_pass and
>>>>>>>>> handling
>>>>>>>>>> > >> metadata in the pass runner, we make it much less likely that a
>>>>>>>>> pass will
>>>>>>>>>> > >> get this wrong.  LLVM has a similar optimization pass
>>>>>>>>> architecture for
>>>>>>>>>> > >> precicely this reason.
>>>>>>>>>> > >>
>>>>>>>>>> > >> As a nice little side-benifit, we no longer have to iterate over
>>>>>>>>> all of the
>>>>>>>>>> > >> overloads with non-NULL impl pointers in each pass.
>>>>>>>>>> > >
>>>>>>>>>> > > Once again, git-send-email failed to send the last patch for whatever
>>>>>>>>>> > > reason.  The entire series can be found here:
>>>>>>>>>> > >
>>>>>>>>>> > > http://cgit.freedesktop.org/~jekstrand/mesa/log/?h=wip/nir-pass
>>>>>>>>>> >
>>>>>>>>>> > Nice. Series,
>>>>>>>>>> >
>>>>>>>>>> > Reviewed-by: Kristian Høgsberg <krh at bitplanet.net
>>>>>>>>> <mailto:krh at bitplanet.net>>
>>>>>>>>>>
>>>>>>>>>> I plan to review this as well, so please hold off on pushing it for a
>>>>>>>>>> little while.  Thanks!
>>>>>>>>>
>>>>>>>>> By all means, go ahead.  It'd also be nice, while you're at it to weigh
>>>>>>>>> in on how to handle passing arguments to passes.  There are a number of
>>>>>>>>> ideas thrown back-and-forth between Connor and myself on how to do it
>>>>>>>>
>>>>>>>> Is it even necessary to sort that out now?  Are there passes that
>>>>>>>> haven't been ported to this infrastructure that need any of the extra
>>>>>>>> features that you and Connor were discussing?  I will give keithp credit
>>>>>>>> for teaching me not to engineer in features that you don't need yet.
>>>>>>>> When you do need them, the thing you spent a bunch of time putting in
>>>>>>>> probably won't be what you need.
>>>>>>>
>>>>>>> Strictly speaking, no.  There are passes that haven't been converted
>>>>>>> that will need it.  The issue is how do you pass arguments into
>>>>>>> passes.  Most optimization passes don't care but there are a few that
>>>>>>> will have non-trivial arguments.
>>>>>>
>>>>>> random drive-by comment:  in addition to making it easier to have pass
>>>>>> specific args (which *could* I suppose be moved into
>>>>>> nir_shader_compiler_options at the expense of making that not a const
>>>>>> thing anymore, since some of the options depend on shader variant
>>>>>> key), having old-fashioned code to call opt passes (instead of a table
>>>>>> of passes) makes it easier to insert nir_print_shader() calls in
>>>>>> strategic spots while debugging ;-)
>>>>>
>>>>> But with this, you can implement something like INTEL_DEBUG=opt where
>>>>> you dump a file in between each pass and then you can diff them, which
>>>>> is even better :).
>>>>
>>>> well, sometimes that is a bit more data than I want to collect ;-)

I can't make any guarantees, but I think you'd find that the flood of
data isn't really a problem.  Also, the ability to dump stuff without
modifying the code is more advantageous than being selective.  If
being able to put prints in places is something you care about, we can
always make a nir_print pass that you can put anywhere in the list.
:-)

>>> Well, you could always hack up the dumping routine to only dump on a
>>> certain pass or shader name... but since the pass name and shader name
>>> will be in the filename, you can also just use find/grep :) In my
>>> personal experience fixing bugs in the i965 backend, being able to
>>> dump out the result of all the optimization passes that made progress
>>> was very useful. To get unique filenames, i965 uses a macro and a
>>> counter variable, but it's cleaner if you have one place running all
>>> the optimization passes since that place can update its counter
>>> without having a macro that mucks with the variable behind the scenes.
>>> Also, i965's implementation relies heavily on GCC statement
>>> expressions, which we can't use in core NIR.
>>>
>>>>
>>>> Anyways, my natural reaction against 'meta-programming' type stuff
>>>> (like, table of passes, vs calls to those passes) might be influenced
>>>> by seeing some more horrible applications of the technique.  But I
>>>> wonder if the same thing could be done w/ helper fxn instead.  At
>>>> least in the current state I think it could, I mean:
>>>>
>>>>   // in nir_lower_foo.c:
>>>>   ...
>>>>   static nir_pass lower_foo = { ... };
>>>>
>>>>   nir_lower_foo(shader)
>>>>   {
>>>>      nir_shader_run_pass(shader, &lower_foo);
>>>>   }
>>>>
>>>> That way, use the nir-pass helper stuff if it helps, or not.  Feels
>>>> less like a mid-layer that way.  I think you could even implement your
>>>> idea about optional cleanup-passes by adding 'struct nir_pass
>>>> **cleanups' to nir_pass..
>>>
>>> That's exactly what Jason's series already does :) There's nothing
>>> preventing you from running individual optimization passes, or even
>>> bypassing the whole thing entirely. But...

Yup, that's how my series starts off.  We could keep the functions if
we really want as long as they're just a static inline call wrapper
around nir_shader_run_pass.

>>>>
>>>> From the consumer point of view, it looks like:
>>>>
>>>>     ...
>>>>     nir_lower_foo(shader);
>>>>     ...
>>>>
>>>> instead of:
>>>>
>>>>    ...
>>>>    passes[num_passes++] = &lower_foo;
>>>>    ...
>>>>
>>>> with the advantage that you can easily use
>>>> loops/if-else/debug-prints/collect-stats/etc..
>>>
>>> ...most of these things can be done better with something like
>>> nir_shader_optimize...
>>
>> umm.. well, if "better" means "less flexible".. since I assume you
>> weren't planning to make nir_shader_optimize() turing complete..
>
> "Better" meaning "can be done in one place without driver-specific
> macros"... and I highly doubt that you need full Turing-completeness
> in how you order your optimization passes :) Of course, you can run
> some lowering passes individually by yourself based on whatever
> conditions you want, but this is just for your main optimization loop.

I did briefly think about adding a nir_opt_loop pass that takes a list
of passes as its argument and loops over them...

>>
>>>>
>>>> Then again, I guess I get more or less the same thing if I just stick
>>>> w/ calling nir_shader_run_pass() and ignore nir_shader_optimize()..
>>>
>>> ...so it would be very silly to do this. Having a list of cleanups
>>> doesn't do you any good unless you can dynamically choose what to run
>>> (rather than always just running it in a loop), and debug printing and
>>> collecting statistics can be done better when you have one place that
>>> knows about all the optimizations that are going to be run.
>>
>> well unless you are thinking of deferring cleanups until end of
>> optimize loop??  Otherwise there is nothing stopping
>> nir_shader_run_pass() from conditionally running the cleanup passes.
>
> No, neither of those things are what I had in mind. I think any
> "cleanup" list works a lot better as a hint to the pass scheduler. If
> you have loops among the cleanup lists (this produces garbage which
> this cleans up, which may potentially produce more garbage...) then
> you're going to recursively call them in a loop, at which point you're
> basically already doing what the hypothetical "smart" scheduler would
> do. Also, what if you want to run a pass in isolation? What if you
> only want to run some subset of passes which includes part of the
> cleanups on the list but not others? Doesn't this give you even less
> control? Also, I want to be able to list certain passes as
> "prerequisites," in addition to the other way around, or else a few
> passes (copy prop, DCE) would have a *lot* of unrelated cleanup passes
> that aren't really "cleanups" per-se. We can't do that by running the
> pass individually, since then we'd waste time re-running them
> beforehand when a proper scheduler would have better heuristics for
> running passes as soon as possible after their prerequisites are done.

I do think that's getting a bit ahead of things but, yes, it would be
cool to someday be able to specify prerequisites/cleanups and have the
pass manager automatically handle making sure they get run at the
right time.  This is the algorithm I've been thinking of :

 1) When you run a pass, stash all of its cleanups in a cleanup set
 2) When you run a pass, look in the cleanup set for each of its
prerequisites; if you find one, run it and remove from the cleanup
set.
 3) Run all the cleanups in the set at the end of the loop.

Eventually, something like that would be cool.  Right now, I'll settle
with a decent mechanism for passing arguments.  Maybe just pass them
in to nir_shader_run_pass for now.