[Mesa-dev] [RFC PATCH 00/16] A new IR for Mesa

Mon Aug 18 13:22:05 PDT 2014

On Mon, Aug 18, 2014 at 12:20 PM, Thomas Helland
<thomashelland90 at gmail.com> wrote:
> Hi Connor!
>
> I've been browsing your GitHub repo a bit over the last few weeks,
> and I have to say, this seems quite promising.
>
> I've got some questions that I haven't really been able to answer
> myself with the quick glimpse I've had over the codebase:
>
> Since we're by and large making a mathematical graph-rewriting
> simplifier-thingy just as much as a compiler, does the IR as of
> now have an easy way of storing upper and lower bounds of variables?

NIR as it stands doesn't have a way of storing upper and lower bounds
of registers/SSA values (variables aren't used for computation in
NIR), but it would be easy to do for an analysis pass - SSA values are
indexed, so just put them in an array.
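
As a rough sketch of that side table (the struct and function names
here are made up for illustration and are not part of NIR; the only
thing assumed from NIR is that each SSA value has a dense index):

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Hypothetical range record kept by an analysis pass, not by the IR. */
struct value_range {
    double lo;
    double hi;
};

/* Allocate one range per SSA value, each starting as (-inf, +inf).
 * Returns NULL if the allocation fails. */
static struct value_range *
ranges_create(unsigned num_ssa_values)
{
    struct value_range *ranges = malloc(num_ssa_values * sizeof(*ranges));
    if (ranges == NULL)
        return NULL;
    for (unsigned i = 0; i < num_ssa_values; i++) {
        ranges[i].lo = -INFINITY;
        ranges[i].hi = INFINITY;
    }
    return ranges;
}

/* Record a known range for the SSA value with the given index. */
static void
ranges_set(struct value_range *ranges, unsigned num_ssa_values,
           unsigned index, double lo, double hi)
{
    assert(index < num_ssa_values);
    assert(lo <= hi);
    ranges[index].lo = lo;
    ranges[index].hi = hi;
}
```

The pass owns the array and frees it when it is done, so the IR
itself doesn't need any new fields.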

>
> Also, does it have an easy way to get something like the
> hierarchical visitor we have in GLSL IR?
> (A way of doing, say, algebraic optimizations the way we do now?)

We don't have something like a hierarchical visitor in NIR, because it
isn't necessary any more - certainly one could be created, though.

Like I mentioned in the cover letter, we need one of two things to get
the information we got with the expression trees (and actually even
more):

1. Use-def chains (which definitions of this register can possibly
reach this use?) and def-use chains (of all the uses of this register,
which ones can be reached by this definition?)

2. SSA

With SSA, use-def chains and def-use chains are trivial because each
SSA value is defined only once: the use-def chain for each use is just
the one definition, and the def-use chain for each definition is just
the set of all uses, which we already keep track of. You can think of
expression trees as a special case of SSA, where each definition has
only one use.
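
To make that concrete, here is a toy sketch (made-up types with
fixed-size arrays, not the actual nir.h structures) where each source
points straight at its definition and each definition records its
users:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_USES 8

struct instr;

/* Hypothetical SSA definition: one defining instruction, many uses. */
struct ssa_def {
    struct instr *parent;           /* the single definition */
    struct instr *uses[MAX_USES];   /* def-use chain: all users */
    unsigned num_uses;
};

struct instr {
    const char *name;
    struct ssa_def def;             /* value this instruction defines */
    struct ssa_def *srcs[2];        /* use-def chain: direct pointers */
    unsigned num_srcs;
};

/* Add 'def' as a source of 'user', keeping both directions in sync. */
static void
instr_add_src(struct instr *user, struct ssa_def *def)
{
    assert(user->num_srcs < 2);
    assert(def->num_uses < MAX_USES);
    user->srcs[user->num_srcs++] = def;
    def->uses[def->num_uses++] = user;
}

/* An expression tree is the special case where every def has one use. */
static int
def_is_tree_like(const struct ssa_def *def)
{
    return def->num_uses == 1;
}
```

Following srcs[i]->parent gives the use-def chain, and walking
def.uses gives the def-use chain, with no dataflow analysis needed.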

I think the plan for NIR is to just do all our optimizations in SSA,
so we don't have to mess around with DU and UD chains at all.

One of my pie-in-the-sky ideas is to make a language for doing
graph-rewriting where we can say things like "a * 1.0 => a", similar to
what LLVM has now, except it might get difficult with all the
swizzles, modifiers, etc. that NIR supports.
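
Written out by hand, a single rule like "a * 1.0 => a" might look
something like the following. This is only a sketch on a made-up flat
IR (values are array indices, and the enum and struct are invented for
the example); it deliberately ignores swizzles and modifiers, which is
exactly where the real thing gets harder:

```c
#include <assert.h>
#include <stdbool.h>

enum op { OP_CONST, OP_INPUT, OP_FMUL };

/* Hypothetical flat instruction; sources are indices of other values. */
struct instr {
    enum op op;
    double imm;     /* used by OP_CONST only */
    int src[2];     /* used by OP_FMUL only */
};

static bool
is_const(const struct instr *prog, int idx, double value)
{
    return prog[idx].op == OP_CONST && prog[idx].imm == value;
}

/* Apply "a * 1.0 => a" (and the commuted form) to value 'idx'.
 * Returns the index of the value that can replace it, or 'idx'
 * itself if the pattern doesn't match. */
static int
simplify_fmul_one(const struct instr *prog, int n, int idx)
{
    assert(idx >= 0 && idx < n);
    const struct instr *in = &prog[idx];

    if (in->op != OP_FMUL)
        return idx;
    if (is_const(prog, in->src[1], 1.0))
        return in->src[0];
    if (is_const(prog, in->src[0], 1.0))
        return in->src[1];
    return idx;
}
```

A rewriting language would generate this kind of matcher from the
one-line rule instead of having us write each one out.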

>
> With these two in place, it would be easy to make a general bounds-checking
> optimization to eliminate max/min/sin/sign/cos/ etc operations.
> I believe that we, as of now, do not have such a pass.

Well, I think Petri has made a pass like that for GLSL IR; you may
want to check it out. Right now it only handles mins and maxes, but
you should be able to extend it to other things as well.

Here's a sketch of how a bounds analysis pass for NIR in SSA would
probably work (just making this up now, I haven't worked out the
details):

- Create an array that for each SSA value gives its range,
  initializing it to (-infinity, infinity) for each value except for
  ones defined by load_const instructions, the output of sin and cos
  instructions, etc., which start with their known ranges.
- Create a worklist of SSA values, initially putting in it only the
  values that we didn't initialize to (-infinity, infinity).
- While the worklist isn't empty:
    - Grab a value off the worklist
    - For each use of the value that is an ALU instruction:
        - Re-evaluate the bounds of the value the instruction defines
        - If the bound is now tighter and the value isn't already on
          the worklist, then put it in the worklist
(note, this will probably work similarly for lots of other analysis
passes, so it might be a good idea to abstract some of it out)

Then, once you have the results of the analysis, you can do things
like replacing all the uses of a max/min instruction with one of its
inputs, etc.
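
For max/min the check itself is tiny. A sketch, assuming the analysis
hands us a closed [lo, hi] range per source (the struct and function
names are made up, and NaN inputs are not modeled here):

```c
#include <assert.h>

struct range { double lo, hi; };

/* For max(a, b): return 0 if the result is always a, 1 if it is
 * always b, or -1 if the ranges overlap and we can't tell. */
static int
fmax_redundant_src(struct range a, struct range b)
{
    assert(a.lo <= a.hi && b.lo <= b.hi);
    if (a.lo >= b.hi)
        return 0;
    if (b.lo >= a.hi)
        return 1;
    return -1;
}

/* Same idea for min(a, b), with the comparisons flipped. */
static int
fmin_redundant_src(struct range a, struct range b)
{
    assert(a.lo <= a.hi && b.lo <= b.hi);
    if (a.hi <= b.lo)
        return 0;
    if (b.hi <= a.lo)
        return 1;
    return -1;
}
```

When one of these returns 0 or 1, every use of the max/min result can
be rewritten to point at that source, and the instruction itself
becomes dead.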

>
> If this IR lands, I could probably find some time to port some
> of the optimization-passes from GLSL IR to NIR.

That would be cool. Once this stuff gets actually implemented, there's
probably going to be a lot of low-hanging fruit when it comes to
optimizations, especially since writing optimizations in SSA is so
easy!

Connor

>
> Regards,
> Thomas
>
> 2014-08-16 2:12 GMT+02:00 Connor Abbott <cwabbott0 at gmail.com>:
>> I know what you might be thinking right now. "Wait, *another* IR? Don't
>> we already have like 5 of those, not counting all the driver-specific
>> ones? Isn't this stuff complicated enough already?" Well, there are some
>> pretty good reasons to start afresh (again...). In the years we've been
>> using GLSL IR, we've come to realize that, in fact, it's not what we
>> want *at all* to do optimizations on. Ian gave a talk at FOSDEM that
>> highlights some of the problems they've run into:
>>
>> https://video.fosdem.org/2014/H1301_Cornil/Saturday/Three_Years_Experience_with_a_Treelike_Shader_IR.webm
>>
>> But here's the summary:
>>
>> * GLSL IR is way too much of a memory hog, since it has to make a new
>> variable for each temporary the compiler creates, and then each time
>> you want to dereference that temporary you need to create an
>> ir_dereference_variable that points to it, which is also very
>> cache-unfriendly ("downright cache-mean!").
>>
>> * The expression trees were originally added so that we could do
>> pattern matching to automatically optimize things, but this turned out
>> to be both very difficult to do and not very helpful. Instead, all it
>> does is add more complexity to the IR without much benefit - with SSA or
>> having proper use-def chains, we could get back what the trees give us
>> while also being able to do lots more optimizations.
>>
>> * We don't have the concept of basic blocks in GLSL IR, which makes a
>> lot of optimizations harder because they were originally designed with
>> basic blocks in mind - take, for example, my SSA series. I had to map a
>> whole lot of concepts that were based on the control flow graph to this
>> tree of statements that GLSL IR uses, and the end result wound up
>> looking nothing at all like the original paper. This problem gets even
>> worse for things like e.g. Global Code Motion that depend upon having
>> the dominance tree.
>>
>> I originally wanted to modify GLSL IR to fix these problems by adding
>> new instruction types that would address these issues and then
>> converting back and forth between the old and the new form, but I
>> realized that fixing all the problems would basically mean a complete
>> rewrite - and if that's the case, then why don't we start from scratch?
>> So I took Ken's suggestions and started designing, and then at Intel
>> over the summer started implementing, a completely new IR which I call
>> NIR that's at a lower level than GLSL IR, but still high-level enough to
>> be mostly device-independent (different drivers may have different
>> passes and different ways of lowering e.g.  matrix multiplies) so that
>> we can do generic optimizations on it. Having support for SSA from the
>> beginning was also a must, because lots of optimizations that we really
>> want for cleaning up DX9-translated games are either a lot easier in or
>> made possible by SSA. I also made the decision for it to be typeless,
>> because that's what the cool kids are all doing :) and for a
>> lower-level, flat IR it seemed like the thing to do (it could have gone
>> either way, though). So the key design points of NIR (pronounced either
>> like "near" as in "NIR is near!" or to rhyme with "burr") are:
>>
>> * It's flat (no expression trees)
>>
>> * It's typeless
>>
>> * Modifiers (abs, negate, saturate), swizzles, and write masks are part
>> of ALU instructions
>>
>> * It includes enough GLSL-like things (variables that you can load from
>> or store to, function calls) to be hardware-agnostic and to allow
>> optimizations at a high level (although we don't have a way to
>> represent matrix multiplies right now, that could easily be added).
>> Lowering passes then convert variables to registers and
>> input/output/uniform loads/stores, which opens up more opportunities
>> for optimization and saves memory while being more hardware-specific.
>>
>> * Control flow consists of a tree of if statements and loops, like in
>> GLSL IR, except the leaves of the tree are now basic blocks instead of
>> instructions. Also, each basic block keeps track of its successors and
>> predecessors, so the control flow graph is explicit in the IR.
>>
>> * SSA is natively supported, and SSA uses point directly to the SSA
>> definition, which means that the use-def chains are always there, and
>> def-use chains are kept by tracking the set of all uses for each
>> definition.
>>
>> * It's written in C.
>>
>> (see the README in patch 3 and nir.h in patch 4 for more details)
>>
>> Some things that are missing or could be improved:
>>
>> * There's currently no alias tracking for inputs, outputs, and uniforms.
>> This is especially important for uniforms because we don't pack them
>> like we pack inputs and outputs.
>>
>> * We need a way to represent matrix multiplies so that we can do
>> matrix-flipping optimizations in NIR (currently GLSL IR does this for
>> us).
>>
>> * I'm not entirely happy about how we represent loads and stores in the
>> IR. Right now, they're intrinsics, but that means we need a different
>> intrinsic for each size and combination of arguments (indirect vs. not
>> indirect, etc.) and we might run into a combinatorial explosion problem
>> in the future, so we might need to make separate load/store instructions
>> like what I did for textures.
>>
>> * Right now, we only have a pass that lowers variables for scalar
>> backends. We need to write a similar pass for vector backends that uses
>> std140 packing or something similar, as well as porting
>> lower_ubo_reference to NIR and changing it to output offsets in the
>> hardware-native units instead of bytes.
>>
>> * We'll need to write a pass that splits up vector expressions for
>> scalar backends.
>>
>> The first two patches are preparatory patches that I already sent to the
>> list, but I'm re-sending them as part of the series as they haven't
>> landed yet. Right now, the series only has code to convert GLSL IR to
>> NIR, but no way to actually hook it up to a backend in order to generate
>> code from it, and it also doesn't do anything with the SSA part of the
>> IR. I have a branch on my Github that does the conversion to SSA and a
>> few simple SSA-based optimizations, which hasn't been tested as much
>> (since I haven't written a pass to get out of SSA or a backend that uses
>> SSA):
>>
>> https://github.com/cwabbott0/mesa/tree/nir
>>
>> and an experimental backend for i965 fs that I hope to combine with
>> Matt's SSA work; right now, there are only a few piglit regressions and
>> most of them are because of the hacky way I changed boolean true to be
>> 0xFFFFFF instead of 1 (with Matt's series to do the same thing in a
>> better way, they should go away) or because of unimplemented features
>> (atomics and some system values):
>>
>> https://github.com/cwabbott0/mesa/tree/nir-i965-fs
>>
>> NIR has been what I've worked on for my entire summer internship at
>> Intel, and before I go off to my freshman year at college, I'd like to
>> thank the other Intel folks for the knowledge they've given me and the
>> many interesting discussions that made this go from an idea to a reality
>> - I'll miss you guys!
>>
>> Connor
>>
>> Connor Abbott (16):
>>   exec_list: add a list_foreach_typed_reverse() macro
>>   glsl/linker: pass through the is_intrinsic flag
>>   nir: add initial README
>>   nir: add a simple C wrapper around glsl_types.h
>>   nir: add the core datastructures
>>   nir: add core helper functions
>>   nir: add a printer
>>   nir: add a validation pass
>>   nir: add a glsl-to-nir pass
>>   nir: add a pass to lower variables for scalar backends
>>   nir: keep track of the number of input, output, and uniform slots
>>   nir: add a pass to remove unused variables
>>   nir: add a pass to lower sampler instructions
>>   nir: add a pass to lower system value reads
>>   nir: add a pass to lower atomics
>>   nir: add an optimization to turn global registers into local registers
>>
>>  src/glsl/Makefile.sources                 |   18 +-
>>  src/glsl/link_functions.cpp               |    2 +
>>  src/glsl/list.h                           |    6 +
>>  src/glsl/nir/README                       |  118 ++
>>  src/glsl/nir/glsl_to_nir.cpp              | 1759 +++++++++++++++++++++++++++++
>>  src/glsl/nir/glsl_to_nir.h                |   40 +
>>  src/glsl/nir/nir.c                        | 1717 ++++++++++++++++++++++++++++
>>  src/glsl/nir/nir.h                        | 1270 +++++++++++++++++++++
>>  src/glsl/nir/nir_intrinsics.c             |   49 +
>>  src/glsl/nir/nir_intrinsics.h             |  158 +++
>>  src/glsl/nir/nir_lower_atomics.c          |  127 +++
>>  src/glsl/nir/nir_lower_samplers.cpp       |  170 +++
>>  src/glsl/nir/nir_lower_system_values.c    |  106 ++
>>  src/glsl/nir/nir_lower_variables_scalar.c | 1243 ++++++++++++++++++++
>>  src/glsl/nir/nir_opcodes.c                |   46 +
>>  src/glsl/nir/nir_opcodes.h                |  346 ++++++
>>  src/glsl/nir/nir_opt_global_to_local.c    |  103 ++
>>  src/glsl/nir/nir_print.c                  |  916 +++++++++++++++
>>  src/glsl/nir/nir_remove_dead_variables.c  |  138 +++
>>  src/glsl/nir/nir_types.cpp                |  155 +++
>>  src/glsl/nir/nir_types.h                  |   78 ++
>>  src/glsl/nir/nir_validate.c               |  798 +++++++++++++
>>  22 files changed, 9362 insertions(+), 1 deletion(-)
>>  create mode 100644 src/glsl/nir/README
>>  create mode 100644 src/glsl/nir/glsl_to_nir.cpp
>>  create mode 100644 src/glsl/nir/glsl_to_nir.h
>>  create mode 100644 src/glsl/nir/nir.c
>>  create mode 100644 src/glsl/nir/nir.h
>>  create mode 100644 src/glsl/nir/nir_intrinsics.c
>>  create mode 100644 src/glsl/nir/nir_intrinsics.h
>>  create mode 100644 src/glsl/nir/nir_lower_atomics.c
>>  create mode 100644 src/glsl/nir/nir_lower_samplers.cpp
>>  create mode 100644 src/glsl/nir/nir_lower_system_values.c
>>  create mode 100644 src/glsl/nir/nir_lower_variables_scalar.c
>>  create mode 100644 src/glsl/nir/nir_opcodes.c
>>  create mode 100644 src/glsl/nir/nir_opcodes.h
>>  create mode 100644 src/glsl/nir/nir_opt_global_to_local.c
>>  create mode 100644 src/glsl/nir/nir_print.c
>>  create mode 100644 src/glsl/nir/nir_remove_dead_variables.c
>>  create mode 100644 src/glsl/nir/nir_types.cpp
>>  create mode 100644 src/glsl/nir/nir_types.h
>>  create mode 100644 src/glsl/nir/nir_validate.c
>>
>> --
>> 1.9.3
>>