[Mesa-dev] [RFC PATCH 00/16] A new IR for Mesa

Fri Aug 29 09:39:17 PDT 2014

After all these messages, some people think that LLVM is the solution.
So why is Connor's solution the right one?

This is a very hard problem, and some people want the easiest way out.
That is LLVM.

I think we need Connor's in-house approach.
I think we can have compiler experts here.
If no one else wants to say it: Mesa developers fear the compiler internals.

On Sat, Aug 16, 2014 at 3:12 AM, Connor Abbott <cwabbott0 at gmail.com> wrote:
> I know what you might be thinking right now. "Wait, *another* IR? Don't
> we already have like 5 of those, not counting all the driver-specific
> ones? Isn't this stuff complicated enough already?" Well, there are some
> pretty good reasons to start afresh (again...). In the years we've been
> using GLSL IR, we've come to realize that, in fact, it's not what we
> want *at all* to do optimizations on. Ian has done a talk at FOSDEM that
> highlights some of the problems they've run into:
>
> https://video.fosdem.org/2014/H1301_Cornil/Saturday/Three_Years_Experience_with_a_Treelike_Shader_IR.webm
>
> But here's the summary:
>
> * GLSL IR is way too much of a memory hog, since it has to make a new
> variable for each temporary the compiler creates and then each time you
> want to dereference that temporary you need to create an
> ir_dereference_variable that points to it which is also very
> cache-unfriendly ("downright cache-mean!").
>
> * The expression trees were originally added so that we could do
> pattern matching to automatically optimize things, but this turned out
> to be both very difficult to do and not very helpful. Instead, all it
> does is add more complexity to the IR without much benefit - with SSA or
> having proper use-def chains, we could get back what the trees give us
> while also being able to do lots more optimizations.
>
> * We don't have the concept of basic blocks in GLSL IR, which makes a
> lot of optimizations harder because they were originally designed with
> basic blocks in mind - take, for example, my SSA series. I had to map a
> whole lot of concepts that were based on the control flow graph to this
> tree of statements that GLSL IR uses, and the end result wound up
> looking nothing at all like the original paper. This problem gets even
> worse for things like Global Code Motion that depend upon having
> the dominance tree.
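For readers less familiar with the dominance tree mentioned above: block D dominates block B if every path from the entry to B passes through D. Below is a minimal sketch in C of the textbook iterative data-flow computation. It assumes a hypothetical CFG struct (`struct cfg`, `compute_dominators()` and `immediate_dominator()` are illustrative names, not Mesa code) with at most 32 blocks, so each dominator set fits in a `uint32_t`. It needs explicit blocks and predecessor lists, which a tree of statements does not provide directly:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CFG for illustration: blocks are numbered 0..n-1,
 * block 0 is the entry, and each block lists its predecessors. */
#define MAX_BLOCKS 32

struct cfg {
   unsigned num_blocks;
   unsigned num_preds[MAX_BLOCKS];
   unsigned preds[MAX_BLOCKS][MAX_BLOCKS];
};

/* Dom(entry) = {entry}; Dom(b) = {b} | intersection of Dom(p) over all
 * predecessors p. Iterate until no set changes. Bit i of dom[b] set
 * means block i dominates block b. */
static void compute_dominators(const struct cfg *cfg, uint32_t dom[MAX_BLOCKS])
{
   assert(cfg->num_blocks > 0 && cfg->num_blocks <= MAX_BLOCKS);
   uint32_t all = cfg->num_blocks == 32 ? UINT32_MAX
                                        : (UINT32_C(1) << cfg->num_blocks) - 1;

   dom[0] = 1u;                  /* the entry is dominated only by itself */
   for (unsigned b = 1; b < cfg->num_blocks; b++)
      dom[b] = all;              /* start from "everything" and shrink */

   int changed = 1;
   while (changed) {
      changed = 0;
      for (unsigned b = 1; b < cfg->num_blocks; b++) {
         uint32_t meet = all;
         for (unsigned i = 0; i < cfg->num_preds[b]; i++)
            meet &= dom[cfg->preds[b][i]];
         uint32_t new_dom = meet | (UINT32_C(1) << b);
         if (new_dom != dom[b]) {
            dom[b] = new_dom;
            changed = 1;
         }
      }
   }
}

/* The immediate dominator of b is the strict dominator d whose own
 * dominator set equals Dom(b) minus b. Returns -1 for the entry. */
static int immediate_dominator(const struct cfg *cfg,
                               const uint32_t dom[MAX_BLOCKS], unsigned b)
{
   if (b == 0)
      return -1;
   uint32_t strict = dom[b] & ~(UINT32_C(1) << b);
   for (unsigned d = 0; d < cfg->num_blocks; d++)
      if (((strict >> d) & 1u) && dom[d] == strict)
         return (int)d;
   return -1;
}
```

On an if/else diamond (entry 0, branches 1 and 2, merge 3), the merge block's immediate dominator comes out as the entry. Faster algorithms exist; this version just favors brevity.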
>
> I originally wanted to modify GLSL IR to fix these problems by adding
> new instruction types that would address these issues and then
> converting back and forth between the old and the new form, but I
> realized that fixing all the problems would basically mean a complete
> rewrite - and if that's the case, then why don't we start from scratch?
> So I took Ken's suggestions and started designing, and then at Intel
> over the summer started implementing, a completely new IR which I call
> NIR that's at a lower level than GLSL IR, but still high-level enough to
> be mostly device-independent (different drivers may have different
> passes and different ways of lowering, e.g., matrix multiplies) so that
> we can do generic optimizations on it. Having support for SSA from the
> beginning was also a must, because lots of optimisations that we really
> want for cleaning up DX9-translated games are either a lot easier in or
> made possible by SSA. I also made the decision for it to be typeless,
> because that's what the cool kids are all doing :) and for a
> lower-level, flat IR it seemed like the thing to do (it could have gone
> either way, though). So the key design points of NIR (pronounced either
> like "near" as in "NIR is near!" or to rhyme with "burr") are:
>
> * It's flat (no expression trees)
>
> * It's typeless
>
> * Modifiers (abs, negate, saturate), swizzles, and write masks are part
> of ALU instructions
>
> * It includes enough GLSL-like things (variables that you can load from
> or store to, function calls) to be hardware-agnostic and to allow
> optimizations at a high level (we don't have a way to represent matrix
> multiplies right now, but that could easily be added), while also
> having lowering passes that convert variables to registers and to
> input/output/uniform loads/stores, which open up more opportunities
> for optimization and save memory at the cost of being more
> hardware-specific.
>
> * Control flow consists of a tree of if statements and loops, like in
> GLSL IR, except the leaves of the tree are now basic blocks instead of
> instructions. Also, each basic block keeps track of its successors and
> predecessors, so the control flow graph is explicit in the IR.
>
> * SSA is natively supported, and SSA uses point directly to the SSA
> definition, which means that the use-def chains are always there, and
> def-use chains are kept by tracking the set of all uses for each
> definition.
>
> * It's written in C.
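To make the design point about modifiers, swizzles and write masks concrete, here is a hedged C sketch of an ALU instruction that carries its own source modifiers, swizzles, destination write mask and saturate flag. The structs and `eval_fmul()` are hypothetical and only loosely follow the description above; they are not the definitions from nir.h. Applying abs before negate (so both together mean -|x|) is an assumption of this sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical source operand: swizzle and modifiers live on the
 * instruction's source instead of being separate instructions. */
struct alu_src {
   const float *reg;     /* 4-component source register */
   uint8_t swizzle[4];   /* source component feeding each channel */
   bool abs, negate;     /* source modifiers (abs applied first) */
};

/* Hypothetical destination: write mask and saturate live here too. */
struct alu_dest {
   float *reg;           /* 4-component destination register */
   unsigned write_mask;  /* bit i set => channel i is written */
   bool saturate;        /* clamp result to [0, 1] */
};

static float read_src(const struct alu_src *src, unsigned chan)
{
   assert(chan < 4 && src->swizzle[chan] < 4);
   float v = src->reg[src->swizzle[chan]];
   if (src->abs)
      v = v < 0.0f ? -v : v;   /* |v| */
   if (src->negate)
      v = -v;                  /* applied after abs: -|v| */
   return v;
}

/* Evaluate a 4-wide multiply with every modifier applied in place. */
static void eval_fmul(struct alu_dest *dest,
                      const struct alu_src *a, const struct alu_src *b)
{
   for (unsigned c = 0; c < 4; c++) {
      if (!(dest->write_mask & (1u << c)))
         continue;             /* channel not written: keep old value */
      float r = read_src(a, c) * read_src(b, c);
      if (dest->saturate)
         r = r < 0.0f ? 0.0f : (r > 1.0f ? 1.0f : r);
      dest->reg[c] = r;
   }
}
```

With a `wzyx` swizzle and abs on one source, negate on the other, and a write mask of x and z, the whole computation stays a single instruction rather than separate swizzle, abs, negate and masked-move nodes.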
>
> (see the README in patch 3 and nir.h in patch 5 for more details)
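A minimal sketch of the SSA design point follows. It uses hypothetical `ssa_def`/`ssa_src` structs with fixed-size use arrays; the email only says that the set of uses is tracked per definition, not how. Each source points straight at its definition (use-def), and each definition records its uses (def-use). Replacing a value everywhere is then a walk over one list rather than a scan of the whole shader:

```c
#include <assert.h>

/* Hypothetical, simplified SSA structures for illustration only. */
#define MAX_USES 8

struct ssa_src;

struct ssa_def {
   int index;
   struct ssa_src *uses[MAX_USES];   /* def-use: every source reading us */
   unsigned num_uses;
};

struct ssa_src {
   struct ssa_def *def;              /* use-def: direct pointer, O(1) */
};

/* Point a source at a definition and keep the def-use set in sync. */
static void src_init(struct ssa_src *src, struct ssa_def *def)
{
   assert(def->num_uses < MAX_USES);
   src->def = def;
   def->uses[def->num_uses++] = src;
}

/* Redirect every use of old_def to new_def, e.g. once copy propagation
 * has proven both hold the same value. */
static void rewrite_all_uses(struct ssa_def *old_def, struct ssa_def *new_def)
{
   assert(old_def != new_def);
   for (unsigned i = 0; i < old_def->num_uses; i++) {
      struct ssa_src *src = old_def->uses[i];
      assert(new_def->num_uses < MAX_USES);
      src->def = new_def;
      new_def->uses[new_def->num_uses++] = src;
   }
   old_def->num_uses = 0;   /* old_def has no uses left and can be removed */
}
```

Without def-use chains, the same rewrite would require visiting every instruction to find the sources that read `old_def`.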
>
> Some things that are missing or could be improved:
>
> * There's currently no alias tracking for inputs, outputs, and uniforms.
> This is especially important for uniforms because we don't pack them
> like we pack inputs and outputs.
>
> * We need a way to represent matrix multiplies so that we can do
> matrix-flipping optimizations in NIR (currently GLSL IR does this for
> us).
>
> * I'm not entirely happy about how we represent loads and stores in the
> IR. Right now, they're intrinsics, but that means we need a different
> intrinsic for each size and combination of arguments (indirect vs. not
> indirect, etc.) and we might run into a combinatorial explosion problem
> in the future, so we might need to make separate load/store instructions
> like what I did for textures.
>
> * Right now, we only have a pass that lowers variables for scalar
> backends. We need to write a similar pass for vector backends that uses
> std140 packing or something similar, as well as porting
> lower_ubo_reference to NIR and changing it to output offsets in the
> hardware-native units instead of bytes.
>
> * We'll need to write a pass that splits up vector expressions for
> scalar backends.
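The last item, splitting vector expressions for scalar backends, could look roughly like the following sketch. `struct vec_alu`, `struct scalar_alu` and `scalarize()` are hypothetical and assume a two-source operation. Each written channel becomes one scalar instruction that reads the swizzled component of each source, and masked-out channels produce nothing:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical vector instruction:
 * dest.write_mask = op(src[0].swizzle[0], src[1].swizzle[1]) */
struct vec_alu {
   int op;
   unsigned dest, src[2];        /* register numbers */
   uint8_t swizzle[2][4];
   unsigned write_mask;
};

/* One scalar instruction produced by the split. */
struct scalar_alu {
   int op;
   unsigned dest, dest_chan;
   unsigned src[2], src_chan[2];
};

/* Emit one scalar instruction per written channel.
 * Returns how many were written to out. */
static unsigned scalarize(const struct vec_alu *v, struct scalar_alu out[4])
{
   unsigned n = 0;
   for (unsigned c = 0; c < 4; c++) {
      if (!(v->write_mask & (1u << c)))
         continue;               /* masked-out channel: nothing to emit */
      struct scalar_alu *s = &out[n++];
      s->op = v->op;
      s->dest = v->dest;
      s->dest_chan = c;
      for (unsigned i = 0; i < 2; i++) {
         assert(v->swizzle[i][c] < 4);
         s->src[i] = v->src[i];
         s->src_chan[i] = v->swizzle[i][c];   /* resolve the swizzle */
      }
   }
   return n;
}
```

For example, with write mask `xyw` (0xB) and the second source swizzled `.wwww`, three scalar instructions are emitted and channel z is skipped.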
>
> The first two patches are preparatory patches that I already sent to the
> list, but I'm re-sending them as part of the series as they haven't
> landed yet. Right now, the series only has code to convert GLSL IR to
> NIR, but no way to actually hook it up to a backend in order to generate
> code from it, and it also doesn't do anything with the SSA part of the
> IR. I have a branch on my Github that does the conversion to SSA and a
> few simple SSA-based optimizations, which hasn't been tested as much
> (since I haven't written a pass to get out of SSA or a backend that uses
> SSA):
>
> https://github.com/cwabbott0/mesa/tree/nir
>
> and an experimental backend for i965 fs that I hope to combine with
> Matt's SSA work; right now, there are only a few piglit regressions and
> most of them are because of the hacky way I changed boolean true to be
> 0xFFFFFFFF instead of 1 (with Matt's series to do the same thing in a
> better way, they should go away) or because of unimplemented features
> (atomics and some system values):
>
> https://github.com/cwabbott0/mesa/tree/nir-i965-fs
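Some background on the boolean change. The email does not spell out the motivation, so this is a general illustration and not a description of Matt's series. When true is all bits set (~0, i.e. 0xFFFFFFFF for 32-bit values), a comparison result can be used directly as a bit mask. Here is a small C sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Comparison producing ~0 for true and 0 for false. */
static uint32_t cmp_lt(int32_t a, int32_t b)
{
   return a < b ? UINT32_MAX : 0u;
}

/* Select using only bitwise ops: the boolean itself is the mask. */
static uint32_t select_bits(uint32_t cond, uint32_t if_true, uint32_t if_false)
{
   return (cond & if_true) | (~cond & if_false);
}

/* Logical NOT is plain bitwise NOT when true == ~0
 * (with true == 1, ~1 would be 0xFFFFFFFE, not false). */
static uint32_t bool_not(uint32_t b)
{
   return ~b;
}
```

With true == 1, the select would first have to turn 1 into a full mask (for example by negating it), which costs an extra instruction.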
>
> NIR has been what I've worked on for my entire summer internship at
> Intel, and before I go off to my freshman year at college, I'd like to
> thank the other Intel folks for the knowledge they've given me and the
> many interesting discussions that made this go from an idea to a reality
> - I'll miss you guys!
>
> Connor
>
> Connor Abbott (16):
>   exec_list: add a list_foreach_typed_reverse() macro
>   glsl/linker: pass through the is_intrinsic flag
>   nir: add initial README
>   nir: add a simple C wrapper around glsl_types.h
>   nir: add the core datastructures
>   nir: add core helper functions
>   nir: add a printer
>   nir: add a validation pass
>   nir: add a glsl-to-nir pass
>   nir: add a pass to lower variables for scalar backends
>   nir: keep track of the number of input, output, and uniform slots
>   nir: add a pass to remove unused variables
>   nir: add a pass to lower sampler instructions
>   nir: add a pass to lower system value reads
>   nir: add a pass to lower atomics
>   nir: add an optimization to turn global registers into local registers
>
>  src/glsl/Makefile.sources                 |   18 +-
>  src/glsl/link_functions.cpp               |    2 +
>  src/glsl/list.h                           |    6 +
>  src/glsl/nir/README                       |  118 ++
>  src/glsl/nir/glsl_to_nir.cpp              | 1759 +++++++++++++++++++++++++++++
>  src/glsl/nir/glsl_to_nir.h                |   40 +
>  src/glsl/nir/nir.c                        | 1717 ++++++++++++++++++++++++++++
>  src/glsl/nir/nir.h                        | 1270 +++++++++++++++++++++
>  src/glsl/nir/nir_intrinsics.c             |   49 +
>  src/glsl/nir/nir_intrinsics.h             |  158 +++
>  src/glsl/nir/nir_lower_atomics.c          |  127 +++
>  src/glsl/nir/nir_lower_samplers.cpp       |  170 +++
>  src/glsl/nir/nir_lower_system_values.c    |  106 ++
>  src/glsl/nir/nir_lower_variables_scalar.c | 1243 ++++++++++++++++++++
>  src/glsl/nir/nir_opcodes.c                |   46 +
>  src/glsl/nir/nir_opcodes.h                |  346 ++++++
>  src/glsl/nir/nir_opt_global_to_local.c    |  103 ++
>  src/glsl/nir/nir_print.c                  |  916 +++++++++++++++
>  src/glsl/nir/nir_remove_dead_variables.c  |  138 +++
>  src/glsl/nir/nir_types.cpp                |  155 +++
>  src/glsl/nir/nir_types.h                  |   78 ++
>  src/glsl/nir/nir_validate.c               |  798 +++++++++++++
>  22 files changed, 9362 insertions(+), 1 deletion(-)
>  create mode 100644 src/glsl/nir/README
>  create mode 100644 src/glsl/nir/glsl_to_nir.cpp
>  create mode 100644 src/glsl/nir/glsl_to_nir.h
>  create mode 100644 src/glsl/nir/nir.c
>  create mode 100644 src/glsl/nir/nir.h
>  create mode 100644 src/glsl/nir/nir_intrinsics.c
>  create mode 100644 src/glsl/nir/nir_intrinsics.h
>  create mode 100644 src/glsl/nir/nir_lower_atomics.c
>  create mode 100644 src/glsl/nir/nir_lower_samplers.cpp
>  create mode 100644 src/glsl/nir/nir_lower_system_values.c
>  create mode 100644 src/glsl/nir/nir_lower_variables_scalar.c
>  create mode 100644 src/glsl/nir/nir_opcodes.c
>  create mode 100644 src/glsl/nir/nir_opcodes.h
>  create mode 100644 src/glsl/nir/nir_opt_global_to_local.c
>  create mode 100644 src/glsl/nir/nir_print.c
>  create mode 100644 src/glsl/nir/nir_remove_dead_variables.c
>  create mode 100644 src/glsl/nir/nir_types.cpp
>  create mode 100644 src/glsl/nir/nir_types.h
>  create mode 100644 src/glsl/nir/nir_validate.c
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev