[Mesa-dev] [PATCH 000/123] Reintroducing NIR, a new IR for mesa

Mon Dec 15 22:04:10 PST 2014

NIR (pronounced "ner") is a new IR (intermediate representation) for the
Mesa shader compiler that will sit between the old IR (GLSL IR) and the
back-end compilers.  The primary purpose of NIR is to be a more efficient
base for optimizations and to generate better code for the back-ends.  We
have a lot of optimizations implemented in GLSL IR right now.  However, we
still get fairly bad code out of it, primarily because GLSL IR's tree-based
structure makes writing good optimizations difficult.  For this reason, we
have implemented a lot of optimizations in the i965 back-end compilers just
to fix up the code we get from GLSL IR.  The "proper fix" for this is to
implement a better high-level IR; enter NIR.

Most of the initial work on NIR, including setting up common data
structures, helper methods, and a few basic passes, was done by Connor
Abbott, who interned with us over the summer.  Connor did a fantastic job,
but there is still a lot left to be done.  I've spent the last two months
trying to fill in the pieces that we need in order to get NIR off the
ground.  At this point, we have competent into-SSA and out-of-SSA passes,
are at zero piglit regressions for i965 SIMD8 fragment shaders, and the
shader-db numbers aren't terrible.

This is still a bit experimental.  I have been testing only on HSW but it
should work ok on SNB and later.  Eventually, once we get booleans fixed
up, it should work fine on older chips as well.  It also doesn't yet
support SIMD16, so performance won't be that great.  That said, I think we
are at the point now where we should try and land this and I can stop
developing in my massive private branch.  Since this isn't quite ready for
prime-time yet, using it requires setting the INTEL_USE_NIR environment
variable.
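
For anyone wondering what such an opt-in gate looks like, here is a minimal
sketch in C.  It is not lifted from the series: the helper name
use_nir_path() is hypothetical, and it assumes that merely setting the
variable, to any value, is enough to enable the NIR path.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper, not taken from the series.  Returns true when the
 * experimental NIR path has been requested.  Assumption: the presence of
 * the variable is what matters, not its value. */
static bool
use_nir_path(void)
{
   return getenv("INTEL_USE_NIR") != NULL;
}
```

Under that assumption, exporting INTEL_USE_NIR in the environment before
launching the application would select the new path, and leaving it unset
would keep the existing GLSL IR path.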

A few key points about NIR:

 1. It is primarily an SSA-based IR.
 2. It supports source/destination-modifiers and swizzles/*write-masks.
 3. Standard GPU operations such as sin() and fmad() are first-class ALU
    operations, not intrinsics.
 4. GLSL concepts like inputs, outputs, uniforms, etc. are built into the
    IR so we can do proper analysis on them.
 5. Even though it's SSA, it still has a concept of registers and
    write-masks in the core IR data structures.  This means we can generate
    code that is much closer to what backends want.
 6. Control flow is structured explicitly in the IR.

(*write-masks are not available for SSA values)

While source/destination modifiers and write-masks/swizzles are not
particularly useful for optimizations, having them represented in the IR
gives us the ability to generate more useful code for backends.
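
To make the terms concrete, here is a small sketch in C of what swizzles,
source modifiers, and write-masks mean for a 4-component value.  The
example_* names and the struct layout are purely illustrative and are NOT
the structures from nir.h.  Applying abs before negate is likewise an
assumption, chosen because that is the usual hardware convention.

```c
#include <stdbool.h>

/* Illustrative only; not the real NIR data structures. */
struct example_alu_src {
   const float *value;        /* the 4-component value being read */
   unsigned char swizzle[4];  /* swizzle[i] picks the source channel that
                               * feeds result channel i */
   bool abs;                  /* take the absolute value of each channel */
   bool negate;               /* negate each channel (after abs) */
};

/* Read a source through its swizzle and modifiers. */
static void
example_read_src(const struct example_alu_src *src, float out[4])
{
   for (unsigned i = 0; i < 4; i++) {
      float v = src->value[src->swizzle[i]];
      if (src->abs)
         v = v < 0.0f ? -v : v;
      if (src->negate)
         v = -v;
      out[i] = v;
   }
}

/* Write a result into a register, touching only the channels enabled in
 * write_mask (bit i set means channel i is written). */
static void
example_write_masked(float dest[4], const float result[4],
                     unsigned write_mask)
{
   for (unsigned i = 0; i < 4; i++) {
      if (write_mask & (1u << i))
         dest[i] = result[i];
   }
}
```

For example, reading (1, -2, 3, -4) with swizzle .yywx and both abs and
negate set yields (-2, -2, -4, -1).  Writing that into a register with
write-mask 0x5 updates only the x and z channels.  A back-end that supports
these features natively can fold them into a single instruction instead of
emitting separate moves.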

A few notes about review:

 1. For those of you who aren't interested in the general compiler, I'm
    sorry for the patch-bomb.  However, several people have requested that
    we maintain the history of the NIR development since Connor's original
    drop at the end of the summer.  Therefore, while I've squashed several
    things, I've tried to leave the diff of what I've done more-or-less
    preserved.

 2. No, this is not LLVM.  There was a long-winded discussion about that
    when Connor dropped his patches that went a whole lot of nowhere as
    usual.  I would really prefer if we left that debate alone.  If there
    must be bikeshedding on the topic, please do so on the cover-letter
    e-mail.

 3. Please keep all bikeshedding about C++, typedefs, etc. on the core
    datastructures e-mail.  If we need to, we can split that off into its
    own thread.

 4. While I welcome review, I don't plan to make non-trivial changes to
    specific patches or squash anything beyond what has already been
    squashed.  I've tried thus far to more-or-less keep the history and I'd
    like to continue this if we can.

 5. Eric Anholt has also written NIR -> TGSI -> NIR passes which will
    hopefully get landed soon after NIR initially lands.  Exactly how that
    all gets hooked up for other gallium drivers beyond vc4 is outside the
    scope of this series.

I have pushed a branch to my personal freedesktop.org account.  For certain
types of review, it may be easier to look at the end result rather than the
patches.  The branch can be found via freedesktop cgit here:

http://cgit.freedesktop.org/~jekstrand/mesa/log/?h=review/nir-v1

Last week, I did a presentation for some of the other Intel people to try
and help bring them up to speed on NIR concepts quickly.  As part of this,
I typed up a bunch of notes that provide a decent overview of a lot of NIR
concepts.  Those notes can be found here:

http://www.jlekstrand.net/jason/projects/mesa/nir-notes/

Happy reviewing!

P.S. Connor, Don't do too much reviewing before your finals are done. :-P

Connor Abbott (22):
  exec_list: add a list_foreach_typed_reverse() macro
  nir: add initial README
  nir: add a simple C wrapper around glsl_types.h
  nir: add the core datastructures
  nir: add core helper functions
  nir: add a printer
  nir: add a validation pass
  nir: add a glsl-to-nir pass
  nir: add a pass to lower variables for scalar backends
  nir: keep track of the number of input, output, and uniform slots
  nir: add a pass to remove unused variables
  nir: add a pass to lower sampler instructions
  nir: add a pass to lower system value reads
  nir: add a pass to lower atomics
  nir: add an optimization to turn global registers into local registers
  nir: calculate dominance information
  nir: add a pass to convert to SSA
  nir: add an SSA-based copy propagation pass
  nir: add an SSA-based dead code elimination pass
  i965/fs: make emit_fragcoord_interpolation() more general
  i965/fs: Don't pass through the coordinate type
  i965/fs: add a NIR frontend

Jason Ekstrand (101):
  i965/fs: Only use nir for 8-wide non-fast-clear shaders.
  i965/fs_nir: Make the sampler register always unsigned
  i965/fs_nir: Use the correct types for texture inputs
  i965/fs_nir: Use the correct texture offset immediate
  Fix what I think are a few NIR typos
  Fix up varying pull constants
  i965/fs_nir: Add support for sample_pos and sample_id
  nir/glsl: Add support for saturate
  nir: Add fine and coarse derivative opcodes
  nir/glsl: Add support for coarse and fine derivatives
  i965/fs_nir: Handle coarse/fine derivatives
  nir/lower_atomics: Multiply array offsets by ATOMIC_COUNTER_SIZE
  i965/fs_nir: Add atomic counters support
  i965/fs: Allow reinterpretation in constant propagation
  nir: Add NIR_TRUE and NIR_FALSE constants and use them for boolean
    immediates
  nir: Add intrinsics to do alternate interpolation on inputs
  i965/fs: Don't take an ir_variable for emit_general_interpolation
  i965/fs_nir: Don't duplicate emit_general_interpolation
  nir: Add a naieve from-SSA pass
  nir: Add a lower_vec_to_movs pass
  i965/fs_nir: Convert the shader to/from SSA
  nir/lower_variables_scalar: Silence a compiler warning
  nir: Add a basic metadata management system
  nir: Add an assert
  nir/foreach_block: Return false if the callback on the last block
    fails
  nir: Add a foreach_block_reverse function
  nir: Add a function to detect if a block is immediately followed by an
    if
  nir: Make the nir_index_* functions return the nuber of items
  nir: Add an SSA-based liveness analysis pass.
  nir: Add an initialization function for SSA definitions
  nir: Automatically handle SSA uses when an instruction is inserted
  nir: Add a function for rewriting all the uses of a SSA def
  nir: Add a parallel copy instruction type
  nir: Add a function for comparing two sources
  nir: Add a better out-of-SSA pass
  i965/fs_nir: Do retyping for ALU srouces in get_nir_alu_src
  glsl/list: Fix the exec_list_validate function
  nir: Validate all lists in the validator
  nir/print: Don't reindex things
  nir: Differentiate between signed and unsigned versions of find_msb
  i965/fs_nir: Validate optimization passes
  nir/nir: Fix a bug in move_successors
  glsl/list: Add a foreach_list_typed_safe_reverse macro
  nir/nir: Use safe iterators when iterating over the CFG
  nir/nir: Patch up phi predecessors in move_successors
  nir: Add a peephole select optimization
  i965/fs_nir: Turn on the peephole select optimization
  nir: Validate that the SSA def and register indices are unique
  nir: Add a fused multiply-add peephole
  nir: Add a basic CSE pass
  i965/fs_nir: Add the CSE pass and actually run in a loop
  i965/fs_nir: Use an array rather than a hash table for register lookup
  i965/fs_nir: Handle SSA constants
  i965/fs_nir: Properly saturate multiplies
  nir: Add a helper for rewriting an instruction source
  nir/lower_samplers: Use the nir_instr_rewrite_src function
  nir: Clean up nir_deref helper functions
  nir: Make array deref direct vs. indirect an enum
  nir: Add a concept of a wildcard array dereference
  nir: Use an integer index for specifying structure fields
  nir: Don't require a function in ssa_def_init
  nir/copy_propagate: Don't cause size mismatches on phi node sources
  nir: Validate that the sources of a phi have the same size as the
    destination
  nir/glsl: Don't allocate a state_slots array for 0 state slots
  i965/fs_nir: Don't dump the shader.
  nir: Use the enum for the variable mode
  nir: Automatically update SSA if uses
  nir: Add a copy splitting pass
  nir: Add a pass to lower local variable accesses to SSA values
  nir: Add a pass to lower local variables to registers
  nir: Add a pass for lowering input/output loads/stores
  nir: Add a pass to lower global variables to local variables
  nir/glsl: Generate SSA NIR
  i965/fs_nir: Use the new variable lowering code
  nir/validate: Ensure that outputs are write-only and inputs are
    read-only
  nir: Remove the old variable lowering code
  nir: Vectorize intrinsics
  nir/validate: Validate intrinsic source/destination sizes
  nir: Add gpu_shader5 interpolation intrinsics
  nir/glsl: Add support for gpu_shader5 interpolation instrinsics
  nir: Add a helper for getting a constant value from an SSA source
  i965/fs_nir: Add a has_indirect flag and clean up some of the
    input/output code
  i965/fs_nir: Implement the ARB_gpu_shader5 interpolation intrinsics
  nir: Add neg, abs, and sat opcodes
  nir: Add a lowering pass for adding source modifiers where possible
  nir: Make the type casting operations static inline functions
  nir/glsl: Emit abs, neg, and sat operations instead of source
    modifiers
  nir: Add an expression matching framework
  nir: Add infastructure for generating algebraic transformation passes
  nir: Add an algebraic optimization pass
  nir: Add a basic constant folding pass
  nir: Remove the ffma peephole
  nir: Make texture instruction names more consistent
  nir: Constant fold array indirects
  nir: Use a source for uniform buffer indices instead of an index
  nir: Add a sampler index indirect to nir_tex_instr
  nir: Rework the way samplers are lowered
  i965/fs_nir: Add support for indirect texture arrays
  nir/metadata: Rename metadata_dirty to metadata_preserve
  nir: Call nir_metadata_preserve more places
  nir: Make bcsel a fully vector operation

 src/glsl/Makefile.am                               |   10 +-
 src/glsl/Makefile.sources                          |   39 +-
 src/glsl/list.h                                    |   19 +-
 src/glsl/nir/README                                |  118 ++
 src/glsl/nir/glsl_to_nir.cpp                       | 1825 +++++++++++++++++
 src/glsl/nir/glsl_to_nir.h                         |   40 +
 src/glsl/nir/nir.c                                 | 2042 ++++++++++++++++++++
 src/glsl/nir/nir.h                                 | 1433 ++++++++++++++
 src/glsl/nir/nir_algebraic.py                      |  249 +++
 src/glsl/nir/nir_dominance.c                       |  298 +++
 src/glsl/nir/nir_from_ssa.c                        |  859 ++++++++
 src/glsl/nir/nir_intrinsics.c                      |   49 +
 src/glsl/nir/nir_intrinsics.h                      |  140 ++
 src/glsl/nir/nir_live_variables.c                  |  282 +++
 src/glsl/nir/nir_lower_atomics.c                   |  146 ++
 src/glsl/nir/nir_lower_global_vars_to_local.c      |  107 +
 src/glsl/nir/nir_lower_io.c                        |  324 ++++
 src/glsl/nir/nir_lower_locals_to_regs.c            |  308 +++
 src/glsl/nir/nir_lower_samplers.cpp                |  181 ++
 src/glsl/nir/nir_lower_system_values.c             |  107 +
 src/glsl/nir/nir_lower_to_source_mods.c            |  181 ++
 src/glsl/nir/nir_lower_variables.c                 | 1046 ++++++++++
 src/glsl/nir/nir_lower_vec_to_movs.c               |   96 +
 src/glsl/nir/nir_metadata.c                        |   54 +
 src/glsl/nir/nir_opcodes.c                         |   46 +
 src/glsl/nir/nir_opcodes.h                         |  356 ++++
 src/glsl/nir/nir_opt_algebraic.py                  |   67 +
 src/glsl/nir/nir_opt_constant_folding.c            |  355 ++++
 src/glsl/nir/nir_opt_copy_propagate.c              |  325 ++++
 src/glsl/nir/nir_opt_cse.c                         |  269 +++
 src/glsl/nir/nir_opt_dce.c                         |  186 ++
 src/glsl/nir/nir_opt_global_to_local.c             |  103 +
 src/glsl/nir/nir_opt_peephole_select.c             |  214 ++
 src/glsl/nir/nir_print.c                           |  948 +++++++++
 src/glsl/nir/nir_remove_dead_variables.c           |  138 ++
 src/glsl/nir/nir_search.c                          |  337 ++++
 src/glsl/nir/nir_search.h                          |   80 +
 src/glsl/nir/nir_split_var_copies.c                |  225 +++
 src/glsl/nir/nir_to_ssa.c                          |  660 +++++++
 src/glsl/nir/nir_types.cpp                         |  143 ++
 src/glsl/nir/nir_types.h                           |   75 +
 src/glsl/nir/nir_validate.c                        |  912 +++++++++
 src/mesa/drivers/dri/i965/Makefile.sources         |    1 +
 src/mesa/drivers/dri/i965/brw_fs.cpp               |   74 +-
 src/mesa/drivers/dri/i965/brw_fs.h                 |   57 +-
 .../drivers/dri/i965/brw_fs_copy_propagation.cpp   |    4 +-
 src/mesa/drivers/dri/i965/brw_fs_fp.cpp            |   32 +-
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp           | 1778 +++++++++++++++++
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp       |   39 +-
 src/mesa/main/bitset.h                             |    1 +
 50 files changed, 17301 insertions(+), 77 deletions(-)
 create mode 100644 src/glsl/nir/README
 create mode 100644 src/glsl/nir/glsl_to_nir.cpp
 create mode 100644 src/glsl/nir/glsl_to_nir.h
 create mode 100644 src/glsl/nir/nir.c
 create mode 100644 src/glsl/nir/nir.h
 create mode 100644 src/glsl/nir/nir_algebraic.py
 create mode 100644 src/glsl/nir/nir_dominance.c
 create mode 100644 src/glsl/nir/nir_from_ssa.c
 create mode 100644 src/glsl/nir/nir_intrinsics.c
 create mode 100644 src/glsl/nir/nir_intrinsics.h
 create mode 100644 src/glsl/nir/nir_live_variables.c
 create mode 100644 src/glsl/nir/nir_lower_atomics.c
 create mode 100644 src/glsl/nir/nir_lower_global_vars_to_local.c
 create mode 100644 src/glsl/nir/nir_lower_io.c
 create mode 100644 src/glsl/nir/nir_lower_locals_to_regs.c
 create mode 100644 src/glsl/nir/nir_lower_samplers.cpp
 create mode 100644 src/glsl/nir/nir_lower_system_values.c
 create mode 100644 src/glsl/nir/nir_lower_to_source_mods.c
 create mode 100644 src/glsl/nir/nir_lower_variables.c
 create mode 100644 src/glsl/nir/nir_lower_vec_to_movs.c
 create mode 100644 src/glsl/nir/nir_metadata.c
 create mode 100644 src/glsl/nir/nir_opcodes.c
 create mode 100644 src/glsl/nir/nir_opcodes.h
 create mode 100644 src/glsl/nir/nir_opt_algebraic.py
 create mode 100644 src/glsl/nir/nir_opt_constant_folding.c
 create mode 100644 src/glsl/nir/nir_opt_copy_propagate.c
 create mode 100644 src/glsl/nir/nir_opt_cse.c
 create mode 100644 src/glsl/nir/nir_opt_dce.c
 create mode 100644 src/glsl/nir/nir_opt_global_to_local.c
 create mode 100644 src/glsl/nir/nir_opt_peephole_select.c
 create mode 100644 src/glsl/nir/nir_print.c
 create mode 100644 src/glsl/nir/nir_remove_dead_variables.c
 create mode 100644 src/glsl/nir/nir_search.c
 create mode 100644 src/glsl/nir/nir_search.h
 create mode 100644 src/glsl/nir/nir_split_var_copies.c
 create mode 100644 src/glsl/nir/nir_to_ssa.c
 create mode 100644 src/glsl/nir/nir_types.cpp
 create mode 100644 src/glsl/nir/nir_types.h
 create mode 100644 src/glsl/nir/nir_validate.c
 create mode 100644 src/mesa/drivers/dri/i965/brw_fs_nir.cpp

-- 
2.2.0