[Mesa-dev] [PATCH 00/31] nir: Use a 1-bit data type for booleans

Mon Oct 22 22:13:31 UTC 2018

This is something that Connor and I have talked about quite a bit over the
last couple of months.  The core idea is to replace NIR's current 32-bit
0/-1 D3D10-style booleans with a 1-bit data type.  All in all, I think it
worked out pretty well though I really don't like the proliferation of
32-bit comparison opcodes we now have kicking around for i965.

Why?  No hardware really has a 1-bit type, right?  Well, sort of...  AMD
actually uses 64-bit scalars for booleans with one bit per invocation.
However, most hardware such as Intel uses some other larger value for
booleans.  The real benefit of 1-bit booleans and requiring a lowering pass
is that you can do somewhat custom lowering (like AMD wants) and your
lowering pass can always tell in an instant if a value is a boolean based
on the bit size.  As can be seen in the last patch, this makes it really
easy to implement a bool -> float lowering pass for hardware that doesn't
have real integers where NIR's current booleans are actually rather
painful.

On Intel, the situation is a bit more complicated.  It's tempting to say
that we have 32-bit D3D10 booleans.  However, they're not really D3D10
booleans on gen4-5 because the top 31 bits are undefined garbage and, while
iand, ior, ixor, and inot operations work, you have to iand with 1 at the
last minute to clear off all that garbage.  Also, on all generations, a
comparison of two N-bit values results in an N-bit boolean, not a 32-bit
bool.  This has caused the Igalia folks no end of trouble as they've been
working on native 8 and 16-bit support.  If, instead, we have a 1-bit bool
with a lowering pass and we can lower to whatever we want, then we could
lower to a set of comparison opcodes that return the same bit-size as they
compare and it would match GEN hardware much better.

But what about performance?  Aren't there all sorts of neat tricks we can
do with D3D10 booleans like b & 1.0f for b2f?  As it turns out, not really;
that's about the only one.  There is some small advantage when optimizing
shaders that come from D3D if your native representation of booleans
matches that of D3D.  However, penultimate patch in this series adds a few
small optimizations that get us to actually better than we were before.
With the entire series, shader-db on Kaby Lak looks like this:

    total instructions in shared programs: 15084098 -> 14988578 (-0.63%)
    instructions in affected programs: 1321114 -> 1225594 (-7.23%)
    helped: 2340
    HURT: 23

    total cycles in shared programs: 369790134 -> 359798399 (-2.70%)
    cycles in affected programs: 134085452 -> 124093717 (-7.45%)
    helped: 2149
    HURT: 720

    total loops in shared programs: 4393 -> 4393 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total spills in shared programs: 10158 -> 10051 (-1.05%)
    spills in affected programs: 1429 -> 1322 (-7.49%)
    helped: 8
    HURT: 15

    total fills in shared programs: 22105 -> 21720 (-1.74%)
    fills in affected programs: 2853 -> 2468 (-13.49%)
    helped: 12
    HURT: 15

How about ease of use?  Are they a pain to deal with?  Yes, adding support
for 1-bit types was a bit awkward in a few places but most of it was
dealing with all the places where we have 32-bit booleans baked into
assumptions.  Getting rid of that baking in solves the problem and also
makes the whole IR more future-proof.

All in all, I'd say I'm pretty happy with it.  However, I'd like other
people (particularly the AMD folks) to play with it a bit and verify that
it solves their problems as well.  Also, I added a lowering pass and tried
to turn it on in everyone's driver but may not have put it in the right
spot.  Please double-check my work.  For those wishing to take a look, you
can also find the entire series on my gitlab here:

https://gitlab.freedesktop.org/jekstrand/mesa/commits/review/nir-1-bit-bool

Please review!

--Jason

Cc: Connor Abbott <cwabbott0 at gmail.com>
Cc: Timothy Arceri <tarceri at itsqueeze.com>
Cc: Eric Anholt <eric at anholt.net>
Cc: Rob Clark <robdclark at gmail.com>
Cc: Karol Herbst <karolherbst at gmail.com>
Cc: Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl>
Cc: Alyssa Rosenzweig <alyssa at rosenzweig.io>

Jason Ekstrand (31):
  nir/validate: Print when the validation failed
  nir/constant_folding: Add an unreachable to a switch
  nir/constant_folding: Use nir_src_as_bool for discard_if
  nir/builder: Add a nir_imm_true/false helpers
  nir/builder: Handle 16-bit floats in nir_imm_floatN_t
  nir/search: Use nir_builder
  nir/opt_if: Rework condition propagation
  nir/system_values: Use the bit size from the load_deref
  glsl/nir: Use i2b instead of ine for fixing UBO/SSBO booleans
  nir/prog: Use nir_bany in kill handling
  spirv: Use the right bit-size for spec constant ops
  spirv: Initialize subgroup destinations with the destination type
  intel/nir: Use the OPT macro for more passes
  nir/algebraic: Disable b2f lowering and two optimizations
  nir: Rename boolean-related opcodes to include 32 in the name
  FIXUP: nir/builder: Generate 32-bit bool opcodes transparently
  FIXUP: nir/algebraic: Remap boolean opcodes to the 32-bit variant
  FIXUP: Use 32-bit opcodes in the NIR producers and optimizations
  FIXUP: Use 32-bit opcodes in the NIR back-ends
  nir/constant_expressions: Rework boolean handling
  nir: Add support for 1-bit data types
  nir: Add 1-bit boolean opcodes
  nir: Add a bool to int32 lowering pass
  glsl,spirv: Generate 1-bit booleans
  FIXUP: Revert "Use 32-bit opcodes in the NIR producers and
    optimizations"
  FIXUP: Revert "nir/builder: Generate 32-bit bool opcodes
    transparently"
  FIXUP: nir/builder: Generate 1-bit booleans in nir_build_imm_bool
  nir/algebraic: Optimize 1-bit booleans
  nir/algebraic: Add back lower_b2f and a b2f optimization
  nir/algebraic: Add some optimizations for D3D-style booleans
  nir: Add a bool to float32 lowering pass

 src/amd/common/ac_nir_to_llvm.c               |  30 +--
 src/amd/vulkan/radv_shader.c                  |   6 +-
 src/broadcom/compiler/nir_to_vir.c            |  44 ++---
 src/broadcom/compiler/vir.c                   |   2 +
 src/compiler/Makefile.sources                 |   2 +
 src/compiler/glsl/glsl_to_nir.cpp             |  28 +--
 src/compiler/nir/meson.build                  |   2 +
 src/compiler/nir/nir.c                        |  15 +-
 src/compiler/nir/nir.h                        |  36 ++--
 src/compiler/nir/nir_algebraic.py             |  22 ++-
 src/compiler/nir/nir_builder.h                |  51 ++++-
 src/compiler/nir/nir_constant_expressions.py  |  53 ++---
 src/compiler/nir/nir_instr_set.c              |  23 ++-
 src/compiler/nir/nir_lower_alu_to_scalar.c    |   4 +
 src/compiler/nir/nir_lower_bool_to_float.c    | 181 ++++++++++++++++++
 src/compiler/nir/nir_lower_bool_to_int32.c    | 162 ++++++++++++++++
 src/compiler/nir/nir_lower_int64.c            |   2 +-
 .../nir/nir_lower_load_const_to_scalar.c      |   3 +
 src/compiler/nir/nir_lower_returns.c          |   4 +-
 src/compiler/nir/nir_lower_subgroups.c        |   2 +-
 src/compiler/nir/nir_lower_system_values.c    |   1 +
 src/compiler/nir/nir_opcodes.py               |  31 ++-
 src/compiler/nir/nir_opt_algebraic.py         |  19 +-
 src/compiler/nir/nir_opt_constant_folding.c   |  18 +-
 src/compiler/nir/nir_opt_if.c                 |  91 +++------
 src/compiler/nir/nir_opt_intrinsics.c         |   2 +-
 src/compiler/nir/nir_opt_large_constants.c    |   5 +
 src/compiler/nir/nir_print.c                  |   3 +
 src/compiler/nir/nir_search.c                 | 112 +++--------
 src/compiler/nir/nir_search.h                 |   9 +-
 src/compiler/nir/nir_validate.c               |  16 +-
 src/compiler/nir/tests/vars_tests.cpp         |  38 ++--
 src/compiler/nir_types.cpp                    |   2 +-
 src/compiler/nir_types.h                      |   4 +-
 src/compiler/spirv/spirv_to_nir.c             |  23 ++-
 src/compiler/spirv/vtn_alu.c                  |   4 +-
 src/compiler/spirv/vtn_cfg.c                  |  10 +-
 src/compiler/spirv/vtn_subgroup.c             |  12 +-
 src/gallium/auxiliary/nir/tgsi_to_nir.c       |  20 +-
 .../drivers/freedreno/ir3/ir3_compiler_nir.c  |  31 +--
 src/gallium/drivers/radeonsi/si_shader_nir.c  |   2 +
 src/gallium/drivers/vc4/vc4_program.c         |  50 ++---
 src/intel/compiler/brw_fs_nir.cpp             |  80 ++++----
 src/intel/compiler/brw_nir.c                  |  16 +-
 .../brw_nir_analyze_boolean_resolves.c        |  24 +--
 .../compiler/brw_nir_lower_image_load_store.c |   2 +-
 src/intel/compiler/brw_vec4_nir.cpp           | 122 ++++++------
 src/intel/vulkan/anv_pipeline.c               |   2 +-
 src/mesa/drivers/dri/i965/brw_program.c       |   4 +-
 src/mesa/main/glspirv.c                       |   2 +-
 src/mesa/program/prog_to_nir.c                |   2 +-
 src/mesa/state_tracker/st_glsl_to_nir.cpp     |   2 +-
 52 files changed, 934 insertions(+), 497 deletions(-)
 create mode 100644 src/compiler/nir/nir_lower_bool_to_float.c
 create mode 100644 src/compiler/nir/nir_lower_bool_to_int32.c

-- 
2.19.1