[Mesa-dev] intel: WIP: Support for using 16-bits for mediump

Tue Nov 6 06:30:08 UTC 2018

Here is version 2 of the series adding support for 16-bit float
instructions in the shader compiler. Unlike the first version, which did
all the analysis at the GLSL level, this one adds the notion of precision
to NIR variables and does the analysis and precision lowering at the NIR
level.
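
As a rough illustration of the idea only, here is a toy sketch in
Python. It is not the code in the series: the Var and Alu classes and
the lower_precision() function are made up for this example and do not
correspond to NIR's actual data structures. It assumes that an operation
can run at 16 bits only when all of its sources are 16-bit, and that a
16-bit source feeding a 32-bit operation gets an explicit conversion.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Var:
    """Hypothetical variable carrying the GLSL precision qualifier."""
    name: str
    mediump: bool


@dataclass
class Alu:
    """Hypothetical ALU instruction; bit_size is decided by the pass."""
    op: str
    srcs: List[Union["Alu", Var]] = field(default_factory=list)
    bit_size: int = 32


def lower_precision(node):
    """Pick a bit size for node and, recursively, for its sources.

    Returns the chosen bit size. Alu nodes are updated in place.
    """
    if isinstance(node, Var):
        return 16 if node.mediump else 32

    sizes = [lower_precision(src) for src in node.srcs]

    # Assumption: 16 bits is only allowed if every source is 16-bit.
    node.bit_size = 16 if all(size == 16 for size in sizes) else 32

    if node.bit_size == 32:
        # Mixed precision: widen the 16-bit sources explicitly.
        node.srcs = [
            Alu("f2f32", [src], 32) if size == 16 else src
            for src, size in zip(node.srcs, sizes)
        ]
    return node.bit_size


# (a * b) + c, where a and b are mediump and c is highp.
a = Var("a", mediump=True)
b = Var("b", mediump=True)
c = Var("c", mediump=False)
mul = Alu("fmul", [a, b])
add = Alu("fadd", [mul, c])
lower_precision(add)
```

In this example the multiply ends up 16-bit, while the add stays 32-bit
and reads the multiply result through an inserted f2f32 conversion.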

This lives in gitlab.freedesktop.org:tpohjola/mesa, branch fp16.

This is now mature enough to use 16-bit precision for all instructions
in gfxbench trex and alu2, apart from a few special cases.
(Unfortunately I'm not seeing any performance benefit. This is not
that surprising, as I got to the same point with the GLSL-based
solution and was able to measure the performance already back then.)
Hence I thought it was time to share it.

Since this is still work in progress, I didn't want to flood the list
with the full set of patches. Instead I included only the very last one,
where I try to outline the logic and its current shortcomings. There is
also a short list of TODO items.

In addition to those, I need to examine a couple of Intel-specific
misrenderings. I haven't gotten that deep yet, but it looks like I'm
missing something with 16-bit inot and with mad/mac-lowered
interpolation. Unfortunately I get corrupted rendering only on hardware,
while the simulator is happy.

My main worry is how to test all of this properly. I haven't written
any unit tests yet, mostly because I've been uncertain about my design
choices, but that is high on my list. So far I've used shader runner
tests that I've written for specific cases. These are useful for
development purposes but don't bring much value for regression testing.

Alejandro Piñeiro (1):
  intel/compiler/fs: Use half_precision data_format on 16-bit fb writes

Jose Maria Casanova Crespo (2):
  intel/compiler/fs: Include support for RT data_format bit
  intel/compiler/disasm: Show half-precision data_format on rt_writes

Topi Pohjolainen (58):
  intel/compiler/fs: Set 16-bit sampler return format
  intel/compiler/disasm: Show half-precision for sampler messages
  intel/compiler/fs: Skip tex-inst early in conversion lowering
  intel/compiler/fs: Support for dumping 16-bit IMM values
  intel/compiler: Allow 16-bit math
  intel/compiler/fs: Add helpers for 16-bit null regs
  intel/compiler/fs: Use two SIMD8 instructions for 16-bit math
  intel/compiler/fs: Use 16-bit null dest with 16-bit math
  intel/compiler/fs: Use 16-bit null dest with 16-bit compare
  intel/compiler/fs: Add 16-bit type support for nir_if
  intel/compiler/eu: Prepare 3-src-op for 16-bit sources
  intel/compiler/eu: Prepare 3-src-op for 16-bit dst
  intel/compiler/eu: Allow 3-src-op with mixed precision (HF/F) sources
  intel/compiler/disasm: Print mixed precision 3-src types correctly
  intel/compiler/disasm: Print 16-bit IMM values
  intel/compiler/fs: Support for combining 16-bit immediates
  intel/compiler/fs: Set tex type for generator to flag fp16
  intel/compiler/fs: Use component_size() instead of open coded
  intel/compiler/fs: Add register padding support
  intel/compiler/fs: Pad 16-bit texture return payloads
  intel/compiler/fs: Pad 16-bit output (store/fb write) payloads
  intel/compiler/fs: Pad 16-bit nir vec* components into full reg
  intel/compiler/fs: Pad 16-bit nir intrinsic dest into full reg
  intel/compiler/fs: Pad 16-bit const loads into full regs
  intel/compiler/fs: Pad 16-bit load payload lowering
  nir: Lower also 16-bit lrp() if needed
  intel/compiler: Lower 16-bit lrp()
  nir: Recognize f232(f216(x)) as x
  nir: Recognize f216(f232(x)) as x
  nir: Store variable precision when translating from glsl
  glsl: Set default precision for builtin variables
  i965: Prepare uniform mapping for 16-bit values
  i965: Support for uploading 16-bit uniforms from 32-bit store
  intel/compiler/fs: WIP: Use 32-bit slots for 16-bit uniforms
  intel/compiler: Tell compiler if lower precision is supported
  nir: Add lowering pass for variables marked mediump
  nir: Add pass for deref precision lowering
  nir: Add pass for alu precision lowering
  nir: Add precision conversion for load/store_deref
  nir: Add precision conversion for sources of texturing ops
  nir: Don't set destination size 16 for booleans
  nir: Add precision lowering for texture samples
  nir: Add support for non-fixed precision
  nir: Don't try to alter precision of boolean sources
  nir: Add support for variable sized booleans
  nir: Add support for lowering phi precision
  intel/compiler/fs: Prepare alu dest type for 16-bit booleans
  nir: Add lowering pass setting 16-bit boolean destinations
  nir: Add lowering pass turning b2f(i2i32(x)) into b2f(x)
  nir: Adjust integer precision for alus operating with 16-bit srcs
  nir: Replace b2f(x) with b2f(i2i32(x)) for 16-bit x
  nir: Adjust precision for discard_if
  nir: Allow input varyings to be converted to lower precision
  nir: Replace 16-bit src[0] for bcsel i2i32(src[0])
  nir: Replace 16-bit nir_if condition with i2i32(condition)
  Revert "intel/compiler: fix 16-bit comparisons"
  intel/compiler: Hook in precision lowering pass
  nir: Document precision lowering pass

 src/compiler/Makefile.sources                 |   2 +
 src/compiler/glsl/glsl_symbol_table.cpp       |  20 +
 src/compiler/glsl/glsl_symbol_table.h         |   7 +
 src/compiler/glsl/glsl_to_nir.cpp             |   1 +
 src/compiler/nir/meson.build                  |   2 +
 src/compiler/nir/nir.h                        |  18 +
 src/compiler/nir/nir_lower_bool_size.c        | 120 +++
 src/compiler/nir/nir_lower_precision.cpp      | 820 ++++++++++++++++++
 src/compiler/nir/nir_opt_algebraic.py         |   5 +
 src/intel/blorp/blorp.c                       |   4 +-
 src/intel/compiler/brw_compiler.c             |   1 +
 src/intel/compiler/brw_disasm.c               |  28 +-
 src/intel/compiler/brw_eu.h                   |   3 +-
 src/intel/compiler/brw_eu_emit.c              |  83 +-
 src/intel/compiler/brw_fs.cpp                 |  68 +-
 src/intel/compiler/brw_fs.h                   |   4 +-
 src/intel/compiler/brw_fs_builder.h           |  37 +-
 .../compiler/brw_fs_combine_constants.cpp     |  84 +-
 .../compiler/brw_fs_copy_propagation.cpp      |   7 +-
 src/intel/compiler/brw_fs_generator.cpp       |  13 +-
 .../compiler/brw_fs_lower_conversions.cpp     |  42 +
 src/intel/compiler/brw_fs_nir.cpp             | 197 +++--
 src/intel/compiler/brw_fs_surface_builder.cpp |   3 +-
 src/intel/compiler/brw_fs_visitor.cpp         |   6 +
 src/intel/compiler/brw_inst.h                 |   5 +
 src/intel/compiler/brw_ir_fs.h                |  16 +
 src/intel/compiler/brw_nir.c                  |  22 +-
 src/intel/compiler/brw_nir.h                  |   4 +-
 src/intel/compiler/brw_reg_type.c             |   2 +
 src/intel/compiler/brw_shader.h               |   7 +
 src/intel/vulkan/anv_pipeline.c               |   2 +-
 .../drivers/dri/i965/brw_nir_uniforms.cpp     |   8 +-
 src/mesa/drivers/dri/i965/brw_program.c       |  10 +-
 src/mesa/drivers/dri/i965/brw_program.h       |   6 +-
 src/mesa/drivers/dri/i965/brw_tcs.c           |   2 +-
 .../drivers/dri/i965/gen6_constant_state.c    |  14 +-
 36 files changed, 1548 insertions(+), 125 deletions(-)
 create mode 100644 src/compiler/nir/nir_lower_bool_size.c
 create mode 100644 src/compiler/nir/nir_lower_precision.cpp

-- 
2.17.1