[Mesa-dev] intel: WIP: Support for using 16-bits for mediump

Tue Nov 6 09:45:52 UTC 2018

As far as I understand, mediump handling can be split into two parts:

1. Figuring out which operations (instructions or SSA values in NIR)
can use relaxed precision.
2. Deciding which relaxed-precision operations to actually compute in
16-bit precision.
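To make the split concrete, here is a sketch in which step #1 produces a per-instruction "relaxed" bit and step #2 is a separate policy that consumes it. All names (toy_instr, toy_choose_16bit, the backend query) are hypothetical and are not Mesa, NIR, or brw API. The only point is that #2 may narrow what #1 allows but never widen it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcodes, for illustration only (not NIR or brw opcodes). */
enum toy_op { TOY_FADD, TOY_FMUL, TOY_OTHER };

struct toy_instr {
   enum toy_op op;
   bool relaxed;   /* step #1: the language rules permit relaxed precision */
   bool use_16bit; /* step #2: the backend decided to compute in 16 bits */
};

/* Hypothetical backend capability query feeding the step #2 policy. */
static bool toy_backend_supports_16bit(enum toy_op op)
{
   return op == TOY_FADD || op == TOY_FMUL;
}

/* Step #2 only consumes the step #1 result: it may drop relaxed
 * instructions from the 16-bit set but never adds non-relaxed ones. */
static void toy_choose_16bit(struct toy_instr *instrs, size_t count)
{
   for (size_t i = 0; i < count; i++) {
      instrs[i].use_16bit =
         instrs[i].relaxed && toy_backend_supports_16bit(instrs[i].op);
      assert(!instrs[i].use_16bit || instrs[i].relaxed);
   }
}
```

Keeping the two results in separate fields means the step #1 analysis can live wherever it is correct (the AST for GLSL, glslang for SPIR-V) while the step #2 policy can still run on SSA.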

At least for GLSL, #1 is pretty well nailed down by the GLSL spec,
where it's specified in terms of the source expressions. For example, in
something like:

mediump float a = ...;
mediump float b = ...;
float c = a + b;
float d = c + 2.0;

the last addition must be performed in full precision, whereas for:

mediump float a = ...;
mediump float b = ...;
float d = (a + b) + 2.0;

it can be lowered to 16-bit. This information gets lost during
expression grafting in GLSL IR, or vars-to-SSA in NIR, and even the
AST -> GLSL IR transform will sometimes split up expressions, so it
seems like both are too low-level for this. The analysis described by
the spec (the paragraph in section 4.7.3 "Precision Qualifiers" of the
GLSL ES 3.20 spec) has to happen on the AST after type checking but
before lowering to GLSL IR in order to be correct and not overly
conservative. If you want to do #2 in NIR because it is easier with SSA,
then sure... but we can't mix the two up and do both at the same time.
We'll have to add support for annotating ir_expressions and nir_instrs
(or maybe nir_ssa_defs) with relaxed precision, and propagate that
information down through the pipeline. Hopefully that also works
better for SPIR-V, where you can annotate individual instructions as
RelaxedPrecision, and afaik (hopefully) #1 is handled by
glslang.
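A rough sketch of the bottom-up half of the spec rule, run on a hypothetical expression tree (struct expr and infer_prec are made-up names, not Mesa's AST or GLSL IR types). Each operation is evaluated at the highest precision among its operands, and literals such as 2.0 carry no precision of their own. The spec's top-down fallback for expressions with no qualified operand at all (taking precision from the consuming operation, the l-value, etc.) is deliberately left out. The test below treats `c` as highp, assuming the default float precision is highp, as the first example implies.

```c
#include <assert.h>
#include <stddef.h>

/* Ordered so that a larger value means higher precision. PREC_NONE marks
 * operands without a precision qualifier, such as literal constants. */
enum prec { PREC_NONE, PREC_LOW, PREC_MEDIUM, PREC_HIGH };

/* Hypothetical AST node: a leaf (variable or literal) or a binary op. */
struct expr {
   struct expr *lhs, *rhs; /* both NULL for leaves */
   enum prec leaf_prec;    /* declared precision, leaves only */
   enum prec op_prec;      /* computed by infer_prec(), ops only */
};

static enum prec max_prec(enum prec a, enum prec b)
{
   return a > b ? a : b;
}

/* Bottom-up part of the GLSL ES rule: an operation is evaluated at (at
 * least) the highest precision among its operands; unqualified operands
 * do not raise it. Returns PREC_NONE if no operand is qualified, where the
 * spec would fall back to the consuming operation (not modeled here). */
static enum prec infer_prec(struct expr *e)
{
   assert((e->lhs == NULL) == (e->rhs == NULL));
   if (e->lhs == NULL)
      return e->leaf_prec;
   e->op_prec = max_prec(infer_prec(e->lhs), infer_prec(e->rhs));
   return e->op_prec;
}
```

Run on the two snippets, this yields highp for `c + 2.0` and mediump for `(a + b) + 2.0`. After vars-to-SSA, both programs become roughly `ssa_1 = a + b; ssa_2 = ssa_1 + 2.0`, so a pass that only sees the SSA form has no way to tell them apart.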

On Tue, Nov 6, 2018 at 7:30 AM Topi Pohjolainen
<topi.pohjolainen at gmail.com> wrote:
>
> Here is version 2 of adding support for 16-bit float instructions in
> the shader compiler. Unlike the first version, which did all the analysis
> at the GLSL level, this one adds the notion of precision to NIR variables
> and does the analysis and precision lowering at the NIR level.
>
> This lives in gitlab.freedesktop.org:tpohjola/mesa, branch fp16.
>
> This is now mature enough to use 16-bit precision for all
> instructions except a few special cases in gfxbench trex and alu2.
> (Unfortunately I'm not seeing any performance benefit. This is not
> that surprising, as I got to the same point with the GLSL-based
> solution and was able to measure the performance back then.)
> Hence I thought it was time to share it.
>
> Since this is still a work in progress, I didn't want to flood the list
> with the full set of patches; instead I included only the very last one,
> where I try to outline the logic and its current shortcomings. There is
> also a short list of TODO items.
>
> In addition to those, I need to examine a couple of Intel-specific
> misrenderings. I haven't gotten that deep yet, but it looks like I'm
> missing something with 16-bit inot and mad/mac-lowered interpolation.
> Unfortunately I get corrupted rendering only on hardware, while the
> simulator is happy.
>
> Mostly I'm worried about how to test all of this properly. I haven't
> written any unit tests, but that is high on my list; I've held off mostly
> because I've been uncertain about my design choices. So far I've used
> shader runner tests that I've written for specific cases. These are useful
> for development purposes but don't bring much value for regression testing.
>
> Alejandro Piñeiro (1):
>   intel/compiler/fs: Use half_precision data_format on 16-bit fb writes
>
> Jose Maria Casanova Crespo (2):
>   intel/compiler/fs: Include support for RT data_format bit
>   intel/compiler/disasm: Show half-precision data_format on rt_writes
>
> Topi Pohjolainen (58):
>   intel/compiler/fs: Set 16-bit sampler return format
>   intel/compiler/disasm: Show half-precision for sampler messages
>   intel/compiler/fs: Skip tex-inst early in conversion lowering
>   intel/compiler/fs: Support for dumping 16-bit IMM values
>   intel/compiler: Allow 16-bit math
>   intel/compiler/fs: Add helpers for 16-bit null regs
>   intel/compiler/fs: Use two SIMD8 instructions for 16-bit math
>   intel/compiler/fs: Use 16-bit null dest with 16-bit math
>   intel/compiler/fs: Use 16-bit null dest with 16-bit compare
>   intel/compiler/fs: Add 16-bit type support for nir_if
>   intel/compiler/eu: Prepare 3-src-op for 16-bit sources
>   intel/compiler/eu: Prepare 3-src-op for 16-bit dst
>   intel/compiler/eu: Allow 3-src-op with mixed precision (HF/F) sources
>   intel/compiler/disasm: Print mixed precision 3-src types correctly
>   intel/compiler/disasm: Print 16-bit IMM values
>   intel/compiler/fs: Support for combining 16-bit immediates
>   intel/compiler/fs: Set tex type for generator to flag fp16
>   intel/compiler/fs: Use component_size() instead of open coded
>   intel/compiler/fs: Add register padding support
>   intel/compiler/fs: Pad 16-bit texture return payloads
>   intel/compiler/fs: Pad 16-bit output (store/fb write) payloads
>   intel/compiler/fs: Pad 16-bit nir vec* components into full reg
>   intel/compiler/fs: Pad 16-bit nir intrinsic dest into full reg
>   intel/compiler/fs: Pad 16-bit const loads into full regs
>   intel/compiler/fs: Pad 16-bit load payload lowering
>   nir: Lower also 16-bit lrp() if needed
>   intel/compiler: Lower 16-bit lrp()
>   nir: Recognize f2f32(f2f16(x)) as x
>   nir: Recognize f2f16(f2f32(x)) as x
>   nir: Store variable precision when translating from glsl
>   glsl: Set default precision for builtin variables
>   i965: Prepare uniform mapping for 16-bit values
>   i965: Support for uploading 16-bit uniforms from 32-bit store
>   intel/compiler/fs: WIP: Use 32-bit slots for 16-bit uniforms
>   intel/compiler: Tell compiler if lower precision is supported
>   nir: Add lowering pass for variables marked mediump
>   nir: Add pass for deref precision lowering
>   nir: Add pass for alu precision lowering
>   nir: Add precision conversion for load/store_deref
>   nir: Add precision conversion for sources of texturing ops
>   nir: Don't set destination size 16 for booleans
>   nir: Add precision lowering for texture samples
>   nir: Add support for non-fixed precision
>   nir: Don't try to alter precision of boolean sources
>   nir: Add support for variable sized booleans
>   nir: Add support for lowering phi precision
>   intel/compiler/fs: Prepare alu dest type for 16-bit booleans
>   nir: Add lowering pass setting 16-bit boolean destinations
>   nir: Add lowering pass turning b2f(i2i32(x)) into b2f(x)
>   nir: Adjust integer precision for alus operating with 16-bit srcs
>   nir: Replace b2f(x) with b2f(i2i32(x)) for 16-bit x
>   nir: Adjust precision for discard_if
>   nir: Allow input varyings to be converted to lower precision
>   nir: Replace 16-bit src[0] for bcsel i2i32(src[0])
>   nir: Replace 16-bit nir_if condition with i2i32(condition)
>   Revert "intel/compiler: fix 16-bit comparisons"
>   intel/compiler: Hook in precision lowering pass
>   nir: Document precision lowering pass
>
>  src/compiler/Makefile.sources                 |   2 +
>  src/compiler/glsl/glsl_symbol_table.cpp       |  20 +
>  src/compiler/glsl/glsl_symbol_table.h         |   7 +
>  src/compiler/glsl/glsl_to_nir.cpp             |   1 +
>  src/compiler/nir/meson.build                  |   2 +
>  src/compiler/nir/nir.h                        |  18 +
>  src/compiler/nir/nir_lower_bool_size.c        | 120 +++
>  src/compiler/nir/nir_lower_precision.cpp      | 820 ++++++++++++++++++
>  src/compiler/nir/nir_opt_algebraic.py         |   5 +
>  src/intel/blorp/blorp.c                       |   4 +-
>  src/intel/compiler/brw_compiler.c             |   1 +
>  src/intel/compiler/brw_disasm.c               |  28 +-
>  src/intel/compiler/brw_eu.h                   |   3 +-
>  src/intel/compiler/brw_eu_emit.c              |  83 +-
>  src/intel/compiler/brw_fs.cpp                 |  68 +-
>  src/intel/compiler/brw_fs.h                   |   4 +-
>  src/intel/compiler/brw_fs_builder.h           |  37 +-
>  .../compiler/brw_fs_combine_constants.cpp     |  84 +-
>  .../compiler/brw_fs_copy_propagation.cpp      |   7 +-
>  src/intel/compiler/brw_fs_generator.cpp       |  13 +-
>  .../compiler/brw_fs_lower_conversions.cpp     |  42 +
>  src/intel/compiler/brw_fs_nir.cpp             | 197 +++--
>  src/intel/compiler/brw_fs_surface_builder.cpp |   3 +-
>  src/intel/compiler/brw_fs_visitor.cpp         |   6 +
>  src/intel/compiler/brw_inst.h                 |   5 +
>  src/intel/compiler/brw_ir_fs.h                |  16 +
>  src/intel/compiler/brw_nir.c                  |  22 +-
>  src/intel/compiler/brw_nir.h                  |   4 +-
>  src/intel/compiler/brw_reg_type.c             |   2 +
>  src/intel/compiler/brw_shader.h               |   7 +
>  src/intel/vulkan/anv_pipeline.c               |   2 +-
>  .../drivers/dri/i965/brw_nir_uniforms.cpp     |   8 +-
>  src/mesa/drivers/dri/i965/brw_program.c       |  10 +-
>  src/mesa/drivers/dri/i965/brw_program.h       |   6 +-
>  src/mesa/drivers/dri/i965/brw_tcs.c           |   2 +-
>  .../drivers/dri/i965/gen6_constant_state.c    |  14 +-
>  36 files changed, 1548 insertions(+), 125 deletions(-)
>  create mode 100644 src/compiler/nir/nir_lower_bool_size.c
>  create mode 100644 src/compiler/nir/nir_lower_precision.cpp
>
> --
> 2.17.1
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
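The quoted series includes patches that upload 16-bit uniforms from the 32-bit uniform store and that fold pairs of opposite float conversions in nir_opt_algebraic. As a standalone illustration only (float_to_half and half_to_float are hypothetical helpers, not Mesa's conversion code), here is a sketch of binary32 to binary16 conversion with round-to-nearest-even, plus the exact reverse conversion. It also shows why the two folds are not equivalent: widening a 16-bit value to 32 bits and narrowing it back returns the original bits (NaN payloads are canonicalized in this sketch), while narrowing a 32-bit value and widening it again generally changes it (0.1f does not survive), so that direction presumably only makes sense where relaxed precision is already permitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "expects 32-bit float");

static uint32_t float_bits(float f)
{
   uint32_t u;
   memcpy(&u, &f, sizeof u);
   return u;
}

static float bits_float(uint32_t u)
{
   float f;
   memcpy(&f, &u, sizeof f);
   return f;
}

/* binary32 -> binary16, round-to-nearest-even. */
static uint16_t float_to_half(float f)
{
   uint32_t u = float_bits(f);
   uint32_t sign = (u >> 16) & 0x8000u;
   uint32_t exp = (u >> 23) & 0xffu;
   uint32_t mant = u & 0x7fffffu;

   if (exp == 0xffu) /* Inf stays Inf, any NaN becomes a quiet NaN */
      return (uint16_t)(sign | 0x7c00u | (mant ? 0x200u : 0u));

   int e = (int)exp - 127 + 15; /* rebias the exponent */
   if (e >= 31)                 /* too large for half: overflow to Inf */
      return (uint16_t)(sign | 0x7c00u);

   if (e <= 0) { /* result is a half subnormal or zero */
      if (e < -10)
         return (uint16_t)sign; /* below half the smallest subnormal */
      mant |= 0x800000u;        /* make the implicit leading 1 explicit */
      unsigned shift = (unsigned)(14 - e);
      uint32_t h = mant >> shift;
      uint32_t rem = mant & ((1u << shift) - 1u);
      uint32_t halfway = 1u << (shift - 1u);
      if (rem > halfway || (rem == halfway && (h & 1u)))
         h++; /* may carry into the smallest normal, which is correct */
      return (uint16_t)(sign | h);
   }

   uint32_t h = ((uint32_t)e << 10) | (mant >> 13);
   uint32_t rem = mant & 0x1fffu;
   if (rem > 0x1000u || (rem == 0x1000u && (h & 1u)))
      h++; /* a carry may bump the exponent, up to Inf: also correct */
   return (uint16_t)(sign | h);
}

/* binary16 -> binary32; always exact. */
static float half_to_float(uint16_t h)
{
   uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
   uint32_t exp = (h >> 10) & 0x1fu;
   uint32_t mant = h & 0x3ffu;

   if (exp == 0x1fu) /* Inf or NaN */
      return bits_float(sign | 0x7f800000u | (mant << 13));
   if (exp == 0) { /* zero or subnormal: mant * 2^-24, exact in float */
      float v = (float)mant * 0x1p-24f;
      return sign ? -v : v;
   }
   return bits_float(sign | ((exp + 112u) << 23) | (mant << 13));
}
```

For example, float_to_half(65520.0f) rounds to Inf: 65520 lies exactly halfway between the largest half (65504, odd mantissa) and 65536, and ties go to the even neighbour.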