[Mesa-dev] intel: WIP: Support for using 16-bits for mediump

Tue Nov 6 11:46:20 UTC 2018

On Tue, Nov 6, 2018 at 11:31 AM Connor Abbott <cwabbott0 at gmail.com> wrote:
>
> On Tue, Nov 6, 2018 at 11:14 AM Pohjolainen, Topi
> <topi.pohjolainen at gmail.com> wrote:
> >
> > On Tue, Nov 06, 2018 at 10:45:52AM +0100, Connor Abbott wrote:
> > > As far as I understand, mediump handling can be split into two parts:
> > >
> > > 1. Figuring out which operations (instructions or SSA values in NIR)
> > > can use relaxed precision.
> > > 2. Deciding which relaxed-precision operations to actually compute in
> > > 16-bit precision.
> > >
> > > At least for GLSL, #1 is pretty well nailed down by the GLSL spec,
> > > where it's specified in terms of the source expressions. For example,
> > > something like:
> > >
> > > mediump float a = ...;
> > > mediump float b = ...;
> > > float c = a + b;
> > > float d = c + 2.0;
> > >
> > > the last addition must be performed in full precision, whereas for:
> > >
> > >
> > > mediump float a = ...;
> > > mediump float b = ...;
> > > float d = (a + b) + 2.0;
> > >
> > > it can be lowered to 16-bit. This information gets lost during
> > > expression grafting in GLSL IR, or vars-to-SSA in NIR, and even the
> > > AST -> GLSL IR transform will sometimes split up expressions, so it
> > > seems like both are too low-level for this. The analysis described by
> > > the spec (the paragraph in section 4.7.3 "Precision Qualifiers" of the
> > > GLSL ES 3.20 spec) has to happen on the AST after type checking but
> > > before lowering to GLSL IR in order to be correct and not overly
> > > conservative. If you want to do it in NIR since #2 is easier with SSA,
> > > then sure... but we can't mix them up and do both at the same time.
> > > We'll have to add support for annotating ir_expression's and nir_instr
> > > (or maybe nir_ssa_def's) with a relaxed precision, and filter that
> > > information down through the pipeline. Hopefully that also works
> > > better for SPIR-V, where you can annotate individual instructions as
> > > being RelaxedPrecision, and afaik (hopefully) #1 is handled by
> > > glslang.
> >
> > I tried to describe the logic I used and my interpretation of the spec in
> > the accompanying patch:
> >
> > https://lists.freedesktop.org/archives/mesa-dev/2018-November/208683.html
> >
> > Does it make any sense?
>
> It seems incorrect, since it will make the addition in my example
> operate in 16 bit precision when it shouldn't. As I explained above,
> it's impossible to do this correctly in NIR.

I dug a little deeper, and it seems like my interpretation of the
spec is also what glslang does, as well as at least one other
implementation (Mali). You can see the glslang implementation here:
https://github.com/KhronosGroup/glslang/blob/5ff3c3da3b374a03a5eff96544fcd6678ed575c1/glslang/MachineIndependent/Intermediate.cpp#L3024

I should clarify a little bit: it may be possible to implement the
GLSL rules for propagating precision qualifiers later, by annotating
every expression node/SSA def as either highp, mediump, or "could be
either." Then you can do something like what you're doing, propagating
things through and stopping when you hit a mediump or highp
expression. Every expression that defines a variable on the source
level must be either highp or mediump. But I think this makes the
changes to the rest of the optimization passes even more invasive,
since you still have to handle highp/mediump correctly everywhere, but
you also have to deal with the "could be either" case, so it doesn't
really help you. It's easier to just do it at the AST level, doing
something like what glslang does. This part is completely
target-independent -- only later, in a NIR pass, should we decide
whether to actually use fp16 while taking into account what the HW
actually can do.
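
To make the propagation rule concrete, here is a toy sketch of it in Python. It is not glslang or Mesa code: the names (Precision, Var, Const, Add, peek, annotate) are made up for illustration, and it assumes the simplified rule that an operation takes the highest precision among its operands, that literals carry no precision of their own, and that an operation with no qualified operand inherits the precision of its consumer.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Union


class Precision(IntEnum):
    UNKNOWN = 0   # "could be either": nothing below carries a qualifier
    MEDIUMP = 1
    HIGHP = 2


@dataclass
class Var:
    name: str
    precision: Precision  # declared in the source, never UNKNOWN


@dataclass
class Const:
    value: float          # literals have no precision of their own


@dataclass
class Add:
    lhs: "Expr"
    rhs: "Expr"
    precision: Precision = Precision.UNKNOWN  # filled in by annotate()


Expr = Union[Var, Const, Add]


def peek(expr: Expr) -> Precision:
    """Bottom-up: highest precision found among the operands."""
    if isinstance(expr, Var):
        return expr.precision
    if isinstance(expr, Const):
        return Precision.UNKNOWN
    return max(peek(expr.lhs), peek(expr.rhs))


def annotate(expr: Expr, inherited: Precision) -> Precision:
    """Top-down: resolve UNKNOWN from the consumer, then recurse."""
    if isinstance(expr, Var):
        return expr.precision
    if isinstance(expr, Const):
        return Precision.UNKNOWN
    own = peek(expr)
    if own == Precision.UNKNOWN:
        own = inherited
    expr.precision = own
    annotate(expr.lhs, own)
    annotate(expr.rhs, own)
    return own


a = Var("a", Precision.MEDIUMP)
b = Var("b", Precision.MEDIUMP)

# float c = a + b; float d = c + 2.0;  (c is a highp variable)
c = Var("c", Precision.HIGHP)
split = Add(c, Const(2.0))

# float d = (a + b) + 2.0;
inner = Add(a, b)
grafted = Add(inner, Const(2.0))

split_result = annotate(split, Precision.HIGHP)
grafted_result = annotate(grafted, Precision.HIGHP)
```

In this model the two source forms from my earlier example give different answers: the addition that reads the highp variable c stays highp, while both additions in (a + b) + 2.0 resolve to mediump. That difference only exists while the source-level variable is still visible, which is why the analysis wants to run before anything grafts or splits expressions.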

>
> Also, abusing a 16-bit bitsize in NIR to mean mediump is not ok. There
> are other vulkan/glsl extensions out there that provide actual fp16
> support, where the result is guaranteed to be calculated as a
> half-float, and these obviously won't work properly with this pass. We
> need to add a flag to the SSA def, or Jason's idea a long time ago was
> to add a fake "24-bit" bitsize. Part of #2 will involve converting the
> bitsize to be 16-bit and removing the flag.
>
> >
> > >
> > >
> > > On Tue, Nov 6, 2018 at 7:30 AM Topi Pohjolainen
> > > <topi.pohjolainen at gmail.com> wrote:
> > > >
> > > > > Here is version 2 of the support for 16-bit float instructions in
> > > > > the shader compiler. Unlike the first version, which did all the analysis
> > > > > at the GLSL level, this one adds the notion of precision to NIR variables
> > > > > and does the analysis and precision lowering at the NIR level.
> > > >
> > > > This lives in: gitlab.freedesktop.org:tpohjola/mesa and branch fp16.
> > > >
> > > > This is now mature enough to be able to use 16-bit precision for all
> > > > instructions except a few special cases for gfxbench trex and alu2.
> > > > (Unfortunately I'm not seeing any performance benefit. This is not
> > > > that surprising as I got to the same point with the glsl-based
> > > > solution and was able to measure the performance already back then).
> > > > Hence I thought it is time to share it.
> > > >
> > > > > Since this is still work in progress, I didn't want to flood the list
> > > > > with the full set of patches; instead I included only the very last one,
> > > > > where I try to outline the logic and its current shortcomings. There is
> > > > > also a short list of TODO items.
> > > >
> > > > > In addition to those, I need to examine a couple of Intel-specific
> > > > > misrenderings. I haven't gotten that deep yet, but it looks like I'm
> > > > > missing something with 16-bit inot and mad/mac lowered interpolation.
> > > > > Unfortunately I get corrupted rendering only on hardware, while the
> > > > > simulator is happy.
> > > >
> > > > > Mostly I'm unsure how to test all of this properly. I haven't written
> > > > > any unit tests yet, but that is high on my list. This is mostly because
> > > > > I've been uncertain about my design choices. So far I've used shader
> > > > > runner tests that I've written for specific cases. These are useful for
> > > > > development purposes but don't bring much value for regression testing.
> > > >
> > > > Alejandro Piñeiro (1):
> > > >   intel/compiler/fs: Use half_precision data_format on 16-bit fb writes
> > > >
> > > > Jose Maria Casanova Crespo (2):
> > > >   intel/compiler/fs: Include support for RT data_format bit
> > > >   intel/compiler/disasm: Show half-precision data_format on rt_writes
> > > >
> > > > Topi Pohjolainen (58):
> > > >   intel/compiler/fs: Set 16-bit sampler return format
> > > >   intel/compiler/disasm: Show half-precision for sampler messages
> > > >   intel/compiler/fs: Skip tex-inst early in conversion lowering
> > > >   intel/compiler/fs: Support for dumping 16-bit IMM values
> > > >   intel/compiler: Allow 16-bit math
> > > >   intel/compiler/fs: Add helpers for 16-bit null regs
> > > >   intel/compiler/fs: Use two SIMD8 instructions for 16-bit math
> > > >   intel/compiler/fs: Use 16-bit null dest with 16-bit math
> > > >   intel/compiler/fs: Use 16-bit null dest with 16-bit compare
> > > >   intel/compiler/fs: Add 16-bit type support for nir_if
> > > >   intel/compiler/eu: Prepare 3-src-op for 16-bit sources
> > > >   intel/compiler/eu: Prepare 3-src-op for 16-bit dst
> > > >   intel/compiler/eu: Allow 3-src-op with mixed precision (HF/F) sources
> > > >   intel/compiler/disasm: Print mixed precision 3-src types correctly
> > > >   intel/compiler/disasm: Print 16-bit IMM values
> > > >   intel/compiler/fs: Support for combining 16-bit immediates
> > > >   intel/compiler/fs: Set tex type for generator to flag fp16
> > > >   intel/compiler/fs: Use component_size() instead of open coded
> > > >   intel/compiler/fs: Add register padding support
> > > >   intel/compiler/fs: Pad 16-bit texture return payloads
> > > >   intel/compiler/fs: Pad 16-bit output (store/fb write) payloads
> > > >   intel/compiler/fs: Pad 16-bit nir vec* components into full reg
> > > >   intel/compiler/fs: Pad 16-bit nir intrinsic dest into full reg
> > > >   intel/compiler/fs: Pad 16-bit const loads into full regs
> > > >   intel/compiler/fs: Pad 16-bit load payload lowering
> > > >   nir: Lower also 16-bit lrp() if needed
> > > >   intel/compiler: Lower 16-bit lrp()
> > > >   nir: Recognize f232(f216(x)) as x
> > > >   nir: Recognize f216(f232(x)) as x
> > > >   nir: Store variable precision when translating from glsl
> > > >   glsl: Set default precision for builtin variables
> > > >   i965: Prepare uniform mapping for 16-bit values
> > > >   i965: Support for uploading 16-bit uniforms from 32-bit store
> > > >   intel/compiler/fs: WIP: Use 32-bit slots for 16-bit uniforms
> > > >   intel/compiler: Tell compiler if lower precision is supported
> > > >   nir: Add lowering pass for variables marked mediump
> > > >   nir: Add pass for deref precision lowering
> > > >   nir: Add pass for alu precision lowering
> > > >   nir: Add precision conversion for load/store_deref
> > > >   nir: Add precision conversion for sources of texturing ops
> > > >   nir: Don't set destination size 16 for booleans
> > > >   nir: Add precision lowering for texture samples
> > > >   nir: Add support for non-fixed precision
> > > >   nir: Don't try to alter precision of boolean sources
> > > >   nir: Add support for variable sized booleans
> > > >   nir: Add support for lowering phi precision
> > > >   intel/compiler/fs: Prepare alu dest type for 16-bit booleans
> > > >   nir: Add lowering pass setting 16-bit boolean destinations
> > > >   nir: Add lowering pass turning b2f(i2i32(x)) into b2f(x)
> > > >   nir: Adjust integer precision for alus operating with 16-bit srcs
> > > >   nir: Replace b2f(x) with b2f(i2i32(x)) for 16-bit x
> > > >   nir: Adjust precision for discard_if
> > > >   nir: Allow input varyings to be converted to lower precision
> > > >   nir: Replace 16-bit src[0] for bcsel i2i32(src[0])
> > > >   nir: Replace 16-bit nir_if condition with i2i32(condition)
> > > >   Revert "intel/compiler: fix 16-bit comparisons"
> > > >   intel/compiler: Hook in precision lowering pass
> > > >   nir: Document precision lowering pass
> > > >
> > > >  src/compiler/Makefile.sources                 |   2 +
> > > >  src/compiler/glsl/glsl_symbol_table.cpp       |  20 +
> > > >  src/compiler/glsl/glsl_symbol_table.h         |   7 +
> > > >  src/compiler/glsl/glsl_to_nir.cpp             |   1 +
> > > >  src/compiler/nir/meson.build                  |   2 +
> > > >  src/compiler/nir/nir.h                        |  18 +
> > > >  src/compiler/nir/nir_lower_bool_size.c        | 120 +++
> > > >  src/compiler/nir/nir_lower_precision.cpp      | 820 ++++++++++++++++++
> > > >  src/compiler/nir/nir_opt_algebraic.py         |   5 +
> > > >  src/intel/blorp/blorp.c                       |   4 +-
> > > >  src/intel/compiler/brw_compiler.c             |   1 +
> > > >  src/intel/compiler/brw_disasm.c               |  28 +-
> > > >  src/intel/compiler/brw_eu.h                   |   3 +-
> > > >  src/intel/compiler/brw_eu_emit.c              |  83 +-
> > > >  src/intel/compiler/brw_fs.cpp                 |  68 +-
> > > >  src/intel/compiler/brw_fs.h                   |   4 +-
> > > >  src/intel/compiler/brw_fs_builder.h           |  37 +-
> > > >  .../compiler/brw_fs_combine_constants.cpp     |  84 +-
> > > >  .../compiler/brw_fs_copy_propagation.cpp      |   7 +-
> > > >  src/intel/compiler/brw_fs_generator.cpp       |  13 +-
> > > >  .../compiler/brw_fs_lower_conversions.cpp     |  42 +
> > > >  src/intel/compiler/brw_fs_nir.cpp             | 197 +++--
> > > >  src/intel/compiler/brw_fs_surface_builder.cpp |   3 +-
> > > >  src/intel/compiler/brw_fs_visitor.cpp         |   6 +
> > > >  src/intel/compiler/brw_inst.h                 |   5 +
> > > >  src/intel/compiler/brw_ir_fs.h                |  16 +
> > > >  src/intel/compiler/brw_nir.c                  |  22 +-
> > > >  src/intel/compiler/brw_nir.h                  |   4 +-
> > > >  src/intel/compiler/brw_reg_type.c             |   2 +
> > > >  src/intel/compiler/brw_shader.h               |   7 +
> > > >  src/intel/vulkan/anv_pipeline.c               |   2 +-
> > > >  .../drivers/dri/i965/brw_nir_uniforms.cpp     |   8 +-
> > > >  src/mesa/drivers/dri/i965/brw_program.c       |  10 +-
> > > >  src/mesa/drivers/dri/i965/brw_program.h       |   6 +-
> > > >  src/mesa/drivers/dri/i965/brw_tcs.c           |   2 +-
> > > >  .../drivers/dri/i965/gen6_constant_state.c    |  14 +-
> > > >  36 files changed, 1548 insertions(+), 125 deletions(-)
> > > >  create mode 100644 src/compiler/nir/nir_lower_bool_size.c
> > > >  create mode 100644 src/compiler/nir/nir_lower_precision.cpp
> > > >
> > > > --
> > > > 2.17.1
> > > >
> > > > _______________________________________________
> > > > mesa-dev mailing list
> > > > mesa-dev at lists.freedesktop.org
> > > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev