[Mesa-dev] intel: WIP: Support for using 16-bits for mediump
Iago Toral
itoral at igalia.com
Tue Nov 6 08:40:17 UTC 2018
On Tue, 2018-11-06 at 08:30 +0200, Topi Pohjolainen wrote:
> Here is version 2 of adding support for 16-bit float instructions in
> the shader compiler. Unlike the first version, which did all the
> analysis at GLSL level, this one adds the notion of precision to NIR
> variables and does the analysis and precision lowering at NIR level.
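For reference, the GLSL ES precision-qualifier rules require an operation to be evaluated at no less than the highest precision among its operands, which is what makes a per-operation decision possible once NIR variables carry precision. A minimal sketch of that decision, assuming a hypothetical precision enum and simplifying the case where no operand is qualified (the spec resolves that from context; here it just falls back to highp). This is not the code in nir_lower_precision.cpp:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical enum, ordered so that a larger value means higher precision.
 * PRECISION_NONE marks operands without a qualifier (e.g. literals). */
enum precision { PRECISION_NONE = 0, PRECISION_LOW, PRECISION_MEDIUM, PRECISION_HIGH };

/* Highest precision among the qualified operands. */
static enum precision op_precision(const enum precision *srcs, size_t n)
{
    enum precision p = PRECISION_NONE;
    for (size_t i = 0; i < n; i++)
        if (srcs[i] > p)
            p = srcs[i];
    /* Simplification: with no qualified operand at all, assume highp. */
    return p == PRECISION_NONE ? PRECISION_HIGH : p;
}

/* An operation whose qualified operands are all mediump or lowp may be
 * evaluated with 16-bit floats. */
static bool can_use_16bit(const enum precision *srcs, size_t n)
{
    return op_precision(srcs, n) <= PRECISION_MEDIUM;
}
```

A mediump variable multiplied by an unqualified literal stays eligible for 16-bit, while any highp operand forces 32-bit evaluation.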
>
> This lives in gitlab.freedesktop.org:tpohjola/mesa, branch fp16.
>
> This is now mature enough to be able to use 16-bit precision for all
> instructions except a few special cases in gfxbench trex and alu2.
> (Unfortunately I'm not seeing any performance benefit.
That is not too surprising. The backend optimizer has been implemented
in terms of 32-bit, and you are probably losing a lot of optimizations
in the generated code for 16-bit paths. I have hit some of that as well
while working on the backend aspects of enabling 16-bit. For example,
SIMD8 execution (which covers all of the geometry pipeline) will not
benefit from copy propagation, because is_partial_write() is always
true for SIMD8 16-bit instructions with its current semantics. Other
optimization passes have hard-coded 32-bit conditions as well. I have
addressed a small part of this and have some code that I expect to send
for review soon, but there is clearly work to be done in the backend to
optimize 16-bit paths, which I hope to work on in the near future.
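To illustrate why SIMD8 16-bit instructions always trip that check: a SIMD8 half-float destination covers 8 x 2 = 16 bytes, only half of a 32-byte GRF, so a size-based test flags it as a partial write, and copy propagation will not treat a partially written register as fully defined by that instruction. A simplified sketch, assuming a hypothetical toy_inst struct and ignoring details such as the predicated-SEL exception and non-contiguous destinations; it approximates the idea and is not the actual fs_inst code:

```c
#include <stdbool.h>

#define REG_SIZE 32 /* bytes per GRF on the hardware discussed here */

/* Hypothetical, simplified stand-in for an FS instruction. */
struct toy_inst {
    unsigned exec_size;  /* SIMD width: 8, 16, ... */
    unsigned type_size;  /* bytes per channel: 2 for HF, 4 for F */
    bool predicated;     /* predicated writes may leave channels untouched */
    unsigned dst_offset; /* byte offset of the destination */
};

/* An instruction fully defines its destination only if it is unpredicated,
 * writes at least one whole register, and starts on a register boundary. */
static bool toy_is_partial_write(const struct toy_inst *inst)
{
    return inst->predicated ||
           inst->exec_size * inst->type_size < REG_SIZE ||
           inst->dst_offset % REG_SIZE != 0;
}
```

Under this rule SIMD16 HF (32 bytes) and SIMD8 F (32 bytes) are full writes, while SIMD8 HF (16 bytes) never is, regardless of what the instruction actually does.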
> This is not that surprising, as I got to the same point with the
> GLSL-based solution and was already able to measure the performance
> back then). Hence I thought it was time to share it.
>
> While this is still work in progress, I didn't want to flood the list
> with the full set of patches but instead included only the very last
> one, where I try to outline the logic and its current shortcomings.
> There is also a short list of TODO items.
>
> In addition to those, I need to examine a couple of Intel-specific
> misrenderings. I haven't dug that deep yet, but it looks like I'm
> missing something with 16-bit inot and mad/mac-lowered interpolation.
> Unfortunately, I get corrupted rendering only on hardware, while the
> simulator is happy.
Are you implementing interpolation of 16-bit fragment shader inputs? I
have discussed that with Jason in the past and, based on my
experimentation, I think the hardware doesn't support this natively:
the interpolator seems to produce only 32-bit deltas and to assume
32-bit inputs as well.
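If the interpolator really only works in 32 bits, one workaround is to interpolate in fp32 and narrow the result to half afterwards. A sketch under that assumption: pln() is a hypothetical helper evaluating a plane equation (a*dx + b*dy + c) in float, and f32_to_f16() is a hand-written round-to-nearest-even conversion that, for brevity, flushes results below the normal half range to zero. Neither is Mesa or hardware code:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical plane-equation evaluation, done entirely in 32-bit float. */
static float pln(const float plane[3], float dx, float dy)
{
    return plane[0] * dx + plane[1] * dy + plane[2];
}

/* fp32 -> fp16 with round-to-nearest-even; values too small for a normal
 * half are flushed to signed zero (simplification). */
static uint16_t f32_to_f16(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);
    uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
    uint32_t exp_bits = (x >> 23) & 0xffu;
    uint32_t mant = x & 0x7fffffu;

    if (exp_bits == 0xffu) /* inf stays inf, NaN stays (quiet) NaN */
        return (uint16_t)(sign | 0x7c00u | (mant ? 0x200u : 0u));

    int32_t exp = (int32_t)exp_bits - 127 + 15; /* rebias exponent */
    if (exp >= 31)
        return (uint16_t)(sign | 0x7c00u);      /* overflow to inf */
    if (exp <= 0)
        return sign;                            /* flush to zero */

    uint32_t h = ((uint32_t)exp << 10) | (mant >> 13);
    uint32_t rem = mant & 0x1fffu;              /* 13 dropped bits */
    if (rem > 0x1000u || (rem == 0x1000u && (h & 1u)))
        h++; /* a carry may bump the exponent, up to inf, which is correct */
    return (uint16_t)(sign | h);
}

/* Interpolate at full precision, then narrow for the 16-bit shader input. */
static uint16_t interp_f16(const float plane[3], float dx, float dy)
{
    return f32_to_f16(pln(plane, dx, dy));
}
```

The cost of this approach is an extra conversion per interpolated component, which may eat into whatever the 16-bit path was meant to save.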
> Mostly I'm worried about how to test all of this properly. I haven't
> written any unit tests, but that is high on my list. This is mostly
> because I've been uncertain about my design choices. So far I've used
> shader_runner tests that I've written for specific cases. These are
> useful for development purposes but don't bring much value for
> regression testing.
Have you tried dEQP / GLES CTS yet? I figure there should be a lot of
mediump shaders there.
Another note on 16-bit booleans, since I see you've been working on
that: I don't know if you're aware that Jason has posted relevant
patches here:
https://lists.freedesktop.org/archives/mesa-dev/2018-October/207458.html
This basically introduces the notion of bit-sized booleans in NIR and
leaves it up to the backend to lower booleans to the bit size it needs
before translating to a backend IR. I have been working on that
lowering and have a prototype working for 16-bit booleans, built on top
of Jason's series and my backend work for half-float. Let me know if
you are interested and I'll point you to the code.
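To make the 0/~0 convention concrete: Intel compares produce all ones for true, so a boolean lowered to 16 bits is 0xffff, and widening it with a sign-extending i2i32 yields the 32-bit true value ~0. That is presumably why several patches in the series wrap 16-bit booleans in i2i32() before b2f, bcsel or nir_if. A minimal sketch with hypothetical helper names, modelling values as plain integers rather than NIR instructions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: materialize a 1-bit boolean at a given bit size
 * using the 0 / ~0 convention (all bits set for true). */
static uint32_t bool_to_bits(bool b, unsigned bit_size)
{
    if (!b)
        return 0u;
    return bit_size >= 32 ? 0xffffffffu : (1u << bit_size) - 1u;
}

/* i2i32 on a 16-bit value is a sign extension, so 16-bit true (0xffff,
 * i.e. -1) widens to 32-bit true (~0), and false stays 0. */
static uint32_t i2i32_from16(uint16_t v)
{
    return (uint32_t)v | ((v & 0x8000u) ? 0xffff0000u : 0u);
}
```

A zero-extending conversion would instead turn 16-bit true into 0x0000ffff, which no longer matches the 32-bit true value.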
Iago
> Alejandro Piñeiro (1):
> intel/compiler/fs: Use half_precision data_format on 16-bit fb writes
>
> Jose Maria Casanova Crespo (2):
> intel/compiler/fs: Include support for RT data_format bit
> intel/compiler/disasm: Show half-precision data_format on rt_writes
>
> Topi Pohjolainen (58):
> intel/compiler/fs: Set 16-bit sampler return format
> intel/compiler/disasm: Show half-precision for sampler messages
> intel/compiler/fs: Skip tex-inst early in conversion lowering
> intel/compiler/fs: Support for dumping 16-bit IMM values
> intel/compiler: Allow 16-bit math
> intel/compiler/fs: Add helpers for 16-bit null regs
> intel/compiler/fs: Use two SIMD8 instructions for 16-bit math
> intel/compiler/fs: Use 16-bit null dest with 16-bit math
> intel/compiler/fs: Use 16-bit null dest with 16-bit compare
> intel/compiler/fs: Add 16-bit type support for nir_if
> intel/compiler/eu: Prepare 3-src-op for 16-bit sources
> intel/compiler/eu: Prepare 3-src-op for 16-bit dst
> intel/compiler/eu: Allow 3-src-op with mixed precision (HF/F) sources
> intel/compiler/disasm: Print mixed precision 3-src types correctly
> intel/compiler/disasm: Print 16-bit IMM values
> intel/compiler/fs: Support for combining 16-bit immediates
> intel/compiler/fs: Set tex type for generator to flag fp16
> intel/compiler/fs: Use component_size() instead of open coded
> intel/compiler/fs: Add register padding support
> intel/compiler/fs: Pad 16-bit texture return payloads
> intel/compiler/fs: Pad 16-bit output (store/fb write) payloads
> intel/compiler/fs: Pad 16-bit nir vec* components into full reg
> intel/compiler/fs: Pad 16-bit nir intrinsic dest into full reg
> intel/compiler/fs: Pad 16-bit const loads into full regs
> intel/compiler/fs: Pad 16-bit load payload lowering
> nir: Lower also 16-bit lrp() if needed
> intel/compiler: Lower 16-bit lrp()
> nir: Recognize f232(f216(x)) as x
> nir: Recognize f216(f232(x)) as x
> nir: Store variable precision when translating from glsl
> glsl: Set default precision for builtin variables
> i965: Prepare uniform mapping for 16-bit values
> i965: Support for uploading 16-bit uniforms from 32-bit store
> intel/compiler/fs: WIP: Use 32-bit slots for 16-bit uniforms
> intel/compiler: Tell compiler if lower precision is supported
> nir: Add lowering pass for variables marked mediump
> nir: Add pass for deref precision lowering
> nir: Add pass for alu precision lowering
> nir: Add precision conversion for load/store_deref
> nir: Add precision conversion for sources of texturing ops
> nir: Don't set destination size 16 for booleans
> nir: Add precision lowering for texture samples
> nir: Add support for non-fixed precision
> nir: Don't try to alter precision of boolean sources
> nir: Add support for variable sized booleans
> nir: Add support for lowering phi precision
> intel/compiler/fs: Prepare alu dest type for 16-bit booleans
> nir: Add lowering pass setting 16-bit boolean destinations
> nir: Add lowering pass turning b2f(i2i32(x)) into b2f(x)
> nir: Adjust integer precision for alus operating with 16-bit srcs
> nir: Replace b2f(x) with b2f(i2i32(x)) for 16-bit x
> nir: Adjust precision for discard_if
> nir: Allow input varyings to be converted to lower precision
> nir: Replace 16-bit src[0] for bcsel i2i32(src[0])
> nir: Replace 16-bit nir_if condition with i2i32(condition)
> Revert "intel/compiler: fix 16-bit comparisons"
> intel/compiler: Hook in precision lowering pass
> nir: Document precision lowering pass
>
> src/compiler/Makefile.sources | 2 +
> src/compiler/glsl/glsl_symbol_table.cpp | 20 +
> src/compiler/glsl/glsl_symbol_table.h | 7 +
> src/compiler/glsl/glsl_to_nir.cpp | 1 +
> src/compiler/nir/meson.build | 2 +
> src/compiler/nir/nir.h | 18 +
> src/compiler/nir/nir_lower_bool_size.c | 120 +++
> src/compiler/nir/nir_lower_precision.cpp | 820 ++++++++++++++++++
> src/compiler/nir/nir_opt_algebraic.py | 5 +
> src/intel/blorp/blorp.c | 4 +-
> src/intel/compiler/brw_compiler.c | 1 +
> src/intel/compiler/brw_disasm.c | 28 +-
> src/intel/compiler/brw_eu.h | 3 +-
> src/intel/compiler/brw_eu_emit.c | 83 +-
> src/intel/compiler/brw_fs.cpp | 68 +-
> src/intel/compiler/brw_fs.h | 4 +-
> src/intel/compiler/brw_fs_builder.h | 37 +-
> .../compiler/brw_fs_combine_constants.cpp | 84 +-
> .../compiler/brw_fs_copy_propagation.cpp | 7 +-
> src/intel/compiler/brw_fs_generator.cpp | 13 +-
> .../compiler/brw_fs_lower_conversions.cpp | 42 +
> src/intel/compiler/brw_fs_nir.cpp | 197 +++--
> src/intel/compiler/brw_fs_surface_builder.cpp | 3 +-
> src/intel/compiler/brw_fs_visitor.cpp | 6 +
> src/intel/compiler/brw_inst.h | 5 +
> src/intel/compiler/brw_ir_fs.h | 16 +
> src/intel/compiler/brw_nir.c | 22 +-
> src/intel/compiler/brw_nir.h | 4 +-
> src/intel/compiler/brw_reg_type.c | 2 +
> src/intel/compiler/brw_shader.h | 7 +
> src/intel/vulkan/anv_pipeline.c | 2 +-
> .../drivers/dri/i965/brw_nir_uniforms.cpp | 8 +-
> src/mesa/drivers/dri/i965/brw_program.c | 10 +-
> src/mesa/drivers/dri/i965/brw_program.h | 6 +-
> src/mesa/drivers/dri/i965/brw_tcs.c | 2 +-
> .../drivers/dri/i965/gen6_constant_state.c | 14 +-
> 36 files changed, 1548 insertions(+), 125 deletions(-)
> create mode 100644 src/compiler/nir/nir_lower_bool_size.c
> create mode 100644 src/compiler/nir/nir_lower_precision.cpp
>