[Mesa-dev] [PATCH v2 00/53] intel: VK_KHR_shader_float16_int8 implementation

Wed Dec 19 11:50:28 UTC 2018

This version rebases the series on top of a more recent master and addresses
review feedback to v1.

The main change is a rewrite of the type conversion patches to reduce the
growing complexity of the backend, following discussions with Jason. In the
end I took two main actions:

1) Moved the code that splits a conversion into two conversions through an
  intermediary type out of the backend and into a NIR pass.
2) Added helpers to handle special conversion restrictions in the backend.

Moving the splitting cases to NIR was particularly useful to reduce complexity,
since most of that complexity came from the interactions that these splitting
cases had with other restrictions.
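
As an illustration of the splitting idea, here is a minimal standalone sketch
in C. It does not use the NIR API: the type model, needs_split() and
lower_conversion() are hypothetical names, and the single restriction it
encodes (no direct conversion between a 16-bit float and a 64-bit type, with
32-bit float as the intermediary) is an assumption picked for the example, not
a description of everything the pass in this series handles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type model for illustration; not the NIR data structures. */
enum base_type { BASE_INT, BASE_UINT, BASE_FLOAT };

struct conv_type {
   enum base_type base;
   unsigned bits;
};

/* Assumed restriction (example only): there is no direct conversion
 * between a 16-bit float and any 64-bit type.
 */
static bool
needs_split(struct conv_type src, struct conv_type dst)
{
   bool src_hf = src.base == BASE_FLOAT && src.bits == 16;
   bool dst_hf = dst.base == BASE_FLOAT && dst.bits == 16;

   return (src_hf && dst.bits == 64) || (src.bits == 64 && dst_hf);
}

/* Writes the destination type of each conversion step into steps[] and
 * returns the number of steps: 2 when the conversion has to go through
 * the intermediary type, 1 when it can be emitted directly.
 */
static unsigned
lower_conversion(struct conv_type src, struct conv_type dst,
                 struct conv_type steps[2])
{
   assert(steps != NULL);

   if (needs_split(src, dst)) {
      steps[0] = (struct conv_type){ BASE_FLOAT, 32 }; /* intermediary */
      steps[1] = dst;
      return 2;
   }

   steps[0] = dst;
   return 1;
}
```

Because the split happens before the backend sees the conversion, the backend
helpers in this model only ever have to reason about single-step conversions.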

I was also going to move the conversion code to a separate helper function, but
I think the above steps reduced the complexity to a point where this is no longer
needed. With that said, I am happy to do this if there is still interest. For a
quick look, here is what the end result looks like:
https://github.com/Igalia/mesa/blob/itoral/VK_KHR_shader_float16_int8/src/intel/compiler/brw_fs_nir.cpp#L825

Another idea suggested for this was to just emit the conversions and then apply
the fixes during the lower_conversions pass that we run right before codegen.
I didn't try this in the end since it didn't look necessary any more, but again,
I am happy to try this if there is still interest.

The other relevant review feedback was to add new nir_{fadd,fmul}_imm helpers
and use them in the initial patches of the series. I did this for everything
except a couple of cases of the form 'imm - expr'. These would have to be
written as neg(fadd_imm(expr, -imm)), which adds an extra negate and didn't
seem worth it.
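
To make the 'imm - expr' trade-off concrete, here is a small standalone sketch
in C. The functions below are hypothetical stand-ins that operate on plain
floats and count operations in a toy model; they are not the NIR builder
helpers, and the op counter is only an assumption used to show the extra
negate.

```c
#include <assert.h>

/* Toy model: every helper call counts as one emitted ALU operation. */
static unsigned alu_ops;

/* Hypothetical stand-ins for builder helpers, on plain floats. */
static float fadd_imm(float x, float imm) { alu_ops++; return x + imm; }
static float fsub(float a, float b)       { alu_ops++; return a - b; }
static float fneg(float x)                { alu_ops++; return -x; }

/* 'imm - expr' as a plain subtraction: one operation. */
static float
imm_minus_expr_direct(float imm, float expr)
{
   return fsub(imm, expr);
}

/* 'imm - expr' expressed through fadd_imm: two operations, because
 * expr + (-imm) yields expr - imm and the sign has to be flipped.
 */
static float
imm_minus_expr_via_fadd_imm(float imm, float expr)
{
   return fneg(fadd_imm(expr, -imm));
}

/* Both forms compare equal for finite inputs; only the sign of a zero
 * result can differ.
 */
static void
check_equivalence(float imm, float expr)
{
   assert(imm_minus_expr_direct(imm, expr) ==
          imm_minus_expr_via_fadd_imm(imm, expr));
}
```

In this model the fadd_imm form always costs one more operation than the
direct subtraction, which is the overhead mentioned above.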

Another relevant change concerns the half-float MAD/LRP algebraic optimizations
in the backend. In v1 I included support for them, but after some shader-db
testing and discussion with Jason we concluded that we should actually remove
these optimizations from the backend (including the 32-bit paths), so this
series does that as well (towards the end of the series).

The series is available for testing in the
itoral/VK_KHR_shader_float16_int8 branch of the Igalia Mesa repository at
https://github.com/Igalia/mesa.

Iago Toral Quiroga (53):
  compiler/nir: add a nir_b2f() helper
  compiler/nir: add nir_fadd_imm() and nir_fmul_imm() helpers
  compiler/spirv: handle 16-bit float in radians() and degrees()
  compiler/spirv: implement 16-bit asin
  compiler/spirv: implement 16-bit acos
  compiler/spirv: implement 16-bit atan
  compiler/spirv: implement 16-bit atan2
  compiler/spirv: implement 16-bit exp and log
  compiler/spirv: implement 16-bit hyperbolic trigonometric functions
  compiler/spirv: implement 16-bit frexp
  compiler/spirv: use 32-bit polynomial approximation for 16-bit asin()
  intel/compiler: add a NIR pass to lower conversions
  intel/compiler: add a helper to handle conversions to 64-bit in atom
  intel/compiler: split float to 64-bit opcodes from int to 64-bit
  intel/compiler: handle b2i/b2f with other integer conversion opcodes
  intel/compiler: handle conversions to half-float
  intel/compiler: lower some 16-bit float operations to 32-bit
  intel/compiler: lower 16-bit extended math to 32-bit prior to gen9
  intel/compiler: implement 16-bit fsign
  intel/compiler: allow extended math functions with HF operands
  compiler/nir: add lowering option for 16-bit fmod
  intel/compiler: lower 16-bit fmod
  compiler/nir: add lowering for 16-bit flrp
  intel/compiler: lower 16-bit flrp
  compiler/nir: add lowering for 16-bit ldexp
  intel/compiler: Extended Math is limited to SIMD8 on half-float
  intel/compiler: add instruction setters for Src1Type and Src2Type.
  intel/compiler: add new half-float register type for 3-src
    instructions
  intel/compiler: don't compact 3-src instructions with Src1Type or
    Src2Type bits
  intel/compiler: allow half-float on 3-source instructions since gen8
  intel/compiler: set correct precision fields for 3-source float
    instructions
  intel/compiler: don't propagate HF immediates to 3-src instructions
  intel/compiler: fix ddx and ddy for 16-bit float
  intel/compiler: fix ddy for half-float in gen8
  intel/compiler: workaround for SIMD8 half-float MAD in gen < 9
  intel/compiler: split is_partial_write() into two variants
  intel/compiler: activate 16-bit bit-size lowerings also for 8-bit
  intel/compiler: handle 64-bit to 8-bit conversions
  intel/compiler: add a helper to do conversions between integer and
    half-float
  intel/compiler: handle conversions between int and half-float on atom
  intel/compiler: assert that lower conversions produces valid strides
  intel/compiler: implement isign for int8
  intel/eu: force stride of 2 on NULL register for Byte instructions
  compiler/spirv: add support for Float16 and Int8 capabilities
  anv/pipeline: support Float16 and Int8 capabilities in gen8+
  anv/device: expose shaderFloat16 and shaderInt8 in gen8+
  intel/compiler: implement is_zero, is_one, is_negative_one for
    8-bit/16-bit
  intel/compiler: add a brw_reg_type_is_integer helper
  intel/compiler: fix cmod propagation for non 32-bit types
  intel/compiler: remove MAD/LRP algebraic optimizations from the
    backend
  intel/compiler: support half-float in the combine constants pass
  intel/compiler: fix combine constants for Align16 with half-float
    prior to gen9
  intel/compiler: allow propagating HF immediates to MAD/LRP

 src/compiler/nir/nir.h                        |   2 +
 src/compiler/nir/nir_builder.h                |  24 +++
 src/compiler/nir/nir_builtin_builder.h        |   4 +-
 src/compiler/nir/nir_opt_algebraic.py         |  11 +-
 src/compiler/shader_info.h                    |   2 +
 src/compiler/spirv/spirv_to_nir.c             |   8 +-
 src/compiler/spirv/vtn_glsl450.c              | 179 +++++++++++++-----
 src/intel/Makefile.sources                    |   1 +
 src/intel/compiler/brw_compiler.c             |   2 +
 src/intel/compiler/brw_eu_compact.c           |   5 +-
 src/intel/compiler/brw_eu_emit.c              |  36 +++-
 src/intel/compiler/brw_fs.cpp                 | 145 +++++++++-----
 src/intel/compiler/brw_fs.h                   |   1 +
 .../compiler/brw_fs_cmod_propagation.cpp      |  28 +--
 .../compiler/brw_fs_combine_constants.cpp     |  82 ++++++--
 .../compiler/brw_fs_copy_propagation.cpp      |  14 +-
 src/intel/compiler/brw_fs_cse.cpp             |   3 +-
 .../compiler/brw_fs_dead_code_eliminate.cpp   |   2 +-
 src/intel/compiler/brw_fs_generator.cpp       |  47 +++--
 src/intel/compiler/brw_fs_live_variables.cpp  |   2 +-
 .../compiler/brw_fs_lower_conversions.cpp     |   7 +
 src/intel/compiler/brw_fs_nir.cpp             | 166 ++++++++++++----
 src/intel/compiler/brw_fs_reg_allocate.cpp    |   2 +-
 .../compiler/brw_fs_register_coalesce.cpp     |   2 +-
 .../compiler/brw_fs_saturate_propagation.cpp  |   7 +-
 src/intel/compiler/brw_fs_sel_peephole.cpp    |   4 +-
 src/intel/compiler/brw_inst.h                 |   2 +
 src/intel/compiler/brw_ir_fs.h                |   3 +-
 src/intel/compiler/brw_nir.c                  |  22 ++-
 src/intel/compiler/brw_nir.h                  |   2 +
 .../compiler/brw_nir_lower_conversions.c      | 138 ++++++++++++++
 src/intel/compiler/brw_reg_type.c             |  35 +++-
 src/intel/compiler/brw_reg_type.h             |  18 ++
 src/intel/compiler/brw_shader.cpp             |  26 +++
 src/intel/compiler/meson.build                |   1 +
 src/intel/vulkan/anv_device.c                 |   9 +
 src/intel/vulkan/anv_extensions.py            |   1 +
 src/intel/vulkan/anv_pipeline.c               |   2 +
 38 files changed, 827 insertions(+), 218 deletions(-)
 create mode 100644 src/intel/compiler/brw_nir_lower_conversions.c

-- 
2.17.1