[Mesa-dev] [PATCH 00/59] intel: VK_KHR_shader_float16_int8 implementation

Tue Dec 4 07:16:24 UTC 2018

Hi,

this series implements support for VK_KHR_shader_float16_int8 for Intel
platforms (Broadwell and later). This extension enables Vulkan applications
to consume SPIR-V shaders that use Float16 and Int8 types in shader code,
extending the functionality included with VK_KHR_16bit_storage and
VK_KHR_8bit_storage, which was limited to load/store operations.

A branch with this series is available for testing in the
itoral/VK_KHR_shader_float16_int8 branch of the Igalia Mesa repository at
https://github.com/Igalia/mesa.

On the front-end side, since the implementation targets Vulkan specifically,
the work focuses mostly on the SPIR-V to NIR translator, with just a few
patches targeting NIR itself. Most of the series, however, targets the Intel
backend compiler.

I sent some early code for a small part of this series a few months ago,
so some of the patches already carry a Reviewed-by from Jason, but most of
the patches are either new or still unreviewed.

While the implementation targets Vulkan, the compiler backend bits are not
tied to Vulkan and should be useful for an implementation of GLSL ES mediump
using 16-bit floating point.

Following is a quick summary of the series:

- Patches 1-9 add 16-bit support to a bunch of NIR lowerings for trigonometric
and exponential builtin functions.

- Patches 10-41 implement the float16 part in the compiler backend.

- Patches 42-52 implement the int8 part in the compiler backend.

- Patches 53 to the end of the series fix some optimization passes so that
they properly handle non-32-bit cases.
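
To give a rough idea of the kind of non-32-bit handling involved (for example
in the is_zero, is_one and is_negative_one patch), here is a minimal sketch.
It is not the code from the series: the hf_bits_* helper names are
hypothetical, and it assumes the 16-bit immediate is available as raw IEEE 754
binary16 bits. Treating -0.0 as zero is a choice made for this sketch only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only, not the Mesa implementation.
 * Each one classifies a half-float immediate given as raw binary16 bits
 * (1 sign bit, 5 exponent bits with bias 15, 10 mantissa bits).
 */

static bool
hf_bits_is_zero(uint16_t bits)
{
   /* +0.0 is 0x0000 and -0.0 is 0x8000: mask the sign bit to accept both. */
   return (bits & 0x7fff) == 0;
}

static bool
hf_bits_is_one(uint16_t bits)
{
   /* 1.0: sign 0, biased exponent 15 (0b01111), mantissa 0. */
   return bits == 0x3c00;
}

static bool
hf_bits_is_negative_one(uint16_t bits)
{
   /* -1.0: same as 1.0 with the sign bit set. */
   return bits == 0xbc00;
}
```

With these, hf_bits_is_one(0x3c00) and hf_bits_is_zero(0x8000) are true, while
hf_bits_is_one(0x3f80) is false, since 0x3f80 is just the upper half of the
32-bit float encoding of 1.0 (0x3f800000) and is not a half-float 1.0.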

This series does not include the boolean lowering to native bit-sizes yet,
since that depends on another series from Jason that hasn't landed yet. My
plan is to send that for review separately, or maybe include it in v2 of
this series at a later time.

The testing for this comes from the Khronos CTS, which is okay in the sense
that it is fairly comprehensive in terms of the operations that it tests, and
even their relative precisions, but not great in the sense that it focuses
mostly on standalone tests of basic ALU operations. This means that the tests
do not usually exercise the optimizer and other parts of the compiler in the
same way that applications would, which is a coverage hole that is difficult
to address in the CTS, I guess. Ideally, the situation will get better once we
get an implementation of mediump, since that should hopefully enable us to
test a lot of this work with existing GLES applications. For what it's worth,
I am still reviewing the compiler to identify things that need to be addressed
but are not triggered by the CTS. Some of the patches in the tail of the
series come from this work, and I plan to address a few more in the coming
weeks.

Review feedback is welcome!

Iago

Iago Toral Quiroga (58):
  compiler/spirv: handle 16-bit float in radians() and degrees()
  compiler/spirv: implement 16-bit asin
  compiler/spirv: implement 16-bit acos
  compiler/spirv: implement 16-bit atan
  compiler/spirv: implement 16-bit atan2
  compiler/spirv: implement 16-bit exp and log
  compiler/spirv: implement 16-bit hyperbolic trigonometric functions
  compiler/spirv: implement 16-bit frexp
  compiler/spirv: use 32-bit polynomial approximation for 16-bit asin()
  intel/compiler: implement conversions from 16-bit float to 64-bit
  intel/compiler: handle b2i/b2f with other integer conversion opcodes
  intel/compiler: simplify f2*64 opcodes
  intel/compiler: lower some 16-bit float operations to 32-bit
  intel/compiler: lower 16-bit extended math to 32-bit prior to gen9
  intel/compiler: implement 16-bit fsign
  intel/compiler: allow extended math functions with HF operands
  compiler/nir: add lowering option for 16-bit fmod
  intel/compiler: lower 16-bit fmod
  compiler/nir: add lowering for 16-bit flrp
  intel/compiler: lower 16-bit flrp
  compiler/nir: add lowering for 16-bit ldexp
  intel/compiler: Extended Math is limited to SIMD8 on half-float
  intel/compiler: add instruction setters for Src1Type and Src2Type.
  intel/compiler: add new half-float register type for 3-src
    instructions
  intel/compiler: don't compact 3-src instructions with Src1Type or
    Src2Type bits
  intel/compiler: allow half-float on 3-source instructions since gen8
  intel/compiler: set correct precision fields for 3-source float
    instructions
  intel/compiler: don't propagate HF immediates to 3-src instructions
  intel/compiler: document MAD algebraic optimization
  intel/compiler: fix ddx and ddy for 16-bit float
  intel/compiler: fix 16-bit float ddx and ddy for SIMD8
  intel/compiler: do not copy-propagate strided regions to ddx/ddy
    arguments
  intel/compiler: fix ddy for half-float in gen8
  intel/compiler: workaround for SIMD8 half-float MAD in gen < 9
  compiler/spirv: add implementation to check for SpvCapabilityFloat16
    support
  anv/pipeline: support SpvCapabilityFloat16 in gen8+
  vulkan: import Khronos header and xml version 95
  anv/device: expose support for shaderFloat16 in gen8+
  anv/extensions: expose VK_KHR_shader_float16_int8 on gen8+
  intel/compiler: split is_partial_write() into two variants
  intel/compiler: activate 16-bit bit-size lowerings also for 8-bit
  intel/compiler: fix conversions from 64-bit to 8-bit int
  intel/compiler: implement conversions from 8-bit int to 64-bit
  intel/compiler: implement conversions from 16-bit float to 8-bit int
  intel/compiler: fix integer to/from half-float in atom platforms
  intel/compiler: assert that lower conversions produces valid strides
  intel/compiler: implement isign for int8
  intel/eu: force stride of 2 on NULL register for Byte instructions
  compiler/spirv: add implementation to check for SpvCapabilityInt8
  anv/pipeline: support SpvCapabilityInt8 in gen8+
  anv/device: expose shaderInt8 feature
  intel/compiler: implement is_zero, is_one, is_negative_one for
    8-bit/16-bit
  intel/compiler: add a brw_reg_type_is_integer helper
  intel/compiler: fix cmod propagation for non 32-bit types
  intel/compiler: support half-float in the combine constants pass
  intel/compiler: fix combine constants for Align16 with half-float
    prior to gen9
  intel/compiler: implement MAD algebraic optimizations on half-float
  intel/compiler: allow propagating HF immediates to MAD/LRP

Samuel Iglesias Gonsálvez (1):
  intel/compiler: Implement float64/int64 to float16 conversion

 include/vulkan/vulkan_core.h                  | 109 ++++++++-
 src/compiler/nir/nir.h                        |   2 +
 src/compiler/nir/nir_builtin_builder.h        |   8 +-
 src/compiler/nir/nir_opt_algebraic.py         |   7 +
 src/compiler/shader_info.h                    |   2 +
 src/compiler/spirv/spirv_to_nir.c             |   8 +-
 src/compiler/spirv/vtn_glsl450.c              | 183 ++++++++++----
 src/intel/compiler/brw_compiler.c             |   2 +
 src/intel/compiler/brw_eu_compact.c           |   5 +-
 src/intel/compiler/brw_eu_emit.c              |  25 +-
 src/intel/compiler/brw_fs.cpp                 | 161 +++++++++++--
 src/intel/compiler/brw_fs.h                   |   1 +
 .../compiler/brw_fs_cmod_propagation.cpp      |  28 +--
 .../compiler/brw_fs_combine_constants.cpp     |  82 ++++++-
 .../compiler/brw_fs_copy_propagation.cpp      |  36 ++-
 src/intel/compiler/brw_fs_cse.cpp             |   3 +-
 .../compiler/brw_fs_dead_code_eliminate.cpp   |   2 +-
 src/intel/compiler/brw_fs_generator.cpp       |  44 ++--
 src/intel/compiler/brw_fs_live_variables.cpp  |   2 +-
 .../compiler/brw_fs_lower_conversions.cpp     |   7 +
 src/intel/compiler/brw_fs_nir.cpp             | 224 ++++++++++++++++--
 src/intel/compiler/brw_fs_reg_allocate.cpp    |   2 +-
 .../compiler/brw_fs_register_coalesce.cpp     |   2 +-
 .../compiler/brw_fs_saturate_propagation.cpp  |   7 +-
 src/intel/compiler/brw_fs_sel_peephole.cpp    |   4 +-
 src/intel/compiler/brw_inst.h                 |   2 +
 src/intel/compiler/brw_ir_fs.h                |   3 +-
 src/intel/compiler/brw_nir.c                  |  20 +-
 src/intel/compiler/brw_reg_type.c             |  35 ++-
 src/intel/compiler/brw_reg_type.h             |  18 ++
 src/intel/compiler/brw_shader.cpp             |  20 ++
 src/intel/vulkan/anv_device.c                 |   9 +
 src/intel/vulkan/anv_extensions.py            |   1 +
 src/intel/vulkan/anv_pipeline.c               |   2 +
 src/vulkan/registry/vk.xml                    | 130 +++++++---
 35 files changed, 1005 insertions(+), 191 deletions(-)

-- 
2.17.1