[Mesa-dev] [PATCH 00/38] radv, ac: 16-bit and 8-bit arithmetic and 8-bit storage

Mon Feb 18 08:55:57 UTC 2019

On 2/16/19 1:21 AM, Rhys Perry wrote:
> This series adds support for:
> - VK_KHR_shader_float16_int8
> - VK_AMD_gpu_shader_half_float
> - VK_AMD_gpu_shader_int16
> - VK_KHR_8bit_storage
> on VI+. Half floats are disabled on LLVM 7 because of a bug causing large
> memory usage and long (or unbounded) compilation times with some CTS
> tests.
>
> It is written against the following patch series:
> - https://patchwork.freedesktop.org/series/53454/ (v4)
> - https://patchwork.freedesktop.org/series/53660/ (v1)
>
> With LLVM 9, there are no reproducible Vulkan CTS regressions with Vega
> and VI except for
> dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.input_output_float_64_to_16.*
> which fails or crashes because of unrelated radv bugs with 64-bit varyings
> and because the tests use VK_FORMAT_R64_SFLOAT as a vertex format even
> though radv does not support it.

Is that a test bug?

The two NIR-related patches (22 and 25) should be sent separately;
otherwise people working on NIR might miss them.

>
> With LLVM 9, there are no reproducible piglit regressions except for
> glsl-array-bounds-12.shader_test, which fails because of an LLVM bug
> when SLP vectorization is enabled.
>
> With LLVM 8, there are no reproducible Vulkan CTS regressions with Vega
> and VI except for those seen with LLVM 9, plus a couple of tests that
> fail because of an LLVM bug exposed by the SLP vectorizer and because
> 16-bit interpolation currently has no fallback on LLVM versions before 9.
>
> With LLVM 7, there are no reproducible Vulkan CTS regressions with Vega
> and VI except for those seen with LLVM 9, plus a couple of tests that
> fail because of an LLVM bug exposed by the SLP vectorizer.
>
> The SLP vectorization patch is marked as WIP because it exposes LLVM bugs
> with piglit's glsl-array-bounds-12.shader_test, some Vulkan CTS tests and
> a shader-db test for a game I can't remember. It also over-vectorizes
> 32-bit code, which can significantly worsen the quality of the generated
> code.
>
> The 16-bit interpolation patch is marked as WIP because it currently
> requires intrinsics only available in LLVM 9 and does not have a fallback.
>
> A branch on GitHub containing this series can be found at:
> https://github.com/pendingchaos/mesa/commits/radv_fp16_int16_int8_v2
>
> v2: rebase
> v2: implement 16-bit interpolation
> v2: move LLVMAddSLPVectorizePass to after LLVMAddEarlyCSEMemSSAPass
> v2: run vectorization unconditionally on GFX9 and later
> v2: remove ac_get_one(), ac_get_zero(), ac_get_onef() and ac_get_zerof()
> v2: remove ac_int_of_size()
> v2: fix 64-bit visit_load_var()
> v2: mark VK_KHR_8bit_storage as DONE in features.txt
> v2: mark SLP vectorization patch as WIP
> v2: fix C++ style comment
>
> Rhys Perry (41):
>    radv: bitcast 16-bit outputs to integers
>    radv: ensure export arguments are always float
>    ac: add various helpers for float16/int16/int8
>    ac/nir: implement 8-bit push constant, ssbo and ubo loads
>    ac/nir: implement 8-bit ssbo stores
>    ac/nir: fix 16-bit ssbo stores
>    ac/nir: implement 8-bit nir_load_const_instr
>    ac/nir: implement 8-bit conversions
>    ac/nir: fix 64-bit nir_op_f2f16_rtz
>    ac/nir: make ac_build_clamp work on all bit sizes
>    ac/nir: make ac_build_fract work on all bit sizes
>    ac/nir: make ac_build_isign work on all bit sizes
>    ac/nir: make ac_build_fsign work on all bit sizes
>    ac/nir: make ac_build_fdiv support 16-bit floats
>    ac/nir: implement half-float nir_op_frcp
>    ac/nir: implement half-float nir_op_frsq
>    ac/nir: implement half-float nir_op_ldexp
>    radv: lower 16-bit flrp
>    ac/nir: support half floats in emit_b2f
>    ac/nir: make emit_b2i work on all bit sizes
>    ac/nir: implement 16-bit shifts
>    compiler/nir: add lowering option for 16-bit ffma
>    ac/nir: implement 16-bit ac_build_ddxy
>    ac/nir: implement 8 and 16 bit ac_build_readlane
>    nir: make bitfield_reverse and ifind_msb work with all integers
>    ac/nir: make ac_find_lsb work on all bit sizes
>    ac/nir: make ac_build_umsb work on all bit sizes
>    ac/nir: implement 8 and 16 bit ac_build_imsb
>    ac/nir: make ac_build_bit_count work on all bit sizes
>    ac/nir: make ac_build_bitfield_reverse work on all bit sizes
>    ac/nir: implement 16-bit pack/unpack opcodes
>    ac/nir: add 8-bit types to glsl_base_to_llvm_type
>    ac/nir,radv: create an array of varying output types
>    ac/nir: store all outputs as f32
>    radv: store all fragment shader inputs as f32
>    radv: handle all fragment output types
>    WIP: radv,ac: implement 16-bit interpolation
>    WIP: ac,radv: run LLVM's SLP vectorizer
>    ac/nir: generate better code for nir_op_f2f16_rtz
>    ac/nir: have nir_op_f2f16 round to zero
>    radv,docs: expose float16, int16 and int8 features and extensions
>
>   docs/features.txt                        |   2 +-
>   src/amd/common/ac_llvm_build.c           | 325 +++++++++++------------
>   src/amd/common/ac_llvm_build.h           |  18 +-
>   src/amd/common/ac_llvm_util.c            |   8 +-
>   src/amd/common/ac_nir_to_llvm.c          | 268 +++++++++++++++----
>   src/amd/common/ac_shader_abi.h           |   1 +
>   src/amd/vulkan/radv_device.c             |  17 ++
>   src/amd/vulkan/radv_extensions.py        |   4 +
>   src/amd/vulkan/radv_nir_to_llvm.c        | 123 +++++----
>   src/amd/vulkan/radv_pipeline.c           |  19 +-
>   src/amd/vulkan/radv_shader.c             |   4 +
>   src/amd/vulkan/radv_shader.h             |   1 +
>   src/broadcom/compiler/nir_to_vir.c       |   1 +
>   src/compiler/nir/nir.h                   |   1 +
>   src/compiler/nir/nir_opcodes.py          |   4 +-
>   src/compiler/nir/nir_opt_algebraic.py    |   4 +-
>   src/gallium/drivers/radeonsi/si_get.c    |   1 +
>   src/gallium/drivers/radeonsi/si_shader.c |   2 +-
>   src/gallium/drivers/vc4/vc4_program.c    |   1 +
>   19 files changed, 507 insertions(+), 297 deletions(-)
>
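Several patches in the series ("fix 64-bit nir_op_f2f16_rtz", "have nir_op_f2f16 round to zero") concern float32 to float16 conversion with round-toward-zero. As a rough illustration of what that rounding mode means, here is a minimal CPU-side sketch in C, assuming IEEE-754 binary32 input. The helper name f32_to_f16_rtz is hypothetical; it is not code from this series, which builds LLVM IR through the ac helpers rather than converting on the CPU.

```c
#include <math.h>    /* NAN, INFINITY macros (used by callers/tests) */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (illustration only): convert an IEEE-754 binary32
 * value to binary16, rounding toward zero by truncating the mantissa
 * bits that do not fit. */
static uint16_t f32_to_f16_rtz(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);            /* reinterpret without UB */

    uint16_t sign = (uint16_t)((bits >> 16) & 0x8000u);
    uint32_t exp  = (bits >> 23) & 0xffu;      /* biased binary32 exponent */
    uint32_t mant = bits & 0x7fffffu;          /* 23-bit mantissa */

    if (exp == 0xffu)                          /* Inf or NaN */
        return (uint16_t)(sign | 0x7c00u | (mant ? 0x200u : 0u)); /* NaN stays NaN */

    int32_t e = (int32_t)exp - 127 + 15;       /* rebias for binary16 */

    if (e >= 31)                               /* too large: RTZ gives max finite, not Inf */
        return (uint16_t)(sign | 0x7bffu);

    if (e <= 0) {                              /* result is a half subnormal or zero */
        if (e < -10)
            return sign;                       /* below the smallest subnormal: truncates to 0 */
        mant |= 0x800000u;                     /* make the implicit leading 1 explicit */
        /* shift is 14..24; dropping the low bits is the round-toward-zero step */
        return (uint16_t)(sign | (mant >> (14 - e)));
    }

    /* normal half: keep the top 10 mantissa bits, discard the low 13 */
    return (uint16_t)(sign | ((uint32_t)e << 10) | (mant >> 13));
}
```

Truncation is what makes this round toward zero: for example, 1e6f becomes 0x7bff (65504), whereas a round-to-nearest-even conversion would overflow it to infinity.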