[Mesa-dev] [PATCH 00/38] radv, ac: 16-bit and 8-bit arithmetic and 8-bit storage
Samuel Pitoiset
samuel.pitoiset at gmail.com
Wed Feb 13 21:22:43 UTC 2019
On 2/13/19 9:20 PM, Rhys Perry wrote:
> Quite a few of the patches aren't specific to a single extension as
> many make code size-generic and some of the extensions intersect in
> functionality.
> It might still be possible to roughly order the patches by
> functionality but I'm not sure if it would be very useful (possible
> order in attachment). I didn't look at the actual content of the
> patches when creating the attachment, this is from memory and looking
> at the descriptions.
> Would you like me to send out a v2 of this series ordered like that?
Ok. No that's fine.
Can you at least rebase and address Marek's feedback? I will review the v2.
Thanks Rhys.
>
> On Tue, 12 Feb 2019 at 17:08, Samuel Pitoiset <samuel.pitoiset at gmail.com> wrote:
>> How about splitting this series into four different parts, one for every
>> extension? Is this doable without too much trouble?
>>
>> On 2/12/19 6:02 PM, Rhys Perry wrote:
>>> It currently requires review (and possibly rebasing). Marek Olšák sent
>>> some feedback for a few of the patches but other than that, it hasn't
>>> gotten much attention.
>>>
>>> Also, patch 35 seems to vectorize 32-bit code, which can help or hurt
>>> individual shaders quite a bit and seems to hurt shaders overall. I'm
>>> not yet sure how to solve this without removing it or significantly
>>> changing the result of LLVM's SLP vectorizer.
>>> IIRC enabling the SLP vectorizer also uncovered an RA bug with a shader.
>>>
>>> I think I'll look into the issues with patch 35 again.
>>>
>>> On Tue, 12 Feb 2019 at 16:30, Samuel Pitoiset <samuel.pitoiset at gmail.com> wrote:
>>>> What's the status of this?
>>>>
>>>> On 12/7/18 6:21 PM, Rhys Perry wrote:
>>>>> This series adds support for:
>>>>> - VK_KHR_shader_float16_int8
>>>>> - VK_AMD_gpu_shader_half_float
>>>>> - VK_AMD_gpu_shader_int16
>>>>> - VK_KHR_8bit_storage
>>>>> on VI+. Half floats are currently disabled on LLVM 7 because of a bug
>>>>> causing large memory usage and long (or unbounded) compilation times with
>>>>> some tests.
>>>>>
>>>>> It depends on the following patch series:
>>>>> - https://patchwork.freedesktop.org/series/53454/
>>>>> - https://patchwork.freedesktop.org/series/53602/
>>>>> - https://patchwork.freedesktop.org/series/53660/
>>>>>
>>>>> An older version was tested on my Polaris card, but due to hardware issues
>>>>> I currently can't test the latest version of the series.
>>>>>
>>>>> deqp-vk has no regressions and none of the newly enabled tests fail.
>>>>>
>>>>> Rhys Perry (38):
>>>>> ac: add various helpers for float16/int16/int8
>>>>> ac/nir: implement 8-bit push constant, ssbo and ubo loads
>>>>> ac/nir: implement 8-bit ssbo stores
>>>>> ac/nir: fix 16-bit ssbo stores
>>>>> ac/nir: implement 8-bit nir_load_const_instr
>>>>> ac/nir: implement 8-bit conversions
>>>>> ac/nir: fix 64-bit nir_op_f2f16_rtz
>>>>> ac/nir: make ac_build_clamp work on all bit sizes
>>>>> ac/nir: make ac_build_fract work on all bit sizes
>>>>> ac/nir: make ac_build_isign work on all bit sizes
>>>>> ac/nir: make ac_build_fsign work on all bit sizes
>>>>> ac/nir: make ac_build_fdiv support 16-bit floats
>>>>> ac/nir: implement half-float nir_op_frcp
>>>>> ac/nir: implement half-float nir_op_frsq
>>>>> ac/nir: implement half-float nir_op_ldexp
>>>>> radv: lower 16-bit flrp
>>>>> ac/nir: support half floats in emit_b2f
>>>>> ac/nir: make emit_b2i work on all bit sizes
>>>>> ac/nir: implement 16-bit shifts
>>>>> compiler/nir: add lowering option for 16-bit ffma
>>>>> ac/nir: implement 16-bit ac_build_ddxy
>>>>> ac/nir: implement 8 and 16 bit ac_build_readlane
>>>>> nir: make bitfield_reverse and ifind_msb work with all integers
>>>>> ac/nir: make ac_find_lsb work on all bit sizes
>>>>> ac/nir: make ac_build_umsb work on all bit sizes
>>>>> ac/nir: implement 8 and 16 bit ac_build_imsb
>>>>> ac/nir: make ac_build_bit_count work on all bit sizes
>>>>> ac/nir: make ac_build_bitfield_reverse work on all bit sizes
>>>>> ac/nir: implement 16-bit pack/unpack opcodes
>>>>> ac/nir: add 8-bit and 16-bit types to glsl_base_to_llvm_type
>>>>> ac/nir,radv: create an array of varying output types
>>>>> ac/nir: store all outputs as f32
>>>>> radv: store all fragment shader inputs as f32
>>>>> radv: handle all fragment output types
>>>>> ac,radv: run LLVM's SLP vectorizer
>>>>> ac/nir: generate better code for nir_op_f2f16_rtz
>>>>> ac/nir: have nir_op_f2f16 round to zero
>>>>> radv: expose float16, int16 and int8 features and extensions
>>>>>
>>>>>  src/amd/common/ac_llvm_build.c        | 355 ++++++++++++++------------
>>>>>  src/amd/common/ac_llvm_build.h        |  22 +-
>>>>>  src/amd/common/ac_llvm_util.c         |   9 +-
>>>>>  src/amd/common/ac_llvm_util.h         |   1 +
>>>>>  src/amd/common/ac_nir_to_llvm.c       | 258 +++++++++++++++----
>>>>>  src/amd/common/ac_shader_abi.h        |   1 +
>>>>>  src/amd/vulkan/radv_device.c          |  17 ++
>>>>>  src/amd/vulkan/radv_extensions.py     |   4 +
>>>>>  src/amd/vulkan/radv_nir_to_llvm.c     |  92 ++++---
>>>>>  src/amd/vulkan/radv_shader.c          |   7 +
>>>>>  src/broadcom/compiler/nir_to_vir.c    |   1 +
>>>>>  src/compiler/nir/nir.h                |   1 +
>>>>>  src/compiler/nir/nir_opcodes.py       |   4 +-
>>>>>  src/compiler/nir/nir_opt_algebraic.py |   4 +-
>>>>>  src/gallium/drivers/radeonsi/si_get.c |   1 +
>>>>>  src/gallium/drivers/vc4/vc4_program.c |   1 +
>>>>>  16 files changed, 516 insertions(+), 262 deletions(-)
>>>>>