[Mesa-dev] [PATCH 00/15] radv: Support for VK_AMD_shader_ballot

Tue Aug 8 19:18:17 UTC 2017

On Mon, Aug 7, 2017 at 6:32 PM, Connor Abbott <connora at valvesoftware.com> wrote:
> From: Connor Abbott <cwabbott0 at gmail.com>
>
> This series implements VK_AMD_shader_ballot for radv. This extension
> builds on VK_EXT_shader_subgroup_ballot and VK_EXT_shader_subgroup_vote
> by adding a number of reductions across a subgroup (or wavefront in AMD
> terminology). Previously, shaders had to use shared memory to compute,
> say, the average across all threads in a workgroup, or the minimum and
> maximum values across a workgroup. But that requires a lot of accesses
> to LDS memory, which is (relatively) slow. This extension allows the
> shader to do part of the reduction directly in registers, as long as it
> stays within a single wavefront, reducing the amount of traffic to the
> LDS that has to happen. It also adds a few AMD-specific instructions,
> like mbcnt. To get an idea of what exactly is in the extension, and what
> inclusive scan, exclusive scan, etc. mean, you can look at the GL
> extension which exposes mostly the same things [1].
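
For anyone who wants the semantics without reading the GL spec first, here
is a rough CPU-side model of what the reductions and scans compute. It is
only a sketch under my own assumptions: the function names, the list-per-lane
representation and the 64-lane wavefront size are made up for illustration,
and it says nothing about how the series actually emits DPP instructions.

```python
from functools import reduce
from itertools import accumulate
import operator

WAVE_SIZE = 64  # assumption: one wavefront is modelled as 64 lanes


def subgroup_reduce(values, op):
    """Every lane receives op applied across all lanes' values."""
    total = reduce(op, values)
    return [total] * len(values)


def inclusive_scan(values, op):
    """Lane i receives op over lanes 0..i, its own value included."""
    return list(accumulate(values, op))


def exclusive_scan(values, op, identity):
    """Lane i receives op over lanes 0..i-1; lane 0 receives the identity."""
    return ([identity] + inclusive_scan(values, op))[:len(values)]


def mbcnt(mask, lane):
    """Count the set bits of `mask` that belong to lanes below `lane`."""
    return bin(mask & ((1 << lane) - 1)).count("1")


def workgroup_sum(values, wave_size=WAVE_SIZE):
    """Two-level sum: reduce each wavefront "in registers", then combine
    one partial result per wavefront through a list standing in for LDS.
    Returns the total and the number of LDS slots that were written."""
    lds = []
    for start in range(0, len(values), wave_size):
        wave = values[start:start + wave_size]
        lds.append(subgroup_reduce(wave, operator.add)[0])
    return sum(lds), len(lds)
```

With four lanes holding 1, 2, 3, 4, an inclusive add scan gives 1, 3, 6, 10
and an exclusive one gives 0, 1, 3, 6. In this model a 256-thread workgroup
only writes 4 partial sums to the LDS stand-in instead of 256 values, which
is the traffic saving the cover letter is describing.
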
>
> Why should you care? It turns out that with this extension enabled, plus
> a few other AMD-specific extensions that are mostly trivial, DOOM will
> take a different path that uses shaders that were tuned specifically for
> AMD hardware. I haven't actually tested DOOM yet, since a few more
> things need to be wired up, but it's a lot less work than this extension
> and I'm sure Dave or Bas will do it for me when they get around to it
> :).
>
> It uses a few new features of the AMDGPU LLVM backend that I just
> landed, as well as one more small change that still needs review:
> https://reviews.llvm.org/D34718, so it's going to require LLVM 6.0. It
> also uses the DPP modifier that was only added on VI since that was
> easier than using ds_swizzle (which is available on all GCN cards). It
> should be possible to implement support for older cards using
> ds_swizzle, but I haven't gotten to it yet. A note to those reviewing:
> it might be helpful to look at the LLVM changes that this series uses,
> in particular:
>
> https://reviews.llvm.org/rL310087
> https://reviews.llvm.org/rL310088
> https://reviews.llvm.org/D34718
>
> in order to get the complete picture.

I've just pushed the last LLVM change required as
https://reviews.llvm.org/rL310399, so this series should now work with
upstream LLVM master.

>
> This series depends on my previous series [2] to implement
> VK_EXT_shader_subgroup_vote and VK_EXT_shader_subgroup_ballot, if
> nothing else in order to be able to test the implementation. I think
> DOOM also uses the latter two extensions. I've also based this on my series
> adding cross-thread semantics to NIR [3], which Jason needs to review,
> since I was hoping that would land first, although with a little effort
> it should be possible to land this first (it would require changing
> PATCH 01 a little). The whole thing is available at:
>
> git://people.freedesktop.org/~cwabbott0/mesa radv-amd-shader-ballot
>
> and the LLVM branch that I've been using to test, with the one patch
> added, is at:
>
> https://github.com/cwabbott0/llvm.git dpp-intrinsics-v4

I've also force-pushed all three Mesa branches (nir-divergence-v4,
radv-shader-ballot-v4, and radv-amd-shader-ballot) with trivial
rebasing after pushing the last patch in this series. I've also pushed
my Crucible tests to

git://people.freedesktop.org/~cwabbott0/crucible amd-shader-ballot

although I haven't yet cleaned things up. At least it'll be useful for
making sure this code still works.

>
> I've got some Crucible tests for exercising the various different parts
> of the implementation, although I didn't bother to test all the possible
> combinations of reductions, since they didn't really require any special
> code to implement anyways. I'll try and get that cleaned up and sent out
> soon. Maybe I should just push the tests?
>
> Finally, I'm leaving Valve soon (this week) to go back to school, and I
> suspect that I won't have too much time to work on this afterwards, so
> someone else will probably have to pick it up. I've been working on this
> for most of the summer, since it turned out to be a way more complicated
> beast to implement than I thought. It's required changes across the
> entire stack, from spirv-to-nir all the way down to register allocation
> in the LLVM backend.  Thankfully, though, most of the tricky LLVM
> changes have landed (thanks Nicolai for reviewing!) and what's left is a
> lot more straightforward. I should still be around to answer questions,
> though. Whew!
>
> [1] https://www.khronos.org/registry/OpenGL/extensions/AMD/AMD_shader_ballot.txt
> [2] https://lists.freedesktop.org/archives/mesa-dev/2017-August/164903.html
> [3] https://lists.freedesktop.org/archives/mesa-dev/2017-August/164898.html
>
> Connor Abbott (15):
>   nir: define intrinsics needed for AMD_shader_ballot
>   spirv: import AMD extensions header
>   spirv: add plumbing for SPV_AMD_shader_ballot and Groups
>   nir: rename and generalize nir_lower_read_invocation_to_scalar
>   nir: scalarize AMD_shader_ballot intrinsics
>   radv: call nir_lower_cross_thread_to_scalar()
>   nir: add a lowering pass for some cross-workgroup intrinsics
>   radv: use nir_lower_group_reduce()
>   ac: move ac_to_integer() and ac_to_float() to ac_llvm_build.c
>   ac: remove bitcast_to_float()
>   ac: fix ac_get_type_size() for doubles
>   ac: add support for SPV_AMD_shader_ballot
>   ac/nir: add support for SPV_AMD_shader_ballot
>   radv: enable VK_AMD_shader_ballot
>   ac/nir: fix saturate emission
>
>  src/amd/common/ac_llvm_build.c                     | 783 ++++++++++++++++++++-
>  src/amd/common/ac_llvm_build.h                     | 120 ++++
>  src/amd/common/ac_nir_to_llvm.c                    | 300 ++++----
>  src/amd/vulkan/radv_device.c                       |  15 +
>  src/amd/vulkan/radv_pipeline.c                     |   6 +
>  src/compiler/Makefile.sources                      |   4 +-
>  src/compiler/nir/nir.h                             |  11 +-
>  src/compiler/nir/nir_intrinsics.h                  | 124 +++-
>  ...scalar.c => nir_lower_cross_thread_to_scalar.c} |  63 +-
>  src/compiler/nir/nir_lower_group_reduce.c          | 179 +++++
>  src/compiler/nir/nir_print.c                       |   1 +
>  src/compiler/spirv/GLSL.ext.AMD.h                  |  93 +++
>  src/compiler/spirv/nir_spirv.h                     |   2 +
>  src/compiler/spirv/spirv_to_nir.c                  |  32 +-
>  src/compiler/spirv/vtn_amd.c                       | 281 ++++++++
>  src/compiler/spirv/vtn_private.h                   |   9 +
>  src/intel/compiler/brw_nir.c                       |   2 +-
>  17 files changed, 1846 insertions(+), 179 deletions(-)
>  rename src/compiler/nir/{nir_lower_read_invocation_to_scalar.c => nir_lower_cross_thread_to_scalar.c} (56%)
>  create mode 100644 src/compiler/nir/nir_lower_group_reduce.c
>  create mode 100644 src/compiler/spirv/GLSL.ext.AMD.h
>  create mode 100644 src/compiler/spirv/vtn_amd.c
>
> --
> 2.9.4
>