[Mesa-dev] [PATCH 00/12] nir: Add some optimizations on variables
Jason Ekstrand
jason at jlekstrand.net
Thu Jul 26 15:59:56 UTC 2018
This series adds some optimizations on variables to try and help shaders
with indirects where we can't just throw the variables away and use SSA.
The particular motivation of this series is the tessellation control
shaders in Batman: Arkham City as translated by DXVK. When DXVK
translates a tessellation shader, it's common to see this pattern:

    layout(location=0) in vec3 v0[3];
    layout(location=0) in vec2 v1[3];
    layout(location=0) out vec4 oVertex[3][32];

    vec4 shader_in[3][32];

    void hs_main () {
        oVertex[gl_InvocationID][0].xyz = shader_in[gl_InvocationID][0].xyz;
        oVertex[gl_InvocationID][1].xy = shader_in[gl_InvocationID][1].xy;
        // Do some other stuff
    }

    void main () {
        shader_in[0][0].xyz = v0[0];
        shader_in[1][0].xyz = v0[1];
        shader_in[2][0].xyz = v0[2];
        shader_in[0][1].xy = v1[0];
        shader_in[1][1].xy = v1[1];
        shader_in[2][1].xy = v1[2];
        hs_main();
    }

Having that shader_in temporary array currently stops NIR's optimization
ability dead. In anv, we end up generating a shader that first loads all
of the inputs into temporary storage and, because they are indirect, we
generate if-ladders for the reads of shader_in. This isn't so bad in the
above example, but Batman: Arkham City has tessellation control shaders
with 8 inputs of 9 vertices each. That many vec4s work out to 4.5 KiB of
data, which is 9x the amount of storage we have per-thread in a SIMD8
shader, so we end up spilling the whole lot.
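
For readers who have not seen one, here is a rough C sketch of what an
if-ladder for an indirect read amounts to. It is only an illustration: the
struct and function names are made up, and this is not the code anv actually
emits. The point is that once array elements live in separate registers, a
non-constant index has to be resolved with a chain of compares, and every
element has to stay live at the same time.

```c
#include <assert.h>

/* Hypothetical stand-in for a vec4 held in registers. */
struct vec4 { float x, y, z, w; };

/*
 * r0, r1 and r2 model the three elements of a temporary array after each
 * one has been placed in its own register.  Registers cannot be indexed
 * dynamically in this model, so a read with a non-constant index i turns
 * into a ladder of compares, one rung per element.
 */
struct vec4
read_indirect_if_ladder(struct vec4 r0, struct vec4 r1, struct vec4 r2,
                        unsigned i)
{
    assert(i < 3); /* out-of-range indices are not modelled here */

    if (i == 0)
        return r0;
    else if (i == 1)
        return r1;
    else
        return r2;
}
```

With 3 elements this is cheap. With dozens of vec4 elements the ladder grows
linearly and all of the elements are live across it, which is where the
register pressure comes from.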
This series attempts to solve this problem (and others like it) by adding
four optimizations:

 1. Structure splitting. This isn't actually needed for this case since
    there are no structures but it's needed in order for the other passes
    to be more generally applicable.

 2. Array splitting. This pass looks at something like the shader_in array
    above and determines that the second array index is only used directly
    and splits it into 32 arrays of vec4[3] and 30 of those arrays then get
    deleted because we never use them.

 3. Vector narrowing. This pass looks at vectors or arrays of vectors and
    tries to determine if some of the channels are unused. It then shrinks
    the vector and reworks all the load/store operations to swizzle things
    appropriately for the smaller type. This way it can delete components
    from the middle of a vector. In the example above, it takes some of
    the new vec4[3] arrays created by array splitting and shrinks them to
    vec3[3] or vec2[3].

 4. Array copy detection. This is a peephole optimization that looks for
    a particular array copy pattern and turns it into a copy_deref
    intrinsic which copies the entire array. This is useful because
    copy_prop_vars can see through copy_deref intrinsics and turn indirect
    loads from the destination of the copy into an indirect load of the
    source.

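
The decisions behind items 2 and 3 can be sketched with a toy model in C.
This is not the code in nir_split_vars.c. The access record, the
INDEX_INDIRECT marker and all function names below are assumptions made up
for illustration, and the real passes work on NIR derefs and may use a
stricter notion of "used" than this sketch does.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DIMS 4
#define INDEX_INDIRECT (-1) /* hypothetical marker: index is not a constant */

/* One access, e.g. shader_in[gl_InvocationID][1].xy */
struct access {
    int index[MAX_DIMS]; /* constant index per dimension, or INDEX_INDIRECT */
    unsigned comp_mask;  /* bit n set if the access touches component n */
};

/* An array dimension can be split only if no access indexes it indirectly. */
bool
dim_is_splittable(const struct access *acc, int num_acc, int dim)
{
    assert(dim >= 0 && dim < MAX_DIMS);

    for (int i = 0; i < num_acc; i++) {
        if (acc[i].index[dim] == INDEX_INDIRECT)
            return false;
    }
    return true;
}

/* Union of the components touched by accesses to element elem of dim.
 * A result of 0 means that split-off array is never used and can go away. */
unsigned
used_components(const struct access *acc, int num_acc, int dim, int elem)
{
    unsigned mask = 0;

    for (int i = 0; i < num_acc; i++) {
        if (acc[i].index[dim] == elem)
            mask |= acc[i].comp_mask;
    }
    return mask;
}

/* Build the old-to-new component map for a narrowed vector.  remap[c] is
 * the new position of component c, or -1 if c is dropped.  Returns the
 * width of the narrowed vector. */
int
build_narrowing_remap(unsigned used_mask, int remap[4])
{
    int width = 0;

    for (int c = 0; c < 4; c++)
        remap[c] = (used_mask & (1u << c)) ? width++ : -1;
    return width;
}
```

Fed the two accesses from hs_main() above, this model reports that only the
second dimension is splittable, that element 0 needs 3 components and
element 1 needs 2, and that every other element of the second dimension is
unused. A mask such as 0xA (y and w only) narrows to a 2-wide vector with y
remapped to slot 0 and w to slot 1, which is the "delete components from the
middle" case.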
The end result of those four optimizations put together is that the above
example now looks something like this (after function inlining and other
optimizations):

    layout(location=0) in vec3 v0[3];
    layout(location=0) in vec2 v1[3];
    layout(location=0) out vec4 oVertex[3][32];

    vec4 shader_in[3][32];

    void main () {
        oVertex[gl_InvocationID][0].xyz = v0[gl_InvocationID].xyz;
        oVertex[gl_InvocationID][1].xy = v1[gl_InvocationID].xy;
        // Do some other stuff
    }

and we can very nicely handle the indirect per-vertex loads in the back-end
without the need for if-ladders. The end result is that the tessellation
shaders in Batman: Arkham City no longer spill at all and are actually
readable.
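
The step that makes this possible is recognizing the element-by-element
stores in main() as one whole-array copy. A minimal C sketch of that
recognition follows. It is a toy model and not nir_opt_find_array_copies:
the elem_copy record and the integer variable ids are hypothetical, and it
ignores ordering, so it assumes nothing else writes the arrays between the
stores. The real pass has to check that.

```c
#include <assert.h>

/* Hypothetical record of one store: dst_var[dst_idx] = src_var[src_idx].
 * Variable ids are assumed to be non-negative. */
struct elem_copy {
    int dst_var, dst_idx;
    int src_var, src_idx;
};

/*
 * Returns the id of the source array if the stores to dst_var copy every
 * element 0..len-1 from the same index of one single source array, or -1
 * if the stores do not form a whole-array copy.
 */
int
find_array_copy(const struct elem_copy *st, int num, int dst_var, int len)
{
    assert(len > 0 && len <= 32); /* coverage is tracked in a 32-bit mask */

    unsigned seen = 0;
    int src = -1;

    for (int i = 0; i < num; i++) {
        if (st[i].dst_var != dst_var)
            continue;

        /* Element i of the destination must come from element i of the
         * source, and the index must be in range. */
        if (st[i].dst_idx != st[i].src_idx ||
            st[i].dst_idx < 0 || st[i].dst_idx >= len)
            return -1;

        /* All elements must come from the same source array. */
        if (src != -1 && st[i].src_var != src)
            return -1;

        src = st[i].src_var;
        seen |= 1u << st[i].dst_idx;
    }

    unsigned all = (len == 32) ? ~0u : (1u << len) - 1;
    return seen == all ? src : -1;
}
```

Once a match is found, the individual stores can be replaced by a single
copy, and a later indirect load from the destination can be rewritten as an
indirect load from the source, which is the form shown in the optimized
shader above.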
Another side-effect of this series is that it potentially allows us to
vastly simplify nir_lower_vars_to_ssa. Most of the complexity in the
vars_to_ssa pass comes with trying to handle structures, arrays, potential
aliasing, etc. If we run structure and array splitting prior to
vars_to_ssa, we could make it only consider non-array vector or scalar
variables and get exactly the same effect. Gone would be the pile of data
structures that we build just to determine if a particular array dimension
is indirected.
This series can be found on my gitlab here:
https://gitlab.freedesktop.org/jekstrand/mesa/commits/wip/nir-var-opts
Cc: Timothy Arceri <tarceri at itsqueeze.com>
Jason Ekstrand (12):
util/list: Make some helpers take const lists
nir: Take if uses into account in ssa_def_components_read
nir/print: Remove a bogus assert
nir/instr_set: Fix nir_instrs_equal for derefs
nir/types: Add array_or_matrix helpers
nir: Add a structure splitting pass
nir: Add an array splitting pass
intel/nir: Use the new structure and array splitting passes
nir: Add a array-of-vector variable narrowing pass
intel/nir: Use narrow_vec_vars
nir: Add an array copy optimization
intel/nir: Enable nir_opt_find_array_copies
src/compiler/Makefile.sources | 2 +
src/compiler/nir/meson.build | 2 +
src/compiler/nir/nir.c | 3 +
src/compiler/nir/nir.h | 5 +
src/compiler/nir/nir_instr_set.c | 4 +-
src/compiler/nir/nir_opt_find_array_copies.c | 376 ++++++
src/compiler/nir/nir_print.c | 1 -
src/compiler/nir/nir_split_vars.c | 1219 ++++++++++++++++++
src/compiler/nir_types.cpp | 15 +
src/compiler/nir_types.h | 2 +
src/intel/compiler/brw_nir.c | 4 +
src/util/list.h | 8 +-
12 files changed, 1634 insertions(+), 7 deletions(-)
create mode 100644 src/compiler/nir/nir_opt_find_array_copies.c
create mode 100644 src/compiler/nir/nir_split_vars.c
--
2.17.1