[Mesa-dev] [PATCH 11/11] HACK: nir/lower_vec_to_movs: Coalesce into destinations of fdot instructions
Jason Ekstrand
jason at jlekstrand.net
Wed Sep 9 17:50:14 PDT 2015
This is labeled HACK because it relies on the validator not validating
destination write-masks with respect to nir_op_infos[op].output_size and
relies on the fact that the i965 vec4 hardware does an implicit splat of
any dot-product results. Technically, nir_op_fdotN produces a single
component that lives in the first slot and the others don't exist.
However, most hardware splats dot-products so this is probably reasonable.
One solution to doing this "properly" would be to add a set of
nir_op_fdotN_replicated opcodes that do the splat and somehow lower to
those at the end of optimizations. I don't think we want to have the
nir_op_fdot opcodes splat in SSA form because that could hurt our
opportunity for CSE. However, the shader-db results below show that we
might want to have it splat for the purposes of coalescing.
Shader-db results for vec4 programs on Haswell:
total instructions in shared programs: 1778849 -> 1751223 (-1.55%)
instructions in affected programs: 763104 -> 735478 (-3.62%)
helped: 7067
HURT: 26
It turns out that dot-products matter...
Cc: Eduardo Lima Mitev <elima at igalia.com>
---
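Note for reviewers: the coalescing argument in the commit message can be
illustrated with a small standalone model. This is only a sketch and not
NIR or i965 code. The vec4_reg type and the hw_fdot3, dot_then_mov and
dot_coalesced helpers are hypothetical names made up for this note. It
assumes hardware that writes the scalar dot-product to every channel
enabled in the destination write-mask, which is the behaviour the patch
relies on.

```c
#include <assert.h>

/* Hypothetical stand-in for a 4-channel register; not a NIR type. */
typedef struct { float c[4]; } vec4_reg;

/* Models hardware that splats a dot-product: the scalar result is
 * written to every channel enabled in write_mask.
 */
static void
hw_fdot3(vec4_reg *dst, unsigned write_mask, const float *a, const float *b)
{
   float d = a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
   for (unsigned i = 0; i < 4; i++) {
      if (write_mask & (1u << i))
         dst->c[i] = d;
   }
}

/* Uncoalesced form: fdot into the first channel of a temporary, then a
 * separate move of tmp.x into the requested channel of dst.
 */
static void
dot_then_mov(vec4_reg *dst, unsigned chan, const float *a, const float *b)
{
   assert(chan < 4);
   vec4_reg tmp = { { 0.0f, 0.0f, 0.0f, 0.0f } };
   hw_fdot3(&tmp, 0x1, a, b);
   dst->c[chan] = tmp.c[0];
}

/* Coalesced form: rewrite the fdot's destination and write-mask so it
 * lands directly in dst. The sources are not reswizzled because the
 * result is the same in every channel.
 */
static void
dot_coalesced(vec4_reg *dst, unsigned chan, const float *a, const float *b)
{
   assert(chan < 4);
   hw_fdot3(dst, 1u << chan, a, b);
}
```

Under that assumption both forms leave the same value in the selected
channel and leave the other channels untouched, while the coalesced form
drops the extra move.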
src/glsl/nir/nir_lower_vec_to_movs.c | 47 ++++++++++++++++++++++++++----------
1 file changed, 34 insertions(+), 13 deletions(-)
diff --git a/src/glsl/nir/nir_lower_vec_to_movs.c b/src/glsl/nir/nir_lower_vec_to_movs.c
index 0ebf3e3..1aa6add 100644
--- a/src/glsl/nir/nir_lower_vec_to_movs.c
+++ b/src/glsl/nir/nir_lower_vec_to_movs.c
@@ -84,6 +84,14 @@ insert_mov(nir_alu_instr *vec, unsigned start_idx, nir_shader *shader)
return mov->dest.write_mask;
}
+static bool
+is_fdot(nir_alu_instr *alu)
+{
+ return alu->op == nir_op_fdot2 ||
+ alu->op == nir_op_fdot3 ||
+ alu->op == nir_op_fdot4;
+}
+
/* Attempts to coalesce the "move" from the given source of the vec to the
* destination of the instruction generating the value. If, for whatever
* reason, we cannot coalesce the move, it does nothing and returns 0. We
@@ -121,19 +129,28 @@ try_coalesce(nir_alu_instr *vec, unsigned start_idx, nir_shader *shader)
nir_alu_instr *src_alu =
nir_instr_as_alu(vec->src[start_idx].src.ssa->parent_instr);
- /* We only care about being able to re-swizzle the instruction if it is
- * something that we can reswizzle. It must be per-component.
- */
- if (nir_op_infos[src_alu->op].output_size != 0)
- return 0;
-
- /* If we are going to reswizzle the instruction, we can't have any
- * non-per-component sources either.
- */
- for (unsigned j = 0; j < nir_op_infos[src_alu->op].num_inputs; j++)
- if (nir_op_infos[src_alu->op].input_sizes[j] != 0)
+ if (is_fdot(src_alu)) {
+ /* The fdot instruction is special: It splats its result to all
+ * components. This means that we can always rewrite its destination
+ * and we don't need to swizzle anything.
+ */
+ } else {
+ /* We only care about being able to re-swizzle the instruction if it is
+ * something that we can reswizzle. It must be per-component. The one
+ * exception to this is the fdotN instructions which implicitly splat
+ * their result out to all channels.
+ */
+ if (nir_op_infos[src_alu->op].output_size != 0)
return 0;
+ /* If we are going to reswizzle the instruction, we can't have any
+ * non-per-component sources either.
+ */
+ for (unsigned j = 0; j < nir_op_infos[src_alu->op].num_inputs; j++)
+ if (nir_op_infos[src_alu->op].input_sizes[j] != 0)
+ return 0;
+ }
+
/* Stash off all of the ALU instruction's swizzles. */
uint8_t swizzles[4][4];
for (unsigned j = 0; j < nir_op_infos[src_alu->op].num_inputs; j++)
@@ -153,8 +170,12 @@ try_coalesce(nir_alu_instr *vec, unsigned start_idx, nir_shader *shader)
* instruction so we can re-swizzle that component to match.
*/
write_mask |= 1 << i;
- for (unsigned j = 0; j < nir_op_infos[src_alu->op].num_inputs; j++)
- src_alu->src[j].swizzle[i] = swizzles[j][vec->src[i].swizzle[0]];
+ if (is_fdot(src_alu)) {
+ /* Since fdot splats, we don't need to do any reswizzling */
+ } else {
+ for (unsigned j = 0; j < nir_op_infos[src_alu->op].num_inputs; j++)
+ src_alu->src[j].swizzle[i] = swizzles[j][vec->src[i].swizzle[0]];
+ }
/* Clear the no longer needed vec source */
nir_instr_rewrite_src(&vec->instr, &vec->src[i].src, NIR_SRC_INIT);
--
2.5.0.400.gff86faf