[Mesa-dev] [PATCH 61/95] i965/vec4: do not emit 64-bit MAD

Tue Jul 19 10:40:58 UTC 2016

RepCtrl=1 does not work with 64-bit operands, so we need to use RepCtrl=0.

In that situation, the regioning generated for the sources seems to be
equivalent to <4,4,1>:DF, so it will only work for components XY. This
means that we have to move any other swizzle to a temporary so that we can
source from channel X (or Y) in MAD, and we also need to split the
instruction (we are already scalarizing DF instructions, but there is room
for improvement there, and using MAD would restrict what we can improve).

Also, it seems that MAD operations like this only write proper output for
channels X and Y, so writes to Z and W also need to be done to a temporary
using channels X/Y and then move that to channels Z or W of the actual dst.

As a result, the code we produce for native 64-bit MAD instructions is rather
bad, and much worse than just emitting MUL+ADD. For reference, a simple case
of a fully scalarized dvec4 MAD operation requires 15 instructions if we use
native MAD and 8 instructions if we emit MUL+ADD instead. There are some
improvements that we could make to the emission of MAD that might bring the
instruction count down in some cases, but they come at the expense of a more
complex implementation, so it does not seem worth it, at least initially.

This patch adds a lowering pass after the main optimization loop that
translates any existing 64-bit MAD instruction to MUL+ADD.
---
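Note (not part of the commit message): for readers who want to see the
shape of the lowering outside the driver, here is a minimal standalone
sketch. The Inst and Opcode types, the string register names and the
lower_64bit_mad() function are hypothetical stand-ins and not the i965 IR;
the only things taken from the patch are the MAD operand order
(dst = src0 + src1 * src2) and the idea of replacing the instruction in
place with a MUL into a temporary followed by an ADD.

```cpp
#include <list>
#include <string>

// Hypothetical, simplified stand-ins for the driver's IR types.
enum class Opcode { MAD, MUL, ADD };

struct Inst {
    Opcode op;
    std::string dst;
    std::string src[3];
    unsigned type_size;   // bytes per component of the destination type
    bool saturate;
};

// Replace every 64-bit MAD (dst = src0 + src1 * src2) with
//   MUL tmp, src1, src2
//   ADD dst, tmp, src0
// Returns true if any instruction was rewritten.
bool lower_64bit_mad(std::list<Inst> &insts)
{
    bool progress = false;
    int next_tmp = 0;

    for (auto it = insts.begin(); it != insts.end(); ) {
        if (it->op != Opcode::MAD || it->type_size != 8) {
            ++it;
            continue;
        }

        // Fresh temporary for the product (assumed naming scheme).
        const std::string tmp = "tmp" + std::to_string(next_tmp++);

        Inst mul{Opcode::MUL, tmp, {it->src[1], it->src[2], ""}, 8, false};
        // Saturation applies to the final result only, so it goes on the ADD.
        Inst add{Opcode::ADD, it->dst, {tmp, it->src[0], ""}, 8, it->saturate};

        // Insert both replacements before the MAD, then drop the MAD.
        insts.insert(it, mul);
        insts.insert(it, add);
        it = insts.erase(it);

        progress = true;
    }

    return progress;
}
```

In this toy model a 32-bit MAD is left untouched, and running the pass a
second time reports no progress because no 64-bit MAD remains.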
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 46 ++++++++++++++++++++++++++++++++++
 src/mesa/drivers/dri/i965/brw_vec4.h   |  1 +
 2 files changed, 47 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index b8dd094..e204d81 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -2203,6 +2203,51 @@ vec4_visitor::scalarize_df()
 }
 
 bool
+vec4_visitor::translate_64bit_mad_to_mul_add()
+{
+   bool progress = false;
+
+   foreach_block_and_inst_safe(block, vec4_instruction, inst, cfg) {
+      if (inst->opcode != BRW_OPCODE_MAD)
+         continue;
+
+      if (type_sz(inst->dst.type) != 8)
+         continue;
+
+      dst_reg mul_dst = dst_reg(this, glsl_type::dvec4_type);
+      vec4_instruction *mul = new(mem_ctx)
+         vec4_instruction(BRW_OPCODE_MUL, mul_dst,
+                          inst->src[1], inst->src[2]);
+      mul->exec_size = inst->exec_size;
+      mul->group = inst->group;
+      mul->force_writemask_all = inst->force_writemask_all;
+      mul->conditional_mod = inst->conditional_mod;
+      mul->predicate = inst->predicate;
+
+      vec4_instruction *add = new(mem_ctx)
+         vec4_instruction(BRW_OPCODE_ADD, inst->dst,
+                          src_reg(mul_dst), inst->src[0]);
+      add->exec_size = inst->exec_size;
+      add->group = inst->group;
+      add->force_writemask_all = inst->force_writemask_all;
+      add->conditional_mod = inst->conditional_mod;
+      add->predicate = inst->predicate;
+      add->saturate = inst->saturate;
+
+      inst->insert_before(block, mul);
+      inst->insert_before(block, add);
+      inst->remove(block);
+
+      progress = true;
+   }
+
+   if (progress)
+      invalidate_live_intervals();
+
+   return progress;
+}
+
+bool
 vec4_visitor::expand_64bit_swizzle_to_32bit()
 {
    bool progress = false;
@@ -2357,6 +2402,7 @@ vec4_visitor::run()
    if (failed)
       return false;
 
+   OPT(translate_64bit_mad_to_mul_add);
    OPT(scalarize_df);
 
    setup_payload();
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
index 6504939..26228d0 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -162,6 +162,7 @@ public:
 
    bool lower_simd_width();
    bool scalarize_df();
+   bool translate_64bit_mad_to_mul_add();
    bool expand_64bit_swizzle_to_32bit();
 
    vec4_instruction *emit(vec4_instruction *inst);
-- 
2.7.4