[Mesa-dev] [PATCH 06/21] i965/fs: Implement SIMD16 64-bit integer multiplies on Gen 7.

Mon Dec 22 19:29:16 PST 2014

From: Matt Turner <mattst88 at gmail.com>

v2 by Ben:
Keep the Gen8 code:
  Gen8 will be fixed later. The intent is to limit the platform impact of the
  patch to fixing just Gen7. The original Gen7 code as written by Matt did not
  work properly on Gen8 (I can't remember why). I haven't actually tested the code
  on Gen8 that was modified by me. It's possible it does work.

Fix the MUL/MACH:
  The docs clearly say that the channel enables need to be used in the
  mul/mach/mov sequence. I am not quite certain of the reason for this when doing
  an 8 channel operation, however I presume it has to be with the weird way the
  accumulator is used (gaining extra precision for the operation).

  I do not believe this to be necessary with the existing code. Quoting the docs:
  > For each enabled channel, this instruction multiplies the DWord in src1 with the
  > high word of the DWord in src0, left shifts the result by 16 bits, adds it with
  > the corresponding accumulator values, and keeps the whole 64-bit result in the
  > accumulator. It then stores the high DWord (bits 63:32) of the results in dst.

  To me this suggests that the dest of mach is always the value we want for
  imul_high. This fixes failing tests on IVB, and doesn't seem to hurt the same
  tests on HSW.

Signed-off-by: Ben Widawsky <ben at bwidawsk.net>
---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 52 ++++++++++++++++------------
 1 file changed, 30 insertions(+), 22 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 0d8cacb..de03618 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -799,32 +799,40 @@ fs_visitor::visit(ir_expression *ir)
       }
       break;
    case ir_binop_imul_high: {
-      if (brw->gen >= 7)
+      if (brw->gen >= 8)
          no16("SIMD16 explicit accumulator operands unsupported\n");
 
-      struct brw_reg acc = retype(brw_acc_reg(dispatch_width),
-                                  this->result.type);
+      unsigned width = brw->gen >= 7 ? 8 : dispatch_width;
+      fs_reg acc = fs_reg(retype(brw_acc_reg(width), this->result.type));
 
-      fs_inst *mul = emit(MUL(acc, op[0], op[1]));
-      emit(MACH(this->result, op[0], op[1]));
+      if (brw->gen == 7 && dispatch_width == 16) {
+         emit(MUL(acc, half(op[0], 0), half(op[1], 0)));
+         emit(MACH(half(this->result, 0), half(op[0], 0), half(op[1], 0)));
+         emit(MUL(acc, half(op[0], 1), half(op[1], 1)));
+         emit(MACH(half(this->result, 1), half(op[0], 1),
+                     half(op[1], 1)));
+      } else {
+         fs_inst *mul = emit(MUL(acc, op[0], op[1]));
+         emit(MACH(this->result, op[0], op[1]));
 
-      /* Until Gen8, integer multiplies read 32-bits from one source, and
-       * 16-bits from the other, and relying on the MACH instruction to
-       * generate the high bits of the result.
-       *
-       * On Gen8, the multiply instruction does a full 32x32-bit multiply,
-       * but in order to do a 64x64-bit multiply we have to simulate the
-       * previous behavior and then use a MACH instruction.
-       *
-       * FINISHME: Don't use source modifiers on src1.
-       */
-      if (brw->gen >= 8) {
-         assert(mul->src[1].type == BRW_REGISTER_TYPE_D ||
-                mul->src[1].type == BRW_REGISTER_TYPE_UD);
-         if (mul->src[1].type == BRW_REGISTER_TYPE_D) {
-            mul->src[1].type = BRW_REGISTER_TYPE_W;
-         } else {
-            mul->src[1].type = BRW_REGISTER_TYPE_UW;
+         /* Until Gen8, integer multiplies read 32-bits from one source, and
+          * 16-bits from the other, and relying on the MACH instruction to
+          * generate the high bits of the result.
+          *
+          * On Gen8, the multiply instruction does a full 32x32-bit multiply,
+          * but in order to do a 64x64-bit multiply we have to simulate the
+          * previous behavior and then use a MACH instruction.
+          *
+          * FINISHME: Don't use source modifiers on src1.
+          */
+         if (brw->gen >= 8) {
+            assert(mul->src[1].type == BRW_REGISTER_TYPE_D ||
+                   mul->src[1].type == BRW_REGISTER_TYPE_UD);
+            if (mul->src[1].type == BRW_REGISTER_TYPE_D) {
+               mul->src[1].type = BRW_REGISTER_TYPE_W;
+            } else {
+               mul->src[1].type = BRW_REGISTER_TYPE_UW;
+            }
          }
       }
 
-- 
2.2.1