[Mesa-dev] [PATCH] intel/fs: Don't apply the des stride alignment rule to accumulators

Thu Jan 17 04:39:47 UTC 2019

The pass was discovered to cause problems with the MUL+MACH combination
we emit for nir_[iu]mul_high.  In an experimental branch of mine, I ran
into issues where the MUL+MACH ended up using a strided source due to
working on half of a uint64_t and the new lowering pass helpfully tried
to fix the multiply which wrote to an unstriated accumulator.  Not only
did the multiply not need to be fixed but the "fix" ended up breaking it
because a MOV to the accumulator is not the same as using it as a
multiply destination due to the magic way the 33/64 bits of the
accumulator are handled for different instruction types.

Fixes: efa4e4bc5fc "intel/fs: Introduce regioning lowering pass"
Cc: Francisco Jerez <currojerez at riseup.net>
---
 src/intel/compiler/brw_fs_lower_regioning.cpp | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/src/intel/compiler/brw_fs_lower_regioning.cpp b/src/intel/compiler/brw_fs_lower_regioning.cpp
index cc4163b4c2c..b8a89e82272 100644
--- a/src/intel/compiler/brw_fs_lower_regioning.cpp
+++ b/src/intel/compiler/brw_fs_lower_regioning.cpp
@@ -53,7 +53,13 @@ namespace {
    unsigned
    required_dst_byte_stride(const fs_inst *inst)
    {
-      if (type_sz(inst->dst.type) < get_exec_type_size(inst) &&
+      if (inst->dst.is_accumulator()) {
+         /* Even though it's not explicitly documented in the PRMs or the
+          * BSpec, writes to the accumulator appear to not need any special
+          * treatment with respect too their destination stride alignment.
+          */
+         return inst->dst.stride * type_sz(inst->dst.type);
+      } else if (type_sz(inst->dst.type) < get_exec_type_size(inst) &&
           !is_byte_raw_mov(inst)) {
          return get_exec_type_size(inst);
       } else {
@@ -316,6 +322,14 @@ namespace {
    bool
    lower_dst_region(fs_visitor *v, bblock_t *block, fs_inst *inst)
    {
+      /* We cannot replace the result of an integer multiply which writes the
+       * accumulator because MUL+MACH pairs act on the accumulator as a 64-bit
+       * value whereas the MOV will act on only 32 or 33 bits of the
+       * accumulator.
+       */
+      assert(inst->opcode != BRW_OPCODE_MUL || !inst->dst.is_accumulator() ||
+             brw_reg_type_is_floating_point(inst->dst.type));
+
       const fs_builder ibld(v, block, inst);
       const unsigned stride = required_dst_byte_stride(inst) /
                               type_sz(inst->dst.type);
-- 
2.20.1