[Mesa-dev] [PATCH v3 2/3] i965: Introduce a MOV_INDIRECT opcode.

Kenneth Graunke kenneth at whitecape.org
Fri Nov 13 18:50:54 PST 2015


The geometry and tessellation control shader stages both read from
multiple URB entries (one per vertex).  The thread payload contains
several URB handles which reference these separate memory segments.

In GLSL, these inputs are represented as per-vertex arrays; the
outermost array index selects which vertex's inputs to read.  This
array index does not necessarily need to be constant.

To handle that, we need to use indirect addressing on GRFs to select
which of the thread payload registers has the appropriate URB handle.
(This is before we can even think about applying the pull model!)

This patch introduces a new opcode which performs a MOV from a
source using VxH indirect addressing (which allows each of the 8
SIMD channels to select distinct data.)

Based on a patch by Jason Ekstrand.

v2: Rename from INDIRECT_THREAD_PAYLOAD_MOV to MOV_INDIRECT; make it
    a bit more generic.  Use regs_read() instead of hacking up the
    register allocator.  (Suggested by Jason Ekstrand.)

v3: Fix regs_read() to be more accurate for small unaligned regions.
    Also rebase on Matt's work.

Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue at linux.intel.com> [v1]

stash
---
 src/mesa/drivers/dri/i965/brw_defines.h        | 10 ++++++++
 src/mesa/drivers/dri/i965/brw_fs.cpp           | 28 +++++++++++++++++++++
 src/mesa/drivers/dri/i965/brw_fs.h             |  5 ++++
 src/mesa/drivers/dri/i965/brw_fs_cse.cpp       |  1 +
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 34 ++++++++++++++++++++++++++
 src/mesa/drivers/dri/i965/brw_shader.cpp       |  2 ++
 6 files changed, 80 insertions(+)

Here's an updated version.  I changed the description of source 2 to be
"the length of the region that could be accessed, in bytes" rather than
"the maximum value of the indirect offset" (aka the largest possible value
of source 1), as that's what I actually wanted - it's much easier to use.

(Notably, the largest possible value of the indirect offset means that
it affects the maximum possible base of the read...but you still read
type_sz() bytes beyond that...which is ugly.  Size of the region is
much easier to use, as base + size = end...and you don't read beyond
the end.)

This is all getting into dire "but isn't it off by 1?" territory, so
here are some examples demonstrating how one might use this:

"I want to read some .xyzw component of a vec4 stored in r7.4."

    src[0] = r7.4<v;w,h>F
    src[1] = <indirect offset>
    src[2] = 16 bytes

    Because subnr != 0, we calculate subnr * type_sz, and get:

         subnr * sizeof(float) = 4 * 4 = 16 bytes

    We add this to region_length to get a total size of 32 bytes.
    DIV_ROUND_UP(32, REG_SIZE) = 32 / 32 = 1 register read.

    Correct, because we're only reading the last 4 channels of r7,
    so our region is contained within 1 register.

"I want to read up to 6 UD channels starting at r7.7."

    src[0] = r7.1<v;w,h>UD
    src[1] = <indirect offset>
    src[2] = 6 * sizeof(UD) = 24 bytes

    Because subnr != 0, we calculate subnr * type_sz, and get:

         subnr * sizeof(UD) = 7 * 4 = 28 bytes.

    We add this to region_length to get a total of 24 + 28 = 52 bytes.
    DIV_ROUND_UP(52, REG_SIZE) = ceil(52 / 32) = 2

    Correct, because we read part of r7 and part of r8.

"I want to get the ICP handles for vertex 2 out of 6 vertices."

    Let's assume the first ICP handle starts at r3.

    The ICP handles for vertex N take up whole register, so we would pick
    src[0] = r(first icp handle)<8,8,1>UD
    src[1] = <7,6,5,4,3,2,1> * 4 + 2 * REG_SIZE
    src[2] = 6 vertices * REG_SIZE = 192 bytes

    subnr == 0, so we don't add anything; region_length stays as 192.
    DIV_ROUND_UP(192, 32) = 6.  We potentially read r3, r4, r5, r6, r7, r8,
    which contain the ICP handles for vertex 0, 1, 2, 3, 4, 5 - 6 vertices.

So I think this will work.

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index 6484484..0b8de63 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1289,6 +1289,16 @@ enum opcode {
     * Calculate the high 32-bits of a 32x32 multiply.
     */
    SHADER_OPCODE_MULH,
+
+   /**
+    * A MOV that uses VxH indirect addressing.
+    *
+    * Source 0: A register to start from (HW_REG).
+    * Source 1: An indirect offset (in bytes, UD GRF).
+    * Source 2: The length of the region that could be accessed (in bytes,
+    *           UD immediate).
+    */
+   SHADER_OPCODE_MOV_INDIRECT,
 };
 
 enum brw_urb_write_flags {
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 80b8c8e..84b5920 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -840,6 +840,34 @@ fs_inst::regs_read(int arg) const
    case SHADER_OPCODE_BARRIER:
       return 1;
 
+   case SHADER_OPCODE_MOV_INDIRECT:
+      if (arg == 0) {
+         assert(src[2].file == IMM);
+         unsigned region_length = src[2].ud;
+
+         if (src[0].file == FIXED_GRF) {
+            /* If the start of the region is not register aligned, then
+             * there's some portion of the register that's technically
+             * unread at the beginning.
+             *
+             * However, the register allocator works in terms of whole
+             * registers, and does not use subnr.  It assumes that the
+             * read starts at the beginning of the register, and extends
+             * regs_read() whole registers beyond that.
+             *
+             * To compensate, we extend the region length to include this
+             * unread portion at the beginning.
+             */
+            if (src[0].subnr)
+               region_length += src[0].subnr * type_sz(src[0].type);
+
+            return DIV_ROUND_UP(region_length, REG_SIZE);
+         } else {
+            assert(!"Invalid register file");
+         }
+      }
+      break;
+
    default:
       if (is_tex() && arg == 0 && src[0].file == VGRF)
          return mlen;
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index f40e58b..cbfc07f 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -527,6 +527,11 @@ private:
                                  struct brw_reg offset,
                                  struct brw_reg value);
 
+   void generate_mov_indirect(fs_inst *inst,
+                              struct brw_reg dst,
+                              struct brw_reg reg,
+                              struct brw_reg indirect_byte_offset);
+
    bool patch_discard_jumps_to_fb_writes();
 
    const struct brw_compiler *compiler;
diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
index 8c67caf..3c40fcd 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
@@ -78,6 +78,7 @@ is_expression(const fs_visitor *v, const fs_inst *const inst)
    case FS_OPCODE_LINTERP:
    case SHADER_OPCODE_FIND_LIVE_CHANNEL:
    case SHADER_OPCODE_BROADCAST:
+   case SHADER_OPCODE_MOV_INDIRECT:
       return true;
    case SHADER_OPCODE_RCP:
    case SHADER_OPCODE_RSQ:
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 139cda3..e5a286a 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -372,6 +372,36 @@ fs_generator::generate_fb_write(fs_inst *inst, struct brw_reg payload)
 }
 
 void
+fs_generator::generate_mov_indirect(fs_inst *inst,
+                                    struct brw_reg dst,
+                                    struct brw_reg reg,
+                                    struct brw_reg indirect_byte_offset)
+{
+   assert(indirect_byte_offset.type == BRW_REGISTER_TYPE_UD);
+   assert(indirect_byte_offset.file == BRW_GENERAL_REGISTER_FILE);
+
+   unsigned imm_byte_offset = reg.nr * REG_SIZE + reg.subnr;
+
+   /* We use VxH indirect addressing, clobbering a0.0 through a0.7. */
+   struct brw_reg addr = vec8(brw_address_reg(0));
+
+   /* The destination stride of an instruction (in bytes) must be greater
+    * than or equal to the size of the rest of the instruction.  Since the
+    * address register is of type UW, we can't use a D-type instruction.
+    * In order to get around this, re re-type to UW and use a stride.
+    */
+   indirect_byte_offset =
+      retype(spread(indirect_byte_offset, 2), BRW_REGISTER_TYPE_UW);
+
+   /* Prior to Broadwell, there are only 8 address registers. */
+   assert(inst->exec_size == 8 || devinfo->gen >= 8);
+
+   brw_MOV(p, addr, indirect_byte_offset);
+   brw_inst_set_mask_control(devinfo, brw_last_inst, BRW_MASK_DISABLE);
+   brw_MOV(p, dst, retype(brw_VxH_indirect(0, imm_byte_offset), dst.type));
+}
+
+void
 fs_generator::generate_urb_read(fs_inst *inst,
                                 struct brw_reg dst,
                                 struct brw_reg header)
@@ -2079,6 +2109,10 @@ fs_generator::generate_code(const cfg_t *cfg, int dispatch_width)
          fill_count++;
 	 break;
 
+      case SHADER_OPCODE_MOV_INDIRECT:
+         generate_mov_indirect(inst, dst, src[0], src[1]);
+         break;
+
       case SHADER_OPCODE_URB_READ_SIMD8:
       case SHADER_OPCODE_URB_READ_SIMD8_PER_SLOT:
          generate_urb_read(inst, dst, src[0]);
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp
index a438e18..50b288b 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -554,6 +554,8 @@ brw_instruction_name(enum opcode op)
       return "barrier";
    case SHADER_OPCODE_MULH:
       return "mulh";
+   case SHADER_OPCODE_MOV_INDIRECT:
+      return "mov_indirect";
    }
 
    unreachable("not reached");
-- 
2.6.2



More information about the mesa-dev mailing list