[Mesa-dev] [PATCH v3 1/2] intel/fs: New methods dst_write_pattern and src_read_pattern at fs_inst

Jose Maria Casanova Crespo jmcasanova at igalia.com
Mon Jul 23 20:20:12 UTC 2018


For a given register source or destination of an instruction, these new
methods return the read/write byte pattern of the 32-byte GRF as an
unsigned int.

The returned pattern takes into account the exec_size of the instruction,
the type bitsize, the register stride and a relative offset inside the
register.

The motivation for these functions is to know which bytes an instruction
reads or writes, in order to improve the liveness analysis for partial
reads/writes.

We handle SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL and
SHADER_OPCODE_BYTE_SCATTERED_WRITE as special cases because their read
pattern depends on the bitsize parameter.

v2: (Francisco Jerez)
    - Split original register_byte_use_pattern into one read function and
      one write function.
    - Check for send like instructions using this->mlen != 0
    - Pass the src number and offset to the functions.
    - Use periodic_mask function with code written by Francisco Jerez
      to simplify pattern generation.
    - Avoid breaking silently if source straddles multiple GRFs.

v3: (Francisco Jerez)
    - A SEND could be this->mlen != 0 or this->is_send_from_grf
    - We only assume that a periodic mask with offset can be applied
      when reg_offset == 0.
    - For MOV operations we can ensure this for any reg_offset (Chema)

Cc: Francisco Jerez <currojerez at riseup.net>

---
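As a quick sanity check of the pattern generation, here is a minimal
standalone sketch of the periodic_mask helper from this patch. It uses
std::min in place of Mesa's MIN2 macro, and region_byte_pattern and
check_examples are hypothetical names used only for illustration, not
part of the patch. It assumes "count" is a power of two and "bits" is
smaller than 32.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstdint>

// Standalone copy of the helper added by this patch (MIN2 -> std::min).
// Assumes "count" is a power of two and bits < 32.
static inline uint32_t
periodic_mask(unsigned count, unsigned step, unsigned bits, unsigned offset)
{
   uint32_t m = (count ? (1u << bits) - 1 : 0);
   const unsigned max =
      std::min<unsigned>(count * step, sizeof(m) * CHAR_BIT);

   // Each iteration doubles the number of repetitions already held in m.
   for (unsigned shift = step; shift < max; shift *= 2)
      m |= m << shift;

   return m << offset;
}

// Hypothetical wrapper: byte pattern of a regular region, given the
// exec_size, the stride in elements, the type size in bytes and a byte
// offset. One bit of the result stands for one byte of a 32-byte GRF.
static inline uint32_t
region_byte_pattern(unsigned exec_size, unsigned stride,
                    unsigned type_size, unsigned byte_offset)
{
   return periodic_mask(exec_size, stride * type_size, type_size,
                        byte_offset % 32);
}

// The three examples from the periodic_mask doc comment.
static void
check_examples()
{
   assert(periodic_mask(8, 4, 2, 0) == 0x33333333u);
   assert(periodic_mask(8, 4, 2, 2) == 0xccccccccu);
   assert(periodic_mask(8, 2, 2, 16) == 0xffff0000u);
}
```

With this sketch, a SIMD8 region of 16-bit elements with stride 2 at
offset 0 maps to 0x33333333, while a packed SIMD8 32-bit region covers
the whole register (0xffffffff).
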
 src/intel/compiler/brw_fs.cpp  | 119 +++++++++++++++++++++++++++++++++
 src/intel/compiler/brw_ir_fs.h |   2 +
 2 files changed, 121 insertions(+)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 7ddbd285fe2..4fa0f154c44 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -39,6 +39,7 @@
 #include "compiler/glsl_types.h"
 #include "compiler/nir/nir_builder.h"
 #include "program/prog_parameter.h"
+#include <limits.h>
 
 using namespace brw;
 
@@ -687,6 +688,124 @@ fs_inst::is_partial_write() const
            this->dst.offset % REG_SIZE != 0);
 }
 
+/**
+ * Returns a periodic mask with "bits" consecutive set bits repeated "count"
+ * times every "step" bits, finally shifted "offset" bits to the left.
+ *
+ * This helper is used to calculate the representations of byte read/write
+ * register patterns.
+ *
+ * Example: periodic_mask(8, 4, 2, 0)  would return 0x33333333
+ *          periodic_mask(8, 4, 2, 2)  would return 0xcccccccc
+ *          periodic_mask(8, 2, 2, 16) would return 0xffff0000
+ */
+static inline uint32_t
+periodic_mask(unsigned count, unsigned step, unsigned bits, unsigned offset)
+{
+   uint32_t m = (count ? (1 << bits) - 1 : 0);
+   const unsigned max = MIN2(count * step, sizeof(m) * CHAR_BIT);
+
+   for (unsigned shift = step; shift < max; shift *= 2)
+      m |= m << shift;
+
+   return m << offset;
+}
+
+/**
+ * Returns a 32-bit uint whose bits represent if the associated register byte
+ * has been written by the instruction. The returned pattern takes into
+ * account the exec_size of the instruction, the type bitsize and the
+ * stride of the destination register.
+ *
+ * The objective of this function is to identify which parts of the register
+ * are defined for operations that don't write a full register, so that
+ * live range variable analysis can identify whether a partial write has
+ * completely defined the data used by a partial read.
+ */
+unsigned
+fs_inst::dst_write_pattern(unsigned reg_offset) const
+{
+   assert(this->dst.file == VGRF);
+   /* We don't know what is written so we return the worst case */
+   if (this->predicate && this->opcode != BRW_OPCODE_SEL)
+      return 0u;
+   /* We assume that send destinations are completely defined */
+   if (this->is_send_from_grf() || this->mlen != 0) {
+      return ~0u;
+   }
+
+   /* The byte pattern is calculated using a periodic mask for reg_offset == 0
+    * because the internal offset will match how the register is written.
+    *
+    * We can also do this for any reg_offset on MOV operations. Other opcodes
+    * could be added in the future, but we don't include them until we have
+    * evidence of them being used in partial write situations that ensure
+    * that the pattern is repeated for any reg_offset.
+    */
+   if (reg_offset == 0 || this->opcode == BRW_OPCODE_MOV) {
+      return periodic_mask(this->exec_size,
+                           this->dst.stride * type_sz(this->dst.type),
+                           type_sz(this->dst.type),
+                           this->dst.offset % REG_SIZE);
+   }
+   /* This shouldn't be reached during liveness range calculation, but if the
+    * function is used elsewhere we know that a complete register is written.
+    */
+   if (!this->is_partial_write())
+      return ~0u;
+
+   /* By default we don't know what is written */
+   return 0u;
+}
+
+/**
+ * Returns a 32-bit uint whose bits represent if the associated register byte
+ * has been read by the instruction. The returned pattern takes into
+ * account the exec_size of the instruction, the type bitsize and stride of
+ * a source register and a register offset.
+ *
+ * The objective of this function is to identify which parts of the register
+ * are used for operations that don't read a full register.
+ */
+unsigned
+fs_inst::src_read_pattern(int i, unsigned reg_offset) const
+{
+   assert(src[i].file == VGRF);
+   /* byte_scattered_write_logical pattern of src[1] is 32-bit aligned
+    * so the read pattern depends on the bitsize stored at src[4].
+    */
+   if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL && i == 1)
+      return periodic_mask(8, 4, this->src[4].ud / 8, 0);
+   /* As for byte_scattered_write_logical, but we need to take into account
+    * that the data written in the payload (src[0]) is now at reg_offset 1 on
+    * SIMD8 and at reg_offset 2 and 3 on SIMD16.
+    */
+   if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE && i == 0) {
+      if (DIV_ROUND_UP(reg_offset, (this->exec_size / 8)) == 1)
+         return periodic_mask(8, 4, this->src[2].ud / 8, 0);
+   }
+   /* We assume that send sources could be completely used */
+   if (this->is_send_from_grf() || this->mlen != 0)
+      return ~0u;
+
+   /* The byte pattern is calculated using a periodic mask for reg_offset == 0
+    * because the internal offset will match how the register is read.
+    *
+    * We can also do this for any reg_offset on MOV operations. Other opcodes
+    * could be added in the future, but we don't include them until we have
+    * evidence of them being used in partial read situations that ensure
+    * that the pattern is repeated for any reg_offset.
+    */
+   if (!reg_offset || this->opcode == BRW_OPCODE_MOV) {
+      return periodic_mask(this->exec_size,
+                           this->src[i].stride * type_sz(this->src[i].type),
+                           type_sz(this->src[i].type),
+                           this->src[i].offset % REG_SIZE);
+   }
+   /* By default we assume that any byte could be read */
+   return ~0u;
+}
+
 unsigned
 fs_inst::components_read(unsigned i) const
 {
diff --git a/src/intel/compiler/brw_ir_fs.h b/src/intel/compiler/brw_ir_fs.h
index 92dad269a34..dab776a3664 100644
--- a/src/intel/compiler/brw_ir_fs.h
+++ b/src/intel/compiler/brw_ir_fs.h
@@ -350,6 +350,8 @@ public:
    bool equals(fs_inst *inst) const;
    bool is_send_from_grf() const;
    bool is_partial_write() const;
+   unsigned src_read_pattern(int src, unsigned reg_offset) const;
+   unsigned dst_write_pattern(unsigned reg_offset) const;
    bool is_copy_payload(const brw::simple_allocator &grf_alloc) const;
    unsigned components_read(unsigned i) const;
    unsigned size_read(int arg) const;
-- 
2.17.1
