[Mesa-dev] [PATCH 3/6] i965/fs/gen7: split instructions that run into exec masking bugs
Samuel Iglesias Gonsálvez
siglesias at igalia.com
Wed Jul 6 10:09:59 UTC 2016
From: Iago Toral Quiroga <itoral at igalia.com>
In fp64 we can produce code like this:
mov(16) vgrf2<2>:UD, vgrf3<2>:UD
Our SIMD lowering pass would typically split this into instructions with a
width of 8, each writing to two consecutive registers. Unfortunately, gen7
hardware has a bug affecting execution masking and, as a result, the
second GRF register write won't work properly. Curro verified this:
"The problem is that pre-Gen8 EUs are hardwired to use the QtrCtrl+1
(where QtrCtrl is the 8-bit quarter of the execution mask signals
specified in the instruction control fields) for the second
compressed half of any single-precision instruction (for
double-precision instructions it's hardwired to use NibCtrl+1),
which means that the EU will apply the wrong execution controls
for the second sequential GRF write if the number of channels per
GRF is not exactly eight in single-precision mode (or four in
double-float mode)."
In practice, this means that we cannot write more than one
consecutive GRF in a single instruction if the number of channels
per GRF is not exactly eight in single-precision mode (or four
in double-float mode).
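To make the mismatch concrete, here is a minimal sketch of the behaviour Curro describes. It is a simplified model, not Mesa or hardware code: the function names and the `group_width` parameter (8 channels for the QtrCtrl case, 4 for the NibCtrl case) are assumptions introduced only for illustration.

```cpp
#include <cassert>

// Hypothetical, simplified model of the pre-Gen8 behaviour quoted above.
// None of these names exist in Mesa.

// First channel that the second sequential GRF write really covers,
// given how many channels fit in one GRF for this instruction.
constexpr unsigned expected_first_channel(unsigned ctrl, unsigned group_width,
                                          unsigned channels_per_grf)
{
   return ctrl * group_width + channels_per_grf;
}

// First channel whose execution mask bits the EU applies to the second
// GRF write: it always jumps to the next group (QtrCtrl+1 or NibCtrl+1).
constexpr unsigned applied_first_channel(unsigned ctrl, unsigned group_width)
{
   return (ctrl + 1) * group_width;
}

// True when the hardwired choice happens to be the right one.
constexpr bool second_write_mask_matches(unsigned ctrl, unsigned group_width,
                                         unsigned channels_per_grf)
{
   return expected_first_channel(ctrl, group_width, channels_per_grf) ==
          applied_first_channel(ctrl, group_width);
}

// Single precision, 8 channels per GRF: the masks line up.
static_assert(second_write_mask_matches(0, 8, 8), "packed case is fine");
// Single precision, 4 channels per GRF (e.g. a stride of 2 on UD): the
// second GRF holds channels 4..7 but gets the mask for channels 8..15.
static_assert(!second_write_mask_matches(0, 8, 4), "strided case is broken");
// Double precision, 4 channels per GRF: NibCtrl+1 is the right group.
static_assert(second_write_mask_matches(0, 4, 4), "packed DF case is fine");
```

In this model the only safe multi-GRF writes are the ones where the number of channels per GRF equals the group width, which is the condition stated above.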
This patch makes our SIMD lowering pass split this kind of instruction
so that the split versions only write to a single register. In the
example above this means that we split the write into 4 instructions, each
one writing 4 UD elements (width = 4) to a single register.
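The arithmetic for the example can be sketched as a standalone function. This is an illustration that mirrors the check added by the patch, not the Mesa code itself: it takes plain integers instead of `fs_inst` and `brw_device_info`, the names `lowered_width` and `channels_per_grf` are hypothetical, and it returns the unmodified execution size when the restriction does not apply, whereas the real function goes on to apply other limits.

```cpp
#include <cassert>

// Bytes per GRF on this hardware (same value as REG_SIZE in the patch).
constexpr unsigned REG_SIZE = 32;

// Channels that land in each destination GRF.
constexpr unsigned channels_per_grf(unsigned exec_size, unsigned regs_written)
{
   return exec_size / regs_written;
}

// Hypothetical stand-in for the new check: returns the width to split to
// when the instruction would hit the pre-Gen8 masking bug, otherwise the
// original execution size.
constexpr unsigned lowered_width(unsigned gen, unsigned exec_size,
                                 unsigned regs_written, unsigned type_size)
{
   return (gen < 8 && regs_written > 1 &&
           channels_per_grf(exec_size, regs_written) != REG_SIZE / type_size)
             ? channels_per_grf(exec_size, regs_written)
             : exec_size;
}

// mov(16) vgrf2<2>:UD: 16 channels * 4 bytes * stride 2 = 128 bytes,
// i.e. 4 GRFs written, so 4 channels per GRF and a split width of 4.
static_assert(lowered_width(7, 16, 4, 4) == 4, "gen7 strided UD splits to 4");
// The same instruction on gen8 is not restricted by this check.
static_assert(lowered_width(8, 16, 4, 4) == 16, "gen8 is unaffected");
// A packed mov(16) on UD writes 2 GRFs with 8 channels each: no split.
static_assert(lowered_width(7, 16, 2, 4) == 16, "packed UD is unaffected");
```

With a split width of 4, the 16-channel instruction becomes 16 / 4 = 4 instructions, each writing exactly one register.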
---
src/mesa/drivers/dri/i965/brw_fs.cpp | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 2f473cc..caf88d1 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -4677,6 +4677,26 @@ static unsigned
 get_fpu_lowered_simd_width(const struct brw_device_info *devinfo,
                            const fs_inst *inst)
 {
+   /* Pre-Gen8 EUs are hardwired to use the QtrCtrl+1 (where QtrCtrl is
+    * the 8-bit quarter of the execution mask signals specified in the
+    * instruction control fields) for the second compressed half of any
+    * single-precision instruction (for double-precision instructions
+    * it's hardwired to use NibCtrl+1), which means that the EU will
+    * apply the wrong execution controls for the second sequential GRF
+    * write if the number of channels per GRF is not exactly eight in
+    * single-precision mode (or four in double-float mode).
+    *
+    * In this situation we calculate the maximum size of the split
+    * instructions so they only ever write to a single register.
+    */
+   unsigned type_size = type_sz(inst->dst.type);
+   unsigned channels_per_grf = inst->exec_size / inst->regs_written;
+   assert(channels_per_grf > 0);
+   if (devinfo->gen < 8 && inst->regs_written > 1 &&
+       channels_per_grf != REG_SIZE / type_size) {
+      return channels_per_grf;
+   }
+
    /* Maximum execution size representable in the instruction controls. */
    unsigned max_width = MIN2(32, inst->exec_size);
--
2.7.4