[Mesa-dev] [PATCH 10/23] i965/fs/lower_simd_width: fix result transposition
Samuel Iglesias Gonsálvez
siglesias at igalia.com
Tue May 3 12:21:59 UTC 2016
From: Iago Toral Quiroga <itoral at igalia.com>
The transposition needs to set exec_all(), but it writes directly to the
original instruction's destination. If the original instruction did not
have force_writemask_all set, this ignores its execution mask and can
overwrite channels that should have been left untouched. In that case,
write the result of the operation to a temporary first, then move it to
the original instruction's destination without forcing exec_all().
---
src/mesa/drivers/dri/i965/brw_fs.cpp | 33 ++++++++++++++++++++++++++++++---
1 file changed, 30 insertions(+), 3 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 85d430c..bc81a80 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -4779,10 +4779,37 @@ fs_visitor::lower_simd_width()
* one half per component, because LOAD_PAYLOAD (in terms of which
* emit_transpose is implemented) can only use the same channel
* enable signals for all of its non-header sources.
+ *
+ * If we need to set exec_all() and the original instruction didn't
+ * have force_writemask_all set then we have to write to a
+ * temporary register and then move from there to the actual dst
+ * without exec_all() set to honor the execmask.
*/
- emit_transpose(ibld.exec_all(inst->exec_size > copy_width)
- .group(copy_width, 0),
- inst->dst, dsts, n, dst_size, m);
+ bool force_exec_all = inst->exec_size > copy_width;
+ if (force_exec_all && !inst->force_writemask_all) {
+ fs_reg tmp_dst =
+ fs_reg(VGRF, alloc.allocate(alloc.sizes[inst->dst.nr]));
+ tmp_dst.type = inst->dst.type;
+ emit_transpose(ibld.exec_all(true).group(copy_width, 0),
+ tmp_dst, dsts, n, dst_size, m);
+
+ for (unsigned row = 0; row < n; row++) {
+ for (unsigned col = 0; col < dst_size; col++) {
+ unsigned element_offset = row * dst_size + m * col;
+ const fs_builder tmp_bld =
+ ibld.group(copy_width, element_offset % n);
+ fs_reg dst = offset(inst->dst, tmp_bld, element_offset);
+ fs_reg src = offset(tmp_dst, tmp_bld, element_offset);
+ fs_inst *_inst = tmp_bld.MOV(dst, src);
+ if (row * dst_size + col < inst->header_size)
+ _inst->force_writemask_all = true;
+ }
+ }
+ } else {
+ emit_transpose(ibld.exec_all(force_exec_all)
+ .group(copy_width, 0),
+ inst->dst, dsts, n, dst_size, m);
+ }
}
inst->remove(block);
--
2.5.0