Mesa (master): i965/vs: Don' t lose the MRF writemask when doing compute-to-MRF.

Thu Oct 25 21:54:55 UTC 2012

Module: Mesa
Branch: master
Commit: 10ff6772c8054aea12ac0f08e2e3898fd4a7f76b
URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=10ff6772c8054aea12ac0f08e2e3898fd4a7f76b

Author: Kenneth Graunke <kenneth at whitecape.org>
Date:   Wed Oct 24 21:16:46 2012 -0700

i965/vs: Don't lose the MRF writemask when doing compute-to-MRF.

Consider the following code sequence:

   mul(8)          g4<1>F          g1<0,4,1>.wzwwF g3<4,4,1>.wzwwF
   mov.sat(8)      m1<1>.xyF       g4<4,4,1>F
   mul(8)          g4<1>F          g1<0,4,1>.xxyxF g3<4,4,1>.xxyxF
   mov.sat(8)      m1<1>.zwF       g4<4,4,1>F

The compute-to-MRF pass will discover the first mov.sat and attempt to
replace it by rewriting earlier instructions.  Everything works out,
so it replaces scan_inst's destination file, reg, and reg_offset,
resulting in:

   mul(8)          m1<1>F          g1<0,4,1>.wzwwF g3<4,4,1>.wzwwF
   mul(8)          g4<1>F          g1<0,4,1>.xxyxF g3<4,4,1>.xxyxF
   mov.sat(8)      m1<1>.zwF       g4<4,4,1>F

Unfortunately, it loses the .xy writemask on the mov.sat's MRF
destination.  While this doesn't pose an immediate problem, it then
proceeds to transform the second mov.sat, resulting in:

   mul(8)          m1<1>F          g1<0,4,1>.wzwwF g3<4,4,1>.wzwwF
   mul(8)          m1<1>F          g1<0,4,1>.xxyxF g3<4,4,1>.xxyxF

Instead of writing both halves of the vector (like the original code),
it overwrites the full vector both times, clobbering the desired .xy
values.

When encountering a MOV, the compute-to-MRF code scans for instructions
which generate channels of the MOV source.  It ensures that all
necessary channels are available (possibly written by several
instructions).  In this case, *more* channels are available than
necessary, so we want to take the subset that's actually used.
Taking the bitwise and of both writemasks should accomplish that.

This was discovered by analyzing an ARB_vertex_program test
(glean/vertProg1/MUL test (with swizzle and masking)) with my new
Mesa IR -> Vec4 IR translator code.  However, it should be possible
with GLSL programs as well.

NOTE: This is a candidate for stable release branches.

Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
Reviewed-by: Eric Anholt <eric at anholt.net>

---

 src/mesa/drivers/dri/i965/brw_vec4.cpp |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 5c52d3a..10a8310 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -754,6 +754,7 @@ vec4_visitor::opt_compute_to_mrf()
 	       scan_inst->dst.file = MRF;
 	       scan_inst->dst.reg = mrf;
 	       scan_inst->dst.reg_offset = 0;
+	       scan_inst->dst.writemask &= inst->dst.writemask;
 	       scan_inst->saturate |= inst->saturate;
 	    }
 	    scan_inst = (vec4_instruction *)scan_inst->next;