[Mesa-dev] [PATCH] i965/vs: Don't lose the MRF writemask when doing compute-to-MRF.

Kenneth Graunke kenneth at whitecape.org
Thu Oct 25 00:06:11 PDT 2012

Consider the following code sequence:

   mul(8)          g4<1>F          g1<0,4,1>.wzwwF g3<4,4,1>.wzwwF
   mov.sat(8)      m1<1>.xyF       g4<4,4,1>F
   mul(8)          g4<1>F          g1<0,4,1>.xxyxF g3<4,4,1>.xxyxF
   mov.sat(8)      m1<1>.zwF       g4<4,4,1>F

The compute-to-MRF pass will discover the first mov.sat and attempt to
replace it by rewriting earlier instructions.  Everything works out,
so it replaces scan_inst's destination file, reg, and reg_offset,
resulting in:

   mul(8)          m1<1>F          g1<0,4,1>.wzwwF g3<4,4,1>.wzwwF
   mul(8)          g4<1>F          g1<0,4,1>.xxyxF g3<4,4,1>.xxyxF
   mov.sat(8)      m1<1>.zwF       g4<4,4,1>F

Unfortunately, it loses the .xy writemask on the mov.sat's MRF
destination.  While this doesn't pose an immediate problem, it then
proceeds to transform the second mov.sat, resulting in:

   mul(8)          m1<1>F          g1<0,4,1>.wzwwF g3<4,4,1>.wzwwF
   mul(8)          m1<1>F          g1<0,4,1>.xxyxF g3<4,4,1>.xxyxF

Instead of writing both halves of the vector (like the original code),
it overwrites the full vector both times, clobbering the desired .xy

When encountering a MOV, the compute-to-MRF code scans for instructions
which generate channels of the MOV source.  It ensures that all
necessary channels are available (possibly written by several
instructions).  In this case, *more* channels are available than
necessary, so we want to take the subset that's actually used.
Taking the bitwise and of both writemasks should accomplish that.

This was discovered by analyzing an ARB_vertex_program test
(glean/vertProg1/MUL test (with swizzle and masking)) with my new
Mesa IR -> Vec4 IR translator code.  However, it should be possible
with GLSL programs as well.

NOTE: This is a candidate for stable release branches.

Cc: Eric Anholt <eric at anholt.net>
Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 5c52d3a..10a8310 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -754,6 +754,7 @@ vec4_visitor::opt_compute_to_mrf()
 	       scan_inst->dst.file = MRF;
 	       scan_inst->dst.reg = mrf;
 	       scan_inst->dst.reg_offset = 0;
+	       scan_inst->dst.writemask &= inst->dst.writemask;
 	       scan_inst->saturate |= inst->saturate;
 	    scan_inst = (vec4_instruction *)scan_inst->next;

More information about the mesa-dev mailing list