[Mesa-dev] [RFC PATCH] i965/fs: Don't immediately schedule instructions that were just made available.

Thu Mar 28 13:59:31 PDT 2013

This is not how the final patch would look. Rather, we'd flatten the
if (post_reg_alloc) block and remove the else clause. This patch just aims to
prove that we're choosing instructions in a bad order.

On Sandybridge GLB2.5 C24Z16_DXT1 1600x900 non-composited:

x before
+ after
+------------------------------------------------------------------------------+
|                                     +                                        |
|   x                                 +                                     +  |
|  xxxxx                            ++++x x x                             + ++ |
|x xxxxx              x            ++*++xx*xx  x                     +   ++++++|
|  |___M______________A_____________|___|_M_____________A__________________|   |
+------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  23       8025.58        8203.4       8048.86     8105.5061      72.50085
+  23       8156.34       8323.38       8185.55     8236.8326     74.079214
Difference at 95.0% confidence
	131.327 +/- 43.5508
	1.62021% +/- 0.537299%
	(Student's t, pooled s = 73.2943)
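
For anyone wanting to sanity-check the summary above, here is a minimal sketch
of how the pooled deviation and the relative difference follow from the table.
It assumes the usual two-sample pooled standard deviation; the function names
are made up for illustration and are not taken from ministat.

```cpp
#include <cassert>
#include <cmath>

// Pooled standard deviation of two samples, in the usual two-sample
// Student's t form. n1/n2 are sample counts, s1/s2 the sample stddevs.
double pooled_stddev(int n1, double s1, int n2, double s2)
{
    assert(n1 > 1 && n2 > 1);
    return std::sqrt(((n1 - 1) * s1 * s1 + (n2 - 1) * s2 * s2) /
                     (n1 + n2 - 2));
}

// Relative change of the mean, in percent of the "before" average.
double percent_change(double before_avg, double after_avg)
{
    return 100.0 * (after_avg - before_avg) / before_avg;
}
```

Feeding in N=23 and the two Stddev values gives a pooled s of about 73.294,
and the two Avg values give a change of about 1.620%, matching the numbers
reported above.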

The original goal of pre-register allocation scheduling was to reduce live
ranges so we'd use fewer registers and hopefully fit into 16-wide. In shader-db,
this change causes us to lose 30 16-wide programs, but we gain 29, so it's a
toss-up. Still, by choosing instructions in a better order, all programs should
be slightly faster. Consider the trivial case of

uniform vec3 a, b;
void main() { gl_FragColor = vec4(cross(a, b), 0.0); }

Before the patch we compile this to

mov.sat(8)      m4<1>F          0F
mul(8)          g3<1>F          g2.4<0,1,0>F    g2<0,1,0>F
mad.sat(8)      m3<1>F          -g3<4,1,1>F     g2.3<4,1,1>F.x  g2.1<4,1,1>F.x
mul(8)          g3<1>F          g2.3<0,1,0>F    g2.2<0,1,0>F
mad.sat(8)      m2<1>F          -g3<4,1,1>F     g2.5<4,1,1>F.x  g2<4,1,1>F.x
mul(8)          g3<1>F          g2.5<0,1,0>F    g2.1<0,1,0>F
mad.sat(8)      m1<1>F          -g3<4,1,1>F     g2.4<4,1,1>F.x  g2.2<4,1,1>F.x
sendc(8)        null            m1<8,8,1>F

where we stall on each mad.sat waiting for the mul to finish. The sendc is issued
at cycle 66. After the patch it compiles to

mul(8)          g3<1>F          g2.5<0,1,0>F    g2.1<0,1,0>F
mul(8)          g4<1>F          g2.3<0,1,0>F    g2.2<0,1,0>F
mul(8)          g5<1>F          g2.4<0,1,0>F    g2<0,1,0>F
mov.sat(8)      m4<1>F          0F
mad.sat(8)      m1<1>F          -g3<4,1,1>F     g2.4<4,1,1>F.x  g2.2<4,1,1>F.x
mad.sat(8)      m2<1>F          -g4<4,1,1>F     g2.5<4,1,1>F.x  g2<4,1,1>F.x
mad.sat(8)      m3<1>F          -g5<4,1,1>F     g2.3<4,1,1>F.x  g2.1<4,1,1>F.x
sendc(8)        null            m1<8,8,1>F

By hiding much of the latency, the sendc instruction is issued by cycle 32.
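
To illustrate why the ordering matters, here is a rough sketch of a toy
in-order, single-issue model. Everything in it is an assumption made for
illustration: the Inst struct, the function names, and the uniform result
latency are hypothetical, it ignores write-after-read hazards on g3, and it is
not the scheduler's actual cost model, so it will not reproduce the 66 and 32
cycle figures above.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical instruction: one destination and a list of sources.
struct Inst {
    std::string dst;
    std::vector<std::string> srcs;
};

// Returns the cycle at which the last instruction issues. An instruction
// issues no earlier than one cycle after the previous one, and no earlier
// than the cycle at which all of its computed sources are ready.
// `latency` is an assumed uniform result latency, not a hardware number.
int last_issue_cycle(const std::vector<Inst> &prog, int latency)
{
    assert(latency >= 0);
    std::map<std::string, int> ready;  // register -> cycle its value is ready
    int next_free = 0;                 // earliest cycle the next inst may issue
    int issue = 0;
    for (const Inst &inst : prog) {
        issue = next_free;
        for (const std::string &src : inst.srcs) {
            auto it = ready.find(src);
            if (it != ready.end())     // uniforms (g2) are never pending
                issue = std::max(issue, it->second);
        }
        ready[inst.dst] = issue + latency;
        next_free = issue + 1;
    }
    return issue;
}

// Order before the patch: each mad.sat directly follows the mul it needs.
std::vector<Inst> interleaved()
{
    return {
        {"m4", {}},
        {"g3", {"g2"}}, {"m3", {"g3", "g2"}},
        {"g3", {"g2"}}, {"m2", {"g3", "g2"}},
        {"g3", {"g2"}}, {"m1", {"g3", "g2"}},
        {"null", {"m1", "m2", "m3", "m4"}},  // sendc
    };
}

// Order after the patch: all muls first, then the dependent mad.sats.
std::vector<Inst> grouped()
{
    return {
        {"g3", {"g2"}}, {"g4", {"g2"}}, {"g5", {"g2"}},
        {"m4", {}},
        {"m1", {"g3", "g2"}}, {"m2", {"g4", "g2"}}, {"m3", {"g5", "g2"}},
        {"null", {"m1", "m2", "m3", "m4"}},  // sendc
    };
}
```

With an assumed latency of 10 cycles, last_issue_cycle(interleaved(), 10)
puts the sendc at cycle 43, while last_issue_cycle(grouped(), 10) puts it at
cycle 22. The absolute numbers mean nothing; the point is that the grouped
order pays the mul latency roughly once instead of three times.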
---
 .../dri/i965/brw_fs_schedule_instructions.cpp      |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp b/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
index 90f1a16..4d2dbe8 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
@@ -753,9 +753,9 @@ instruction_scheduler::schedule_instructions(fs_inst *next_block_header)
           * but also the MRF setup for the next sampler message, which in turn
           * unblocks the next sampler message).
           */
-         for (schedule_node *node = (schedule_node *)instructions.get_tail();
-              node != instructions.get_head()->prev;
-              node = (schedule_node *)node->prev) {
+         for (schedule_node *node = (schedule_node *)instructions.get_head();
+              node != instructions.get_tail()->next;
+              node = (schedule_node *)node->next) {
             schedule_node *n = (schedule_node *)node;
 
             chosen = n;
-- 
1.7.8.6