[Mesa-dev] [RFC PATCH] i965/fs: Don't immediately schedule instructions that were just made available.
Matt Turner
mattst88 at gmail.com
Thu Mar 28 13:59:31 PDT 2013
This is not how the final patch would look. Rather, we'd flatten the
if (post_reg_alloc) block and remove the else clause. This patch just aims to
prove that we're choosing instructions in a bad order.
On Sandybridge GLB2.5 C24Z16_DXT1 1600x900 non-composited:
x before
+ after
[ministat ASCII distribution plot: the "before" samples (x) cluster to the left of the "after" samples (+), with some overlap; M marks each median and A each average]
    N           Min           Max        Median           Avg        Stddev
x  23       8025.58        8203.4       8048.86     8105.5061      72.50085
+  23       8156.34       8323.38       8185.55     8236.8326     74.079214
Difference at 95.0% confidence
131.327 +/- 43.5508
1.62021% +/- 0.537299%
(Student's t, pooled s = 73.2943)
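The interval above can be re-derived from the summary columns alone. The following is a minimal sketch of the standard pooled-variance calculation, not ministat's own source; the `Sample` struct and function names are made up for illustration, and the critical value of Student's t (about 2.015 for 44 degrees of freedom at 95%) is an assumption supplied by the caller rather than computed.

```cpp
#include <cassert>
#include <cmath>

// Summary statistics for one data set (hypothetical helper type).
struct Sample {
    int n;          // number of runs ("N" column)
    double mean;    // "Avg" column
    double stddev;  // "Stddev" column
};

// Pooled standard deviation of two independent samples.
double pooled_stddev(const Sample &a, const Sample &b)
{
    assert(a.n > 1 && b.n > 1);
    const double num = (a.n - 1) * a.stddev * a.stddev +
                       (b.n - 1) * b.stddev * b.stddev;
    return std::sqrt(num / (a.n + b.n - 2));
}

// Half-width of the confidence interval for mean(b) - mean(a).
// t_crit is the two-sided critical value of Student's t for
// a.n + b.n - 2 degrees of freedom; it is passed in, not derived here.
double ci_half_width(const Sample &a, const Sample &b, double t_crit)
{
    return t_crit * pooled_stddev(a, b) * std::sqrt(1.0 / a.n + 1.0 / b.n);
}
```

Feeding in the two rows (N = 23 each) with t_crit = 2.015 gives a pooled s of roughly 73.29 and a half-width of roughly 43.55, in line with the figures reported above.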
The original goal of pre-register allocation scheduling was to reduce live
ranges so we'd use fewer registers and hopefully fit into 16-wide. In shader-db,
this change causes us to lose 30 16-wide programs, but we gain 29... so it's a
toss-up. At least by choosing instructions in a better order all programs should
be slightly faster. Consider the trivial case of
uniform vec3 a, b;
void main() { gl_FragColor = vec4(cross(a, b)); }
Before the patch we compile this to
mov.sat(8) m4<1>F 0F
mul(8) g3<1>F g2.4<0,1,0>F g2<0,1,0>F
mad.sat(8) m3<1>F -g3<4,1,1>F g2.3<4,1,1>F.x g2.1<4,1,1>F.x
mul(8) g3<1>F g2.3<0,1,0>F g2.2<0,1,0>F
mad.sat(8) m2<1>F -g3<4,1,1>F g2.5<4,1,1>F.x g2<4,1,1>F.x
mul(8) g3<1>F g2.5<0,1,0>F g2.1<0,1,0>F
mad.sat(8) m1<1>F -g3<4,1,1>F g2.4<4,1,1>F.x g2.2<4,1,1>F.x
sendc(8) null m1<8,8,1>F
where we stall on each mad.sat waiting for the mul to finish. The sendc is issued
at cycle 66. After the patch it compiles to
mul(8) g3<1>F g2.5<0,1,0>F g2.1<0,1,0>F
mul(8) g4<1>F g2.3<0,1,0>F g2.2<0,1,0>F
mul(8) g5<1>F g2.4<0,1,0>F g2<0,1,0>F
mov.sat(8) m4<1>F 0F
mad.sat(8) m1<1>F -g3<4,1,1>F g2.4<4,1,1>F.x g2.2<4,1,1>F.x
mad.sat(8) m2<1>F -g4<4,1,1>F g2.5<4,1,1>F.x g2<4,1,1>F.x
mad.sat(8) m3<1>F -g5<4,1,1>F g2.3<4,1,1>F.x g2.1<4,1,1>F.x
sendc(8) null m1<8,8,1>F
With much of the latency hidden, the sendc instruction is issued by cycle 32.
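To see why the interleaved order finishes sooner, here is a toy in-order issue model. It is only a sketch under loud assumptions: one instruction issues per cycle, every ALU result becomes available a fixed `latency` cycles after issue, and only read-after-write dependencies stall. The register enum, the helper names, and the latency of 10 are hypothetical and are not the hardware's real numbers, so the model does not reproduce the 66 and 32 cycle figures, only the shape of the effect.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

// Hypothetical register numbering for the toy model.
enum Reg { G3, G4, G5, M1, M2, M3, M4, NONE };

struct Inst {
    Reg dst;                // register written (NONE for the send)
    std::vector<Reg> srcs;  // tracked registers read; the uniforms in g2 are omitted
};

// Cycle at which the last instruction of `prog` issues, assuming in-order
// issue, one instruction per cycle, and a fixed result latency.
int last_issue_cycle(const std::vector<Inst> &prog, int latency)
{
    assert(!prog.empty());
    std::map<Reg, int> ready;  // cycle at which each register's value is available
    int clock = 0, issue = 0;
    for (const Inst &inst : prog) {
        issue = clock;
        for (Reg r : inst.srcs)
            issue = std::max(issue, ready[r]);  // stall until sources are ready
        if (inst.dst != NONE)
            ready[inst.dst] = issue + latency;
        clock = issue + 1;
    }
    return issue;
}

// mov, then three mul/mad pairs that all reuse g3, then the send.
std::vector<Inst> order_before()
{
    return { {M4, {}}, {G3, {}}, {M3, {G3}}, {G3, {}}, {M2, {G3}},
             {G3, {}}, {M1, {G3}}, {NONE, {M1, M2, M3, M4}} };
}

// Three independent muls, the mov, three mads, then the send.
std::vector<Inst> order_after()
{
    return { {G3, {}}, {G4, {}}, {G5, {}}, {M4, {}}, {M1, {G3}},
             {M2, {G4}}, {M3, {G5}}, {NONE, {M1, M2, M3, M4}} };
}
```

With the assumed latency of 10, `last_issue_cycle(order_before(), 10)` is 43 and `last_issue_cycle(order_after(), 10)` is 22: the paired order pays the full mul latency three times, while the interleaved order overlaps all three.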
---
.../dri/i965/brw_fs_schedule_instructions.cpp | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp b/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
index 90f1a16..4d2dbe8 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
@@ -753,9 +753,9 @@ instruction_scheduler::schedule_instructions(fs_inst *next_block_header)
* but also the MRF setup for the next sampler message, which in turn
* unblocks the next sampler message).
*/
- for (schedule_node *node = (schedule_node *)instructions.get_tail();
- node != instructions.get_head()->prev;
- node = (schedule_node *)node->prev) {
+ for (schedule_node *node = (schedule_node *)instructions.get_head();
+ node != instructions.get_tail()->next;
+ node = (schedule_node *)node->next) {
schedule_node *n = (schedule_node *)node;
chosen = n;
--
1.7.8.6
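The effect of reversing the loop direction can be illustrated with a toy list scheduler. This is a sketch and not the Mesa code: it assumes that instructions unblocked by the one just scheduled are appended to the tail of the ready list, and the `Node` type, `schedule()` function, and instruction names are hypothetical. Picking from the tail takes whatever was just made available; picking from the head takes the instruction that has been available longest.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

struct Node {
    std::string name;
    std::vector<int> children;  // indices of instructions that consume this result
    int unmet_deps;             // number of producers not yet scheduled
};

// List-schedule `dag`, choosing from the head or the tail of the ready list.
std::vector<std::string> schedule(std::vector<Node> dag, bool pick_from_head)
{
    std::list<int> ready;
    for (std::size_t i = 0; i < dag.size(); i++)
        if (dag[i].unmet_deps == 0)
            ready.push_back(static_cast<int>(i));

    std::vector<std::string> order;
    while (!ready.empty()) {
        const int chosen = pick_from_head ? ready.front() : ready.back();
        if (pick_from_head)
            ready.pop_front();
        else
            ready.pop_back();
        order.push_back(dag[chosen].name);
        // Newly unblocked instructions go on the tail (assumption).
        for (int child : dag[chosen].children)
            if (--dag[child].unmet_deps == 0)
                ready.push_back(child);
    }
    assert(order.size() == dag.size());  // every node scheduled, so no cycle
    return order;
}

// Dependency graph shaped like the cross() example: each mul feeds one mad,
// and the send needs all three mads plus the mov.
std::vector<Node> cross_product_dag()
{
    return {
        {"mul1", {4}, 0}, {"mul2", {5}, 0}, {"mul3", {6}, 0}, {"mov", {7}, 0},
        {"mad1", {7}, 1}, {"mad2", {7}, 1}, {"mad3", {7}, 1}, {"send", {}, 4},
    };
}
```

In this model `schedule(cross_product_dag(), false)` yields mov, mul3, mad3, mul2, mad2, mul1, mad1, send, with each mad immediately behind its mul, while `schedule(cross_product_dag(), true)` yields mul1, mul2, mul3, mov, mad1, mad2, mad3, send.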