[Mesa-dev] [PATCH] i965/fs: Don't immediately schedule instructions that were just made available.
Matt Turner
mattst88 at gmail.com
Fri Mar 29 16:07:04 PDT 2013
The original goal of pre-register-allocation scheduling was to reduce
live ranges so we'd use fewer registers and hopefully fit into 16-wide
(SIMD16) mode. In shader-db, this change costs us 16-wide compilation for
30 programs but gains it for 29 others... so it's a toss-up. At least, by
choosing instructions in a better order, all programs should be slightly
faster.
On Haswell GLB2.5 C24Z16_DXT1 1600x900 non-composited:
x before-
+ after-
[ministat distribution plot of the x (before) and + (after) runs]
    N           Min           Max        Median           Avg        Stddev
x  10        8794.6       8825.44       8812.44       8811.01     10.288483
+  10       9110.87       9129.38       9124.95      9122.438       6.38743
Difference at 95.0% confidence
311.428 +/- 8.04582
3.53453% +/- 0.0913155%
(Student's t, pooled s = 8.56306)
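The confidence interval above is a pooled two-sample Student's t comparison: s_p = sqrt(((n1 - 1)*s1^2 + (n2 - 1)*s2^2) / (n1 + n2 - 2)), and the half-width is t * s_p * sqrt(1/n1 + 1/n2). Below is a minimal C++ sketch of that arithmetic. `Sample`, `Comparison` and `compare_means` are hypothetical names, not ministat's code. The critical value 2.101 (two-sided 95%, 18 degrees of freedom, rounded to three places) is an assumption that happens to reproduce the figures shown.

```cpp
#include <cassert>
#include <cmath>

// Summary statistics for one ministat column.
struct Sample {
    int n;          // number of runs
    double mean;    // "Avg" column
    double stddev;  // "Stddev" column (sample standard deviation)
};

struct Comparison {
    double diff;        // mean(b) - mean(a)
    double pooled_s;    // pooled standard deviation
    double half_width;  // half-width of the confidence interval on diff
};

// Pooled two-sample Student's t interval for mean(b) - mean(a).
// t_crit is the two-sided critical value for a.n + b.n - 2 degrees of
// freedom at the desired confidence level (taken from a t table).
Comparison compare_means(const Sample& a, const Sample& b, double t_crit) {
    assert(a.n > 1 && b.n > 1);
    const int dof = a.n + b.n - 2;
    const double pooled_var = ((a.n - 1) * a.stddev * a.stddev +
                               (b.n - 1) * b.stddev * b.stddev) / dof;
    const double s = std::sqrt(pooled_var);
    const double half = t_crit * s * std::sqrt(1.0 / a.n + 1.0 / b.n);
    return {b.mean - a.mean, s, half};
}
```

Calling compare_means({10, 8811.01, 10.288483}, {10, 9122.438, 6.38743}, 2.101) gives a difference of 311.428, a pooled s of about 8.56306 and a half-width of about 8.0458. Dividing the difference and the half-width by the before mean of 8811.01 gives the 3.53453% and roughly 0.0913% lines.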
Consider the trivial case of
    uniform vec3 a, b;
    void main() { gl_FragColor = vec4(cross(a, b), 0.0); }
Before the patch we compile this to
    mov.sat(8) m4<1>F 0F
    mul(8) g3<1>F g2.4<0,1,0>F g2<0,1,0>F
    mad.sat(8) m3<1>F -g3<4,1,1>F g2.3<4,1,1>F.x g2.1<4,1,1>F.x
    mul(8) g3<1>F g2.3<0,1,0>F g2.2<0,1,0>F
    mad.sat(8) m2<1>F -g3<4,1,1>F g2.5<4,1,1>F.x g2<4,1,1>F.x
    mul(8) g3<1>F g2.5<0,1,0>F g2.1<0,1,0>F
    mad.sat(8) m1<1>F -g3<4,1,1>F g2.4<4,1,1>F.x g2.2<4,1,1>F.x
    sendc(8) null m1<8,8,1>F
where we stall on each mad.sat waiting for the mul that feeds it to
finish. The sendc is issued at cycle 66. After the patch it compiles to
    mul(8) g3<1>F g2.5<0,1,0>F g2.1<0,1,0>F
    mul(8) g4<1>F g2.3<0,1,0>F g2.2<0,1,0>F
    mul(8) g5<1>F g2.4<0,1,0>F g2<0,1,0>F
    mov.sat(8) m4<1>F 0F
    mad.sat(8) m1<1>F -g3<4,1,1>F g2.4<4,1,1>F.x g2.2<4,1,1>F.x
    mad.sat(8) m2<1>F -g4<4,1,1>F g2.5<4,1,1>F.x g2<4,1,1>F.x
    mad.sat(8) m3<1>F -g5<4,1,1>F g2.3<4,1,1>F.x g2.1<4,1,1>F.x
    sendc(8) null m1<8,8,1>F
By hiding much of the latency, the sendc instruction is issued by cycle
32.
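The 66 and 32 cycle counts come from the driver's own latency estimates. The toy model below only illustrates why the second ordering issues the sendc earlier, under loudly simplified assumptions: one hypothetical 14-cycle latency (kLatency) for every instruction, strictly in-order issue of one instruction per cycle, and made-up names (Inst, Value, last_issue_cycle, peak_live_temps). It does not reproduce the 66 and 32 figures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: one assumed latency for every instruction and strictly
// in-order, one-instruction-per-cycle issue. Not the i965 scheduler's model.
constexpr int kLatency = 14;  // hypothetical result latency in cycles

// T0..T2: GRF temporaries holding the mul results (g3/g4/g5 above);
// M1..M4: message registers consumed by the sendc.
enum Value { T0, T1, T2, M1, M2, M3, M4, kNumValues };

struct Inst {
    int dst;                // value written, or -1 for none (sendc)
    std::vector<int> srcs;  // values read that are produced in this block
};

// Cycle at which the last instruction (the sendc) issues.
int last_issue_cycle(const std::vector<Inst>& prog) {
    std::vector<int> ready(static_cast<std::size_t>(kNumValues), 0);
    int cycle = -1;
    for (const Inst& in : prog) {
        int t = cycle + 1;                // next free issue slot
        for (int s : in.srcs) {
            assert(s >= 0 && s < kNumValues);
            t = std::max(t, ready[s]);    // stall until the input is ready
        }
        cycle = t;
        if (in.dst >= 0)
            ready[in.dst] = cycle + kLatency;
    }
    return cycle;
}

// Largest number of T values live at once between two instructions.
int peak_live_temps(const std::vector<Inst>& prog) {
    int peak = 0;
    for (std::size_t i = 0; i < prog.size(); ++i) {
        int live = 0;
        for (int v = T0; v <= T2; ++v) {
            bool defined = false, used_later = false;
            for (std::size_t j = 0; j <= i; ++j)
                defined = defined || prog[j].dst == v;
            for (std::size_t j = i + 1; j < prog.size(); ++j)
                for (int s : prog[j].srcs)
                    used_later = used_later || s == v;
            if (defined && used_later)
                ++live;
        }
        peak = std::max(peak, live);
    }
    return peak;
}

// Order before the patch: each mad directly follows the mul it reads.
const std::vector<Inst> kBefore = {
    {M4, {}},                // mov.sat m4, 0
    {T2, {}},   {M3, {T2}},  // mul g3; mad.sat m3
    {T1, {}},   {M2, {T1}},  // mul g3; mad.sat m2
    {T0, {}},   {M1, {T0}},  // mul g3; mad.sat m1
    {-1, {M1, M2, M3, M4}},  // sendc
};

// Order after the patch: all muls first, so their latencies overlap.
const std::vector<Inst> kAfter = {
    {T0, {}},   {T1, {}},   {T2, {}},   {M4, {}},
    {M1, {T0}}, {M2, {T1}}, {M3, {T2}},
    {-1, {M1, M2, M3, M4}},
};
```

With these assumptions last_issue_cycle reports 59 for kBefore and 30 for kAfter, while peak_live_temps rises from 1 to 3, mirroring how the first listing reuses g3 and the second needs g3, g4 and g5 at once.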
---
.../dri/i965/brw_fs_schedule_instructions.cpp | 46 ++++------------------
1 file changed, 7 insertions(+), 39 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp b/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
index 997341b..4aeb738 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
@@ -725,48 +725,16 @@ instruction_scheduler::schedule_instructions(fs_inst *next_block_header)
       schedule_node *chosen = NULL;
       int chosen_time = 0;
-      if (post_reg_alloc) {
-         /* Of the instructions closest ready to execute or the closest to
-          * being ready, choose the oldest one.
-          */
-         foreach_list(node, &instructions) {
-            schedule_node *n = (schedule_node *)node;
-
-            if (!chosen || n->unblocked_time < chosen_time) {
-               chosen = n;
-               chosen_time = n->unblocked_time;
-            }
-         }
-      } else {
-         /* Before register allocation, we don't care about the latencies of
-          * instructions. All we care about is reducing live intervals of
-          * variables so that we can avoid register spilling, or get 16-wide
-          * shaders which naturally do a better job of hiding instruction
-          * latency.
-          *
-          * To do so, schedule our instructions in a roughly LIFO/depth-first
-          * order: when new instructions become available as a result of
-          * scheduling something, choose those first so that our result
-          * hopefully is consumed quickly.
-          *
-          * The exception is messages that generate more than one result
-          * register (AKA texturing). In those cases, the LIFO search would
-          * normally tend to choose them quickly (because scheduling the
-          * previous message not only unblocked the children using its result,
-          * but also the MRF setup for the next sampler message, which in turn
-          * unblocks the next sampler message).
-          */
-         for (schedule_node *node = (schedule_node *)instructions.get_tail();
-              node != instructions.get_head()->prev;
-              node = (schedule_node *)node->prev) {
-            schedule_node *n = (schedule_node *)node;
+      /* Of the instructions closest ready to execute or the closest to
+       * being ready, choose the oldest one.
+       */
+      foreach_list(node, &instructions) {
+         schedule_node *n = (schedule_node *)node;
+         if (!chosen || n->unblocked_time < chosen_time) {
             chosen = n;
-            if (chosen->inst->regs_written() <= 1)
-               break;
+            chosen_time = n->unblocked_time;
          }
-
-         chosen_time = chosen->unblocked_time;
       }
       /* Schedule this instruction. */
--
1.7.12.4
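The selection loop the patch keeps can be contrasted with the pre-register-allocation loop it removes. The sketch below is a simplification under stated assumptions: `Node` is a hypothetical stand-in for schedule_node carrying only unblocked_time and a regs_written count, a std::list replaces Mesa's exec_list and foreach_list, and, as the removed comment implies, newly available nodes are assumed to sit at the tail of the list.

```cpp
#include <cassert>
#include <list>

// Hypothetical stand-in for schedule_node with only the fields the two
// selection rules read.
struct Node {
    int id;              // label for the example
    int unblocked_time;  // cycle at which the node's inputs are ready
    int regs_written;    // registers written; > 1 for e.g. sampler messages
};

// Rule kept by the patch for both scheduling passes: take the node with the
// smallest unblocked_time. Strict '<' keeps the earliest node in list order
// when several tie.
const Node* choose_oldest_ready(const std::list<Node>& ready) {
    assert(!ready.empty());
    const Node* chosen = nullptr;
    for (const Node& n : ready)
        if (!chosen || n.unblocked_time < chosen->unblocked_time)
            chosen = &n;
    return chosen;
}

// Pre-register-allocation rule removed by the patch: walk back from the tail
// (most recently made available) and stop at the first node writing at most
// one register; if none does, the head node ends up chosen.
const Node* choose_lifo(const std::list<Node>& ready) {
    assert(!ready.empty());
    const Node* chosen = nullptr;
    for (auto it = ready.rbegin(); it != ready.rend(); ++it) {
        chosen = &*it;
        if (chosen->regs_written <= 1)
            break;
    }
    return chosen;
}
```

For a ready list {{0, 0, 1}, {1, 0, 1}, {2, 15, 1}}, where node 2 stands for a mad that was just unblocked by its mul but cannot start until cycle 15, choose_lifo picks node 2 and stalls, while choose_oldest_ready picks node 0, which can issue immediately.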