[Nouveau] [PATCH v2 3/3] nv50/ir: run some passes multiple times
Karol Herbst
karolherbst at gmail.com
Mon Apr 3 15:58:22 UTC 2017
With the shader cache in place, compilation time matters less.
As a side effect, we can afford more optimization work, such as running
passes more than once, to produce better-optimized code.
total instructions in shared programs : 3931743 -> 3917512 (-0.36%)
total gprs used in shared programs : 481460 -> 481680 (0.05%)
total local used in shared programs : 27481 -> 26761 (-2.62%)
total bytes used in shared programs : 36032672 -> 35902648 (-0.36%)
                local        gpr       inst      bytes
    helped         48        133       3843       3843
      hurt          1        295         75         75
Signed-off-by: Karol Herbst <karolherbst at gmail.com>
---
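Note: a minimal sketch of why a second iteration over the same passes can
pay off. It assumes a hypothetical toy IR (the Insn, Prog, optimize and
sampleProgram names are made up for illustration and are not the nv50_ir
classes). The algebraic pass runs before constant folding, as in
optimizeSSA, so it can only use constants that an earlier iteration has
already folded.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical toy IR, not Mesa's nv50_ir data structures.
enum class Op { In, Const, Add, Mul, Mov, Out };

struct Insn {
    Op op;
    int dst;  // value defined by this instruction, -1 for Out
    int a;    // first source value, -1 if unused
    int b;    // second source value, -1 if unused
    int imm;  // immediate, only meaningful for Const
};

using Prog = std::vector<Insn>;

// Returns true and sets 'out' if value v is defined by a Const.
bool isConst(const Prog &p, int v, int &out) {
    if (v < 0)
        return false;
    for (const Insn &i : p) {
        if (i.dst == v && i.op == Op::Const) {
            out = i.imm;
            return true;
        }
    }
    return false;
}

// Algebraic simplification: x * 1 -> mov x.
void algebraicOpt(Prog &p) {
    for (Insn &i : p) {
        int c;
        if (i.op == Op::Mul && isConst(p, i.b, c) && c == 1)
            i = {Op::Mov, i.dst, i.a, -1, 0};
    }
}

// Constant folding: add of two constants -> const.
void constantFold(Prog &p) {
    for (Insn &i : p) {
        int x, y;
        if (i.op == Op::Add && isConst(p, i.a, x) && isConst(p, i.b, y))
            i = {Op::Const, i.dst, -1, -1, x + y};
    }
}

// Dead code elimination: drop instructions whose result is never read.
// Out is the only instruction with a side effect in this toy IR.
void deadCodeElim(Prog &p) {
    bool changed = true;
    while (changed) {
        changed = false;
        std::set<int> used;
        for (const Insn &i : p) {
            if (i.a >= 0) used.insert(i.a);
            if (i.b >= 0) used.insert(i.b);
        }
        for (std::size_t k = 0; k < p.size(); ++k) {
            if (p[k].op != Op::Out && !used.count(p[k].dst)) {
                p.erase(p.begin() + k);
                changed = true;
                break;
            }
        }
    }
}

// Runs the pass sequence 'iterations' times; returns the instruction count.
std::size_t optimize(Prog p, int iterations) {
    assert(iterations >= 0);
    for (int i = 0; i < iterations; ++i) {
        algebraicOpt(p);
        constantFold(p);
        deadCodeElim(p);
    }
    return p.size();
}

// v0 = in; v1 = 0; v2 = 1; v3 = v1 + v2; v4 = v0 * v3; out v4
Prog sampleProgram() {
    return {
        {Op::In,    0, -1, -1, 0},
        {Op::Const, 1, -1, -1, 0},
        {Op::Const, 2, -1, -1, 1},
        {Op::Add,   3,  1,  2, 0},
        {Op::Mul,   4,  0,  3, 0},
        {Op::Out,  -1,  4, -1, 0},
    };
}
```

On this sample, optimize(sampleProgram(), 1) leaves 4 instructions, because
the multiply only sees its constant operand after folding has run. With
optimize(sampleProgram(), 2) the second algebraic pass turns the multiply
into a move and the count drops to 3.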
.../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 0de84fe9fc..505de08573 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -3729,12 +3729,17 @@ Program::optimizeSSA(int level)
    RUN_PASS(1, CopyPropagation, run);
    RUN_PASS(1, MergeSplits, run);
    RUN_PASS(2, GlobalCSE, run);
-   RUN_PASS(1, LocalCSE, run);
-   RUN_PASS(2, AlgebraicOpt, run);
-   RUN_PASS(2, ModifierFolding, run); // before load propagation -> less checks
-   RUN_PASS(1, ConstantFolding, foldAll);
-   RUN_PASS(2, LateAlgebraicOpt, run);
-   RUN_PASS(1, Split64BitOpPreRA, run);
+   for (int i = 0; i < 2; ++i) {
+      RUN_PASS(1, LocalCSE, run);
+      RUN_PASS(2, AlgebraicOpt, run);
+      RUN_PASS(2, ModifierFolding, run); // before load propagation -> less checks
+      RUN_PASS(1, ConstantFolding, foldAll);
+      RUN_PASS(2, LateAlgebraicOpt, run);
+      // only once
+      if (i == 0)
+         RUN_PASS(1, Split64BitOpPreRA, run);
+      RUN_PASS(1, DeadCodeElim, buryAll);
+   }
    RUN_PASS(1, LoadPropagation, run);
    RUN_PASS(1, IndirectPropagation, run);
    RUN_PASS(2, MemoryOpt, run);
--
2.12.2