[Nouveau] [PATCH v2] nvc0/ir: propagate immediates to CALL input MOVs

Tobias Klausmann tobias.johannes.klausmann at mni.thm.de
Sun Aug 13 01:02:39 UTC 2017


On using builtin functions we have to move the input to registers $0 and $1, if
one of the input value is an immediate, we fail to propagate the immediate:

...
mov u32 $r477 0x00000003 (0)
...
mov u32 $r0 %r473 (0)
mov u32 $r1 $r477 (0)
call abs BUILTIN:0 (0)
mov u32 %r495 $r1 (0)
...

With this patch the immediate is propagated, potentially causing the first MOV
to be superfluous, which we'd remove in that case:

...

mov u32 $r0 %r473 (0)
mov u32 $r1 0x00000003 (0)
call abs BUILTIN:0 (0)
mov u32 %r495 $r1 (0)
...

Shaderdb stats:
total instructions in shared programs : 4893460 -> 4893324 (-0.00%)
total gprs used in shared programs    : 582972 -> 582881 (-0.02%)
total local used in shared programs   : 17960 -> 17960 (0.00%)

                local        gpr       inst      bytes
    helped           0          91         112         112
      hurt           0           0           0           0

v2:
 implement some changes proposed by imirkin, the manual deletion of the dead
 mov is necessary after ea22ac23e0 ("nvc0/ir: unlink values pre- and post-call
 to division function") as the potentially dead mov is unlinked properly,
 causing later passes to not notice the mov op at all and thus not cleaning it
 up. That makes up a big chunk of the regression the above commit caused.
 Keep the deletion of the op where it is, deleting it later unnecessarily blows
 up size of the change.

Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de>
---
 .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp       | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
index c8f0701572..7243b1d2e4 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
@@ -47,8 +47,25 @@ NVC0LegalizeSSA::handleDIV(Instruction *i)
    int builtin;
 
    bld.setPosition(i, false);
-   bld.mkMovToReg(0, i->getSrc(0));
-   bld.mkMovToReg(1, i->getSrc(1));
+
+   // Generate movs to the input regs for the call we want to generate
+   for (int s = 0; i->srcExists(s); ++s) {
+      Instruction *ld = i->getSrc(s)->getInsn();
+      assert(ld->getSrc(0) != NULL);
+      // check if we are moving an immediate, propagate it in that case
+      if (!ld || ld->fixed || (ld->op != OP_LOAD && ld->op != OP_MOV) ||
+            !(ld->src(0).getFile() == FILE_IMMEDIATE))
+         bld.mkMovToReg(s, i->getSrc(s));
+      else {
+         bld.mkMovToReg(s, ld->getSrc(0));
+         // Clear the src, to make code elimination possible here before we
+         // delete the instruction i later
+         i->setSrc(s, NULL);
+         if (ld->isDead())
+            delete_Instruction(prog, ld);
+      }
+   }
+
    switch (i->dType) {
    case TYPE_U32: builtin = NVC0_BUILTIN_DIV_U32; break;
    case TYPE_S32: builtin = NVC0_BUILTIN_DIV_S32; break;
-- 
2.14.0



More information about the Nouveau mailing list