[Nouveau] [PATCH v2] nvc0/ir: propagate immediates to CALL input MOVs

Tobias Klausmann tobias.johannes.klausmann at mni.thm.de
Wed Aug 16 20:16:14 UTC 2017


ping on this v2

On 8/13/17 3:02 AM, Tobias Klausmann wrote:
> On using builtin functions we have to move the input to registers $0 and $1, if
> one of the input value is an immediate, we fail to propagate the immediate:
>
> ...
> mov u32 $r477 0x00000003 (0)
> ...
> mov u32 $r0 %r473 (0)
> mov u32 $r1 $r477 (0)
> call abs BUILTIN:0 (0)
> mov u32 %r495 $r1 (0)
> ...
>
> With this patch the immediate is propagated, potentially causing the first MOV
> to be superfluous, which we'd remove in that case:
>
> ...
>
> mov u32 $r0 %r473 (0)
> mov u32 $r1 0x00000003 (0)
> call abs BUILTIN:0 (0)
> mov u32 %r495 $r1 (0)
> ...
>
> Shaderdb stats:
> total instructions in shared programs : 4893460 -> 4893324 (-0.00%)
> total gprs used in shared programs    : 582972 -> 582881 (-0.02%)
> total local used in shared programs   : 17960 -> 17960 (0.00%)
>
>                 local        gpr       inst      bytes
>     helped           0          91         112         112
>       hurt           0           0           0           0
>
> v2:
>  implement some changes proposed by imirkin, the manual deletion of the dead
>  mov is necessary after ea22ac23e0 ("nvc0/ir: unlink values pre- and post-call
>  to division function") as the potentially dead mov is unlinked properly,
>  causing later passes to not notice the mov op at all and thus not cleaning it
>  up. That makes up a big chunk of the regression the above commit caused.
>  Keep the deletion of the op where it is, deleting it later unnecessarily blows
>  up size of the change.
>
> Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de>
> ---
>  .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp       | 21 +++++++++++++++++++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
> index c8f0701572..7243b1d2e4 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
> @@ -47,8 +47,25 @@ NVC0LegalizeSSA::handleDIV(Instruction *i)
>     int builtin;
>  
>     bld.setPosition(i, false);
> -   bld.mkMovToReg(0, i->getSrc(0));
> -   bld.mkMovToReg(1, i->getSrc(1));
> +
> +   // Generate movs to the input regs for the call we want to generate
> +   for (int s = 0; i->srcExists(s); ++s) {
> +      Instruction *ld = i->getSrc(s)->getInsn();
> +      assert(ld->getSrc(0) != NULL);
> +      // check if we are moving an immediate, propagate it in that case
> +      if (!ld || ld->fixed || (ld->op != OP_LOAD && ld->op != OP_MOV) ||
> +            !(ld->src(0).getFile() == FILE_IMMEDIATE))
> +         bld.mkMovToReg(s, i->getSrc(s));
> +      else {
> +         bld.mkMovToReg(s, ld->getSrc(0));
> +         // Clear the src, to make code elimination possible here before we
> +         // delete the instruction i later
> +         i->setSrc(s, NULL);
> +         if (ld->isDead())
> +            delete_Instruction(prog, ld);
> +      }
> +   }
> +
>     switch (i->dType) {
>     case TYPE_U32: builtin = NVC0_BUILTIN_DIV_U32; break;
>     case TYPE_S32: builtin = NVC0_BUILTIN_DIV_S32; break;


More information about the Nouveau mailing list