[Mesa-dev] [PATCH] nv50/ir: swap the least-ref'd source into src1 when both const/imm

Ilia Mirkin imirkin at alum.mit.edu
Mon Jan 18 02:21:57 PST 2016


On Mon, Jan 18, 2016 at 4:14 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> The whole point of inlining sources is to reduce loads. We can end up in
> a situation where one value is used a lot of times, and one value is
> used only once per instruction. The once-per-instruction one is the one
> that should get inlined, but with the previous algorithm, it was given
> no preference.
>
> This flips things around to preferring putting less-referenced values
> into src1 which increases the likelihood of them being inlined.
>
> While we're at it, adjust the heuristic to not treat 0 as an immediate,
> as well as (effectively) check for situations where LIMMs can't be
> loaded. All this yields improvements on nvc0:
>
> total instructions in shared programs : 6261157 -> 6255985 (-0.08%)
> total gprs used in shared programs    : 945082 -> 943417 (-0.18%)
> total local used in shared programs   : 30372 -> 30288 (-0.28%)
> total bytes used in shared programs   : 50089256 -> 50047880 (-0.08%)
>
>                 local        gpr       inst      bytes
>     helped          21         822        3332        3332
>       hurt           0         278         565         565
>
> And more importantly avoids generating really bad code with SSBOs, where
> we end up checking a lot of different values (usually immediates) against
> the length.
>
> On nv50 the effect is less pronounced, although it does seem to, on
> average, pack the instructions better (instructions went up but bytes
> went down):
>
> total instructions in shared programs : 6346564 -> 6347676 (0.02%)
> total gprs used in shared programs    : 728719 -> 725883 (-0.39%)
> total local used in shared programs   : 3552 -> 3552 (0.00%)
> total bytes used in shared programs   : 43995688 -> 43977840 (-0.04%)
>
>                 local        gpr       inst      bytes
>     helped           0        1326        3111        3677
>       hurt           0         868        3572        3108
>
> Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
> ---
>  .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 26 +++++++++++++---------
>  1 file changed, 16 insertions(+), 10 deletions(-)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> index f5c590e..4d82938 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> @@ -171,7 +171,10 @@ LoadPropagation::isImmdLoad(Instruction *ld)
>     if (!ld || (ld->op != OP_MOV) ||
>         ((typeSizeof(ld->dType) != 4) && (typeSizeof(ld->dType) != 8)))
>        return false;
> -   return ld->src(0).getFile() == FILE_IMMEDIATE;
> +
> +   // A 0 can be replaced with a register, so it doesn't count as an immediate.
> +   ImmediateValue val;
> +   return ld->src(0).getImmediate(val) && !val.isInteger(0);
>  }
>
>  bool
> @@ -187,7 +190,8 @@ LoadPropagation::isAttribOrSharedLoad(Instruction *ld)
>  void
>  LoadPropagation::checkSwapSrc01(Instruction *insn)
>  {
> -   if (!prog->getTarget()->getOpInfo(insn).commutative)
> +   const Target *targ = prog->getTarget();
> +   if (!targ->getOpInfo(insn).commutative)
>        if (insn->op != OP_SET && insn->op != OP_SLCT)
>           return;
>     if (insn->src(1).getFile() != FILE_GPR)
> @@ -196,14 +200,16 @@ LoadPropagation::checkSwapSrc01(Instruction *insn)
>     Instruction *i0 = insn->getSrc(0)->getInsn();
>     Instruction *i1 = insn->getSrc(1)->getInsn();
>
> -   if (isCSpaceLoad(i0)) {
> -      if (!isCSpaceLoad(i1))
> -         insn->swapSources(0, 1);
> -      else
> -         return;
> -   } else
> -   if (isImmdLoad(i0)) {
> -      if (!isCSpaceLoad(i1) && !isImmdLoad(i1))
> +   // Swap sources to inline the less frequently used source. That way,
> +   // optimistically, it will eventually be able to remove the instruction.
> +   int i0refs = insn->getSrc(0)->refCount();
> +   int i1refs = insn->getSrc(1)->refCount();
> +
> +   if (isCSpaceLoad(i0) || isImmdLoad(i0)) {
> +      if (targ->insnCanLoad(insn, 1, i0) && (

Changing this to be

if ((isCSpace || isImmd) && insnCanLoad)) {

gets much better nv50 improvements (and no nvc0 changes) since it can
then try to do the attrib swap (which will never lead to anything good
on nvc0). I won't re-send a v2 though.

> +                (!isImmdLoad(i1) && !isCSpaceLoad(i1)) ||
> +                !targ->insnCanLoad(insn, 1, i1) ||
> +                i0refs < i1refs))
>           insn->swapSources(0, 1);
>        else
>           return;
> --
> 2.4.10
>


More information about the mesa-dev mailing list