[Mesa-dev] [PATCH] nv50/ir: optimize ADD(SHL(a, b), c) to SHLADD(a, b, c)

Samuel Pitoiset samuel.pitoiset at gmail.com
Sun Oct 9 12:04:42 UTC 2016



On 10/08/2016 10:09 PM, Ilia Mirkin wrote:
> On Sat, Oct 8, 2016 at 3:55 PM, Samuel Pitoiset
> <samuel.pitoiset at gmail.com> wrote:
>> total instructions in shared programs :2286901 -> 2284473 (-0.11%)
>> total gprs used in shared programs    :335256 -> 335273 (0.01%)
>> total local used in shared programs   :31968 -> 31968 (0.00%)
>>
>>                 local        gpr       inst      bytes
>>     helped           0          41         852         852
>>       hurt           0          44          23          23
>>
>> Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
>> ---
>>  .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 94 ++++++++++++++++++++++
>>  1 file changed, 94 insertions(+)
>>
>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>> index 6efb29e..caf5d1d 100644
>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>> @@ -2132,6 +2132,99 @@ AlgebraicOpt::visit(BasicBlock *bb)
>>
>>  // =============================================================================
>>
>> +// ADD(SHL(a, b), c) -> SHLADD(a, b, c)
>> +class LateAlgebraicOpt : public Pass
>> +{
>> +private:
>> +   virtual bool visit(BasicBlock *);
>> +
>> +   void handleADD(Instruction *);
>> +   bool tryADDToSHLADD(Instruction *);
>> +
>> +   BuildUtil bld;
>> +};
>> +
>> +void
>> +LateAlgebraicOpt::handleADD(Instruction *add)
>> +{
>> +   Value *src0 = add->getSrc(0);
>> +   Value *src1 = add->getSrc(1);
>> +
>> +   if (src0->reg.file != FILE_GPR || src1->reg.file != FILE_GPR)
>> +      return;
>> +
>> +   if (prog->getTarget()->isOpSupported(OP_SHLADD, add->dType))
>> +      tryADDToSHLADD(add);
>> +}
>> +
>> +// ADD(SHL(a, b), c) -> SHLADD(a, b, c)
>> +bool
>> +LateAlgebraicOpt::tryADDToSHLADD(Instruction *add)
>> +{
>> +   Value *src0 = add->getSrc(0);
>> +   Value *src1 = add->getSrc(1);
>> +   Modifier mod[2];
>> +   Value *src;
>> +   int s;
>> +
>> +   if (add->saturate || add->usesFlags() || typeSizeof(add->dType) == 8)
>> +      return false;
>> +
>> +   if (src0->getUniqueInsn() && src0->getUniqueInsn()->op == OP_SHL)
>> +      s = 0;
>> +   else
>> +   if (src1->getUniqueInsn() && src1->getUniqueInsn()->op == OP_SHL)
>> +      s = 1;
>> +   else
>> +      return false;
>> +
>> +   src = add->getSrc(s);
>> +
>> +   if (src->getUniqueInsn()->bb != add->bb)
>> +      return false;
>
> You flip between UniqueInsn and Insn... I think UniqueInsn is fine,
> but you could also save the result off into another Instruction *shl
> and keep reusing that. IMHO it'll be easier to follow the logic.

Fine by me.

>
>> +
>> +   if (src->getInsn()->usesFlags() || src->getInsn()->subOp)
>> +      return false;
>> +
>> +   if (src->getInsn()->src(1).getFile() != FILE_IMMEDIATE)
>
> This is happening before LoadPropagation, so that may not always
> work (e.g. it could be a mov). I think you want to use getImmediate()
> or however that function is called so that you can peer through moves.

Okay, I will have a look.
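
To illustrate the point about peering through moves, here is a minimal, self-contained sketch. It does not use the nv50 IR classes: Op, Node, ImmResult, findImmediate and the sample nodes are all hypothetical stand-ins, and the real helper (referred to above as getImmediate(), "or however that function is called") may behave differently. It only shows why checking the file of the direct source can miss an immediate that sits behind a mov before LoadPropagation has run.

```cpp
#include <cstdint>

// Toy stand-ins for IR nodes. Hypothetical, NOT the nv50 IR types.
enum class Op { Imm, Mov, Other };

struct Node {
   Op op;
   std::uint32_t imm;   // meaningful only when op == Op::Imm
   const Node *src;     // meaningful only when op == Op::Mov
};

struct ImmResult {
   bool found;
   std::uint32_t value;
};

// Checking only the direct source: a mov of an immediate is not recognized.
constexpr bool isDirectImmediate(const Node *n)
{
   return n && n->op == Op::Imm;
}

// Follow a chain of movs and report the immediate at its end, if any.
constexpr ImmResult findImmediate(const Node *n)
{
   while (n && n->op == Op::Mov)
      n = n->src;
   if (!n || n->op != Op::Imm)
      return ImmResult{false, 0u};
   return ImmResult{true, n->imm};
}

// Sample nodes: imm 4, mov(imm 4), mov(mov(imm 4)), and a non-immediate def.
constexpr Node kImm4{Op::Imm, 4u, nullptr};
constexpr Node kMovImm{Op::Mov, 0u, &kImm4};
constexpr Node kMovMovImm{Op::Mov, 0u, &kMovImm};
constexpr Node kOther{Op::Other, 0u, nullptr};
constexpr Node kMovOther{Op::Mov, 0u, &kOther};
```

With these toy nodes, isDirectImmediate(&kMovImm) is false while findImmediate(&kMovImm) reports the value 4, which is the case the direct file check would reject.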

>
>> +      return false;
>> +
>> +   mod[0] = add->src(0).mod;
>> +   mod[1] = add->src(1).mod;
>> +
>> +   add->op = OP_SHLADD;
>> +   add->setSrc(2, add->src(s ? 0 : 1));
>> +   add->src(2).mod = mod[s];
>> +
>> +   add->setSrc(0, src->getInsn()->getSrc(0));
>> +   add->src(0).mod = mod[0];
>
> mod[!s] presumably?
>
> IMO this would be more readable with judicious use of swapSources and
> moveSources.

Okay. I just followed the same design as tryADDToMadOrSAD(), but it
does seem like those util methods would improve readability.
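
As a sanity check on the modifier indexing, the arithmetic can be modelled outside the compiler. The sketch below is an assumption-laden toy, not Mesa code: it models only a neg modifier, 32-bit wraparound and a shift amount below 32, and all function names are made up for illustration. Which modifiers SHLADD actually accepts on each source is not stated in this thread. It shows one consistent mapping: the modifier that sat on the ADD operand holding the SHL result (index s) travels with a, and the other one (index !s) travels with c.

```cpp
#include <cstdint>

// Apply an optional neg modifier with 32-bit wraparound.
constexpr std::uint32_t applyNeg(std::uint32_t v, bool neg)
{
   return neg ? 0u - v : v;
}

// Reference: ADD(src0, src1), where src[s] is SHL(a, b) and the other is c.
// neg0/neg1 are the modifiers on the ADD's sources 0 and 1.
constexpr std::uint32_t addOfShl(std::uint32_t a, std::uint32_t b,
                                 std::uint32_t c, int s, bool neg0, bool neg1)
{
   const std::uint32_t shl = a << b;
   const std::uint32_t src0 = (s == 0) ? shl : c;
   const std::uint32_t src1 = (s == 0) ? c : shl;
   return applyNeg(src0, neg0) + applyNeg(src1, neg1);
}

// Toy SHLADD(a, b, c) = (a << b) + c, with neg allowed on a and on c.
constexpr std::uint32_t shladd(std::uint32_t a, bool negA, std::uint32_t b,
                               std::uint32_t c, bool negC)
{
   return (applyNeg(a, negA) << b) + applyNeg(c, negC);
}

// Fused form using the mapping under test: a gets neg[s], c gets neg[!s].
constexpr std::uint32_t fused(std::uint32_t a, std::uint32_t b,
                              std::uint32_t c, int s, bool neg0, bool neg1)
{
   const bool neg[2] = {neg0, neg1};
   return shladd(a, neg[s], b, c, neg[!s]);
}
```

The mapping holds because -(a << b) equals (-a) << b in modular 32-bit arithmetic, so a neg on the shifted operand can be moved onto a.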

>
>> +   add->setSrc(1, src->getInsn()->getSrc(1));
>> +   add->src(1).mod = Modifier(0);
>> +
>> +   return true;
>> +}
>> +
>> +bool
>> +LateAlgebraicOpt::visit(BasicBlock *bb)
>> +{
>> +   Instruction *next;
>> +
>> +   for (Instruction *i = bb->getEntry(); i; i = next) {
>> +      next = i->next;
>> +      switch (i->op) {
>> +      case OP_ADD:
>> +         handleADD(i);
>> +         break;
>> +      default:
>> +         break;
>> +      }
>> +   }
>> +
>> +   return true;
>> +}
>> +
>> +// =============================================================================
>> +
>>  static inline void
>>  updateLdStOffset(Instruction *ldst, int32_t offset, Function *fn)
>>  {
>> @@ -3436,6 +3529,7 @@ Program::optimizeSSA(int level)
>>     RUN_PASS(2, AlgebraicOpt, run);
>>     RUN_PASS(2, ModifierFolding, run); // before load propagation -> less checks
>>     RUN_PASS(1, ConstantFolding, foldAll);
>> +   RUN_PASS(2, LateAlgebraicOpt, run);
>>     RUN_PASS(1, LoadPropagation, run);
>>     RUN_PASS(1, IndirectPropagation, run);
>>     RUN_PASS(2, MemoryOpt, run);
>> --
>> 2.10.0