[Mesa-dev] [PATCH] nv50/ir: optimize ADD(SHL(a, b), c) to SHLADD(a, b, c)
Karol Herbst
karolherbst at gmail.com
Sun Oct 9 19:44:57 UTC 2016
2016-10-09 21:34 GMT+02:00 Ilia Mirkin <imirkin at alum.mit.edu>:
> On Sun, Oct 9, 2016 at 3:28 PM, Karol Herbst <karolherbst at gmail.com> wrote:
>> 2016-10-09 13:58 GMT+02:00 Samuel Pitoiset <samuel.pitoiset at gmail.com>:
>>>
>>>
>>> On 10/08/2016 10:04 PM, Karol Herbst wrote:
>>>>
>>>> looks great, a few comments below
>>>
>>>
>>> Thanks!
>>>
>>>>
>>>> 2016-10-08 21:55 GMT+02:00 Samuel Pitoiset <samuel.pitoiset at gmail.com>:
>>>>>
>>>>> total instructions in shared programs: 2286901 -> 2284473 (-0.11%)
>>>>> total gprs used in shared programs:    335256 -> 335273 (0.01%)
>>>>> total local used in shared programs:   31968 -> 31968 (0.00%)
>>>>>
>>>>>             local     gpr    inst   bytes
>>>>> helped          0      41     852     852
>>>>> hurt            0      44      23      23
>>>>>
>>>>> Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
>>>>> ---
>>>>>  .../drivers/nouveau/codegen/nv50_ir_peephole.cpp  | 94 ++++++++++++++++++++++
>>>>>  1 file changed, 94 insertions(+)
>>>>>
>>>>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>>>> index 6efb29e..caf5d1d 100644
>>>>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>>>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>>>> @@ -2132,6 +2132,99 @@ AlgebraicOpt::visit(BasicBlock *bb)
>>>>>
>>>>> //
>>>>> =============================================================================
>>>>>
>>>>> +// ADD(SHL(a, b), c) -> SHLADD(a, b, c)
>>>>> +class LateAlgebraicOpt : public Pass
>>>>> +{
>>>>> +private:
>>>>> + virtual bool visit(BasicBlock *);
>>>>> +
>>>>> + void handleADD(Instruction *);
>>>>> + bool tryADDToSHLADD(Instruction *);
>>>>> +
>>>>> + BuildUtil bld;
>>>>> +};
>>>>> +
>>>>> +void
>>>>> +LateAlgebraicOpt::handleADD(Instruction *add)
>>>>> +{
>>>>> + Value *src0 = add->getSrc(0);
>>>>> + Value *src1 = add->getSrc(1);
>>>>> +
>>>>> + if (src0->reg.file != FILE_GPR || src1->reg.file != FILE_GPR)
>>>>> + return;
>>>>> +
>>>>> + if (prog->getTarget()->isOpSupported(OP_SHLADD, add->dType))
>>>>> + tryADDToSHLADD(add);
>>>>> +}
>>>>> +
>>>>> +// ADD(SHL(a, b), c) -> SHLADD(a, b, c)
>>>>> +bool
>>>>> +LateAlgebraicOpt::tryADDToSHLADD(Instruction *add)
>>>>> +{
>>>>> + Value *src0 = add->getSrc(0);
>>>>> + Value *src1 = add->getSrc(1);
>>>>> + Modifier mod[2];
>>>>> + Value *src;
>>>>> + int s;
>>>>> +
>>>>> + if (add->saturate || add->usesFlags() || typeSizeof(add->dType) == 8)
>>>>> + return false;
>>>>> +
>>>>> + if (src0->getUniqueInsn() && src0->getUniqueInsn()->op == OP_SHL)
>>>>> + s = 0;
>>>>> + else
>>>>> + if (src1->getUniqueInsn() && src1->getUniqueInsn()->op == OP_SHL)
>>>>> + s = 1;
>>>>> + else
>>>>> + return false;
>>>>> +
>>>>> + src = add->getSrc(s);
>>>>> +
>>>>> + if (src->getUniqueInsn()->bb != add->bb)
>>>>> + return false;
>>>>
>>>>
>>>> is there a reason to check for the bb here?
>>>
>>>
>>> Those optimizations, like ADD->MAD, ADD->SAD, etc., have to be local (i.e.
>>> inside the same basic block). I just apply the same rule here.
>>>
>>>>
>>>>> +
>>>>> + if (src->getInsn()->usesFlags() || src->getInsn()->subOp)
>>>>> + return false;
>>>>> +
>>>>> + if (src->getInsn()->src(1).getFile() != FILE_IMMEDIATE)
>>>>> + return false;
>>>>> +
>>>>> + mod[0] = add->src(0).mod;
>>>>> + mod[1] = add->src(1).mod;
>>>>> +
>>>>> + add->op = OP_SHLADD;
>>>>> + add->setSrc(2, add->src(s ? 0 : 1));
>>>>
>>>>
>>>> I usually use "s ^ 1" instead.
>>>
>>>
>>> Yeah, but 's ? 0 : 1' seems to be used more often than 's ^ 1' (in peephole).
>>>
>>>>
>>>>> + add->src(2).mod = mod[s];
>>>>> +
>>>>> + add->setSrc(0, src->getInsn()->getSrc(0));
>>>>> + add->src(0).mod = mod[0];
>>>>> + add->setSrc(1, src->getInsn()->getSrc(1));
>>>>> + add->src(1).mod = Modifier(0);
>>>>> +
>>>>> + return true;
>>>>> +}
>>>>> +
>>>>> +bool
>>>>> +LateAlgebraicOpt::visit(BasicBlock *bb)
>>>>> +{
>>>>> + Instruction *next;
>>>>> +
>>>>> + for (Instruction *i = bb->getEntry(); i; i = next) {
>>>>> + next = i->next;
>>>>> + switch (i->op) {
>>>>> + case OP_ADD:
>>>>> + handleADD(i);
>>>>> + break;
>>>>> + default:
>>>>> + break;
>>>>> + }
>>>>> + }
>>>>
>>>>
>>>> you can also just implement visit(Instruction *) and return true; then
>>>> you won't need that silly loop.
>>>
>>>
>>> True.
>>>
>>>>
>>>>> +
>>>>> + return true;
>>>>> +}
>>>>> +
>>>>> +//
>>>>> =============================================================================
>>>>> +
>>>>> static inline void
>>>>> updateLdStOffset(Instruction *ldst, int32_t offset, Function *fn)
>>>>> {
>>>>> @@ -3436,6 +3529,7 @@ Program::optimizeSSA(int level)
>>>>> RUN_PASS(2, AlgebraicOpt, run);
>>>>> RUN_PASS(2, ModifierFolding, run); // before load propagation -> less checks
>>>>> RUN_PASS(1, ConstantFolding, foldAll);
>>>>> + RUN_PASS(2, LateAlgebraicOpt, run);
>>>>
>>>>
>>>> What is the reason to add a new AlgebraicOpt pass for this?
>>>
>>>
>>> Because a bunch of optimizations are applied in ConstantFolding which means
>>> this one has to be done *after*. My first attempt was to do it in
>>> AlgebraicOpt, but it's actually not the best place. :-)
>>>
>>
>> Yeah, but this applies to many other opts as well. I think in the end
>> we need to run those passes inside a loop and iterate 2 or 3
>> times, like I do here:
>> https://github.com/karolherbst/mesa/commit/e2ae247aa27ddb150ff4ccd1c0f19baa470134ef
>>
>> The first line is cut off, but the benefit is around 0.5% fewer
>> instructions and a decent speedup in pixmark_piano.
>
> This is different. The loop helps achieve a fixed point. However, this
> pass does not enable any further optimizations, so
> it can easily be placed outside such a fixed-point loop. The general
> reason why we don't have a loop at all is that the optimizations are
> carefully ordered such that a loop would only provide minimal improvement.
>
> -ilia
Right, there is still a pending fix for that on my side:
https://github.com/karolherbst/mesa/commit/2cc9062e886175246d0f93041ea8e953928ba069
but there are other issues as well.