[Mesa-dev] [PATCH] nv50/ir: make sure to erase src2 after optimizing to MOV
Ilia Mirkin
imirkin at alum.mit.edu
Wed Nov 9 15:19:26 UTC 2016
On Wed, Nov 9, 2016 at 10:10 AM, Samuel Pitoiset
<samuel.pitoiset at gmail.com> wrote:
>
>
> On 11/09/2016 03:58 PM, Ilia Mirkin wrote:
>>
>> On Wed, Nov 9, 2016 at 9:20 AM, Samuel Pitoiset
>> <samuel.pitoiset at gmail.com> wrote:
>>>
>>> For all instructions with 3 sources (like OP_SLCT), src2 needs
>>> to be destroyed because srcExists(2) will return true although
>>> it's actually undefined.
>>>
>>> Spotted with my ADD3 series.
>>
>>
>> Sounds like the ADD3 series' addition to this function doesn't handle
>> this scenario. It's already properly handled by FMA/MAD, for example.
>
>
> Nope, it's correctly handled in expr() in my series.
>
> In ConstantFolding::visit(BasicBlock *bb), I added a new opnd2() method
> which is used for folding ADD3(a,b,c)-> ADD3(b,a+c) or ->ADD3(a,b+c) because
> the pass doesn't currently implement such a case.
>
> Though, opnd2() is called if srcExists(2) is true and if src0/src1 or
> src0/src2 are immediate values. But as this is done after the opnd3() call,
> if the instruction has been constant folded (just before) but src2 not
> erased, it will try to get the immediates and crash... which is expected.
>
> This just makes expr() more robust against that, and it's not entirely
> related to my series, because if you try to detect more scenarios where
> immediates can be folded *after* opnd3() you will hit the issue for sure.
>
> src1 is already erased in expr(), I don't see why src2 should not be as well
> (when optimizing to SAT/MOV).
src2 is also erased.
switch (i->op) {
case OP_MAD:
case OP_FMA: {
...
i->setSrc(2, NULL);
}
You need to add a case for ADD3 here that does the same thing. And
also immediate re-executes expr/opnd in case the other arg was *also*
an immed. That should avoid taking up an extra fold iteration on it.
>
>
>>
>>>
>>> Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
>>> ---
>>> src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>> b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>> index 9cf6ddc..ef31436 100644
>>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>> @@ -772,6 +772,7 @@ ConstantFolding::expr(Instruction *i,
>>> break;
>>> default:
>>> i->op = i->saturate ? OP_SAT : OP_MOV; /* SAT handled by unary()
>>> */
>>> + i->setSrc(2, NULL);
>>> break;
>>> }
>>> i->subOp = 0;
>>> --
>>> 2.10.2
>>>
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
>
> --
> -Samuel
More information about the mesa-dev
mailing list