[Nouveau] [PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD
Tobias Klausmann
tobias.johannes.klausmann at mni.thm.de
Sun Oct 2 18:43:03 UTC 2016
On 02.10.2016 20:26, Ilia Mirkin wrote:
> That's very odd. LoadPropagation should have picked that up even in
> its current form. Should try to figure out why it didn't and that is
> likely to "fix" a *lot* more situations.
Actually, I started out with an (admittedly very constrained) addition to the
LoadPropagation pass, and was told to fix it within OP_MAD instead :/
> On Sun, Oct 2, 2016 at 2:24 PM, Tobias Klausmann
> <tobias.johannes.klausmann at mni.thm.de> wrote:
>>
>> On 02.10.2016 20:03, Ilia Mirkin wrote:
>>> On Sun, Oct 2, 2016 at 1:58 PM, Tobias Klausmann
>>> <tobias.johannes.klausmann at mni.thm.de> wrote:
>>>> Previously we'd end up with an unnecessary mov for the third immediate
>>>> value.
>>>>
>>>> total instructions in shared programs : 851881 -> 851864 (-0.00%)
>>>> total gprs used in shared programs : 110295 -> 110295 (0.00%)
>>>> total local used in shared programs : 1020 -> 1020 (0.00%)
>>>>
>>>>                 local        gpr       inst      bytes
>>>>     helped           0          0         17         17
>>>>     hurt             0          0          0          0
>>>>
>>>> Suggested-by: Karol Herbst <nouveau at karolherbst.de>
>>>> Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de>
>>>> ---
>>>>  src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 15 ++++++++++++---
>>>> 1 file changed, 12 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>>> index 9875738..8bb5cf9 100644
>>>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>>> @@ -1008,13 +1008,22 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
>>>>        break;
>>>>     case OP_MAD:
>>>>        if (imm0.isInteger(0)) {
>>>> +         ImmediateValue imm1;
>>>>           i->setSrc(0, i->getSrc(2));
>>>>           i->src(0).mod = i->src(2).mod;
>>>>           i->setSrc(1, NULL);
>>>>           i->setSrc(2, NULL);
>>>> -         i->op = i->src(0).mod.getOp();
>>>> -         if (i->op != OP_CVT)
>>>> -            i->src(0).mod = 0;
>>>> +         if (i->src(0).getImmediate(imm1)) {
>>>> +            bld.setPosition(i, false);
>>>> +            newi = bld.mkMov(i->getDef(0), bld.mkImm(imm1.reg.data.u64),
>>>> +                             i->dType);
>>>> +            delete_Instruction(prog, i);
>>> What's an example of a situation where this helps? It shouldn't
>>> matter, the mov's should get cleaned up. [Clearly 17 shaders
>>> disagree...] Is this just a side-effect of the fact that we don't run
>>> the opts to a fixed point?
>>
>> It is a second mov that gets in the way of folding the immediate in later.
>> Here is the output of a test shader [1]:
>>
>> 0: nop u32 %r56 (0)
>> 1: ld u32 %r31 c0[0x0] (0)
>> 2: ld u32 %r37 c0[0x140] (0)
>> 3: mov u32 %r38 0x00000000 (0)
>> 4: mov u32 %r39 0x3f800000 (0)
>> 5: mad f32 %r40 %r37 %r38 %r39 (0)
>> 6: mad f32 %r44 %r37 %r38 %r38 (0)
>> 7: add f32 %r53 %r31 %r40 (0)
>> 8: add f32 %r54 %r31 %r44 (0)
>> 9: add f32 %r57 %r56 %r44 (0)
>>
>> Constantfolding...
>>
>> MAIN:-1 ()
>> BB:0 (14 instructions) - df = { }
>> -> BB:1 (tree)
>> 0: nop u32 %r56 (0)
>> 1: ld u32 %r31 c0[0x0] (0)
>> 2: ld u32 %r37 c0[0x140] (0)
>> 3: mov u32 %r38 0x00000000 (0)
>> 4: mov u32 %r39 0x3f800000 (0)
>> 5: mov f32 %r40 %r39 (0)
>> 6: mov f32 %r44 %r38 (0)
>> 7: add f32 %r53 %r31 %r40 (0)
>> 8: mov f32 %r54 %r31 (0)
>> 9: mov f32 %r57 %r56 (0)
>>
>>
>> The outcome:
>> 0: ld u32 $r2 c0[0x0] (8)
>> 1: mov u32 $r0 0x3f800000 (8)
>> 2: add ftz f32 $r0 $r2 $r0 (8)
>> 3: mov f32 $r3 $r1 (8)
>> 4: mov u32 $r1 $r2 (8)
>> 5: export b128 # o[0x0] $r0q (8)
>>
>> With patch:
>> 0: ld u32 $r2 c0[0x0] (8)
>> 1: add ftz f32 $r0 $r2 1.000000 (8)
>> 2: mov f32 $r3 $r1 (8)
>> 3: mov u32 $r1 $r2 (8)
>> 4: export b128 # o[0x0] $r0q (8)
>>
>>
>> [1]:
>> VERT
>> PROPERTY NEXT_SHADER FRAG
>> DCL OUT[0], GENERIC[0]
>> DCL CONST[0]
>> DCL TEMP[0..1], LOCAL
>> IMM[0] FLT32 { 0.0078, -1.0000, 0.0000, 0.5000}
>> IMM[1] FLT32 { 1.0000, 0.0000, 65535.0000, 0.0100}
>> 0: MOV TEMP[0].xyz, CONST[0].xxxx
>> 39: MAD TEMP[1], CONST[20].xxxx, IMM[1].yyyy, IMM[1].xyyy
>> 41: ADD TEMP[1], TEMP[0], TEMP[1]
>> 208: MOV OUT[0], TEMP[1]
>> 211: END
>>
>>
>>
>>
>>>> +         }
>>>> +         else {
>>>> +            i->op = i->src(0).mod.getOp();
>>>> +            if (i->op != OP_CVT)
>>>> +               i->src(0).mod = 0;
>>>> +         }
>>>>        } else
>>>>        if (i->subOp != NV50_IR_SUBOP_MUL_HIGH &&
>>>>            (imm0.isInteger(1) || imm0.isInteger(-1))) {
>>>> --
>>>> 2.10.0
>>>>
>>>> _______________________________________________
>>>> Nouveau mailing list
>>>> Nouveau at lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/nouveau
>>
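
For readers following the thread, the fold discussed above can be sketched
outside of Mesa. This is only a toy model under stated assumptions: Op,
Operand, Insn, getImmediate, foldMadWithZero and asFloat are hypothetical
names made up for illustration, not the nv50_ir classes or the code from the
patch. It shows the idea only: a mad with a zero multiplicand becomes a mov of
its third source, and if that source is itself a known immediate, the literal
is put into the mov directly instead of going through a second register.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Toy IR for illustration only; none of these types exist in Mesa.
enum class Op { Mov, Mad, Add };

struct Operand {
    bool isImm;      // true: literal bits, false: virtual register number
    uint32_t value;
};

struct Insn {
    Op op;
    int def;                    // destination register
    std::vector<Operand> srcs;  // 1 source for Mov, 2 for Add, 3 for Mad
};

// Return the literal behind an operand: the operand itself if it is a
// literal, or the literal of a "mov reg, imm" that defines the register.
std::optional<uint32_t> getImmediate(const std::vector<Insn>& prog,
                                     const Operand& o) {
    if (o.isImm)
        return o.value;
    for (const Insn& i : prog)
        if (i.op == Op::Mov && i.def == static_cast<int>(o.value) &&
            i.srcs[0].isImm)
            return i.srcs[0].value;
    return std::nullopt;
}

// Fold "mad d, a, b, c" with a == 0 or b == 0 into "mov d, c". If c is a
// known immediate, use the literal as the mov source. Like the fold in the
// thread, this toy does not treat NaN/Inf inputs specially.
bool foldMadWithZero(std::vector<Insn>& prog, std::size_t idx) {
    assert(idx < prog.size());
    Insn& i = prog[idx];
    if (i.op != Op::Mad)
        return false;
    auto a = getImmediate(prog, i.srcs[0]);
    auto b = getImmediate(prog, i.srcs[1]);
    if (!((a && *a == 0) || (b && *b == 0)))
        return false;
    Operand c = i.srcs[2];          // copy before srcs is overwritten
    if (auto imm = getImmediate(prog, c))
        c = Operand{true, *imm};    // propagate the third immediate source
    i.op = Op::Mov;
    i.srcs = {c};
    return true;
}

// Reinterpret 32 raw bits as an IEEE-754 single-precision float.
float asFloat(uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

Applied to a model of instructions 3 to 5 of the dump above (mov %r38 0x0,
mov %r39 0x3f800000, mad %r40 %r37 %r38 %r39), the mad turns into a mov of the
literal 0x3f800000, which asFloat decodes to 1.0f. That is the value that shows
up as the "1.000000" operand of the add in the "With patch" output.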