[Mesa-dev] [PATCH 5/5] nir: Propagate negates up multiplication chains.

Tue Mar 1 22:56:21 UTC 2016

On Mon, Feb 22, 2016 at 10:14 PM, Ian Romanick <idr at freedesktop.org> wrote:
> On 02/22/2016 04:13 PM, Matt Turner wrote:
>> total instructions in shared programs: 7127270 -> 7103195 (-0.34%)
>> instructions in affected programs: 1376832 -> 1352757 (-1.75%)
>> helped: 7394
>> HURT: 622
>>
>> GAINED: 4
>> LOST:   2
>> ---
>>  src/compiler/nir/nir_opt_algebraic.py | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py
>> index cc2c229..1863033 100644
>> --- a/src/compiler/nir/nir_opt_algebraic.py
>> +++ b/src/compiler/nir/nir_opt_algebraic.py
>> @@ -227,6 +227,10 @@ optimizations = [
>>     (('fabs', ('fsub', 0.0, a)), ('fabs', a)),
>>     (('iabs', ('isub', 0, a)), ('iabs', a)),
>>
>> +   # Propagate negation up multiplication chains
>> +   (('fmul', ('fneg', a), b), ('fneg', ('fmul', a, b))),
>> +   (('imul', ('ineg', a), b), ('ineg', ('fmul', a, b))),
>> +
>
> We had tried things like this before, and it wasn't clearly a win.  Is
> it a win now because of your early change to try to match a larger
> expression?  My recollection is that doing this thwarted some attempts
> at generating MAD.

I think that was before NIR, and I committed some patches like

commit 3d581f99963dea7e93a2f8fd819410da02c1cb7f
Author: Matt Turner <mattst88 at gmail.com>
Date:   Fri Dec 19 21:35:56 2014 -0800

    i965/vec4: Emit MADs from (x + -(y * z)).

commit c4fab711ed5bbdb6b8421a1b980215032fc795b8
Author: Matt Turner <mattst88 at gmail.com>
Date:   Fri Dec 19 21:30:16 2014 -0800

    i965/fs: Emit MADs from (x + -(y * z)).

to solve it. I believe NIR recognizes this case already (see
brw_nir_opt_peephole_ffma.c in i965/)

> Should we also add rules like
>
>    (('ffma', ('fneg', a), b, c), ('fadd', ('fneg', ('fmul', a, b)), c),
> 'options->lower_ffma'),

We actually split all ffma operations, optimize, and only as a last
step recombine multiplies and adds into ffma, so I don't think this
can change anything.

I tested it, and it gave no changes in shader-db.

> Finally, have you looked at the hurt cases?  I also seem to recall that
> cases where one of the operands of the fmul was an immediate were
> usually better off moving the negation from the

I spot checked some, and they seem to match what I remember from when
I tried this before. For instance:

-mul.sat(8)      g18<1>F         g15<8,8,1>F     g33<8,8,1>F
-mul.sat(8)      g22<1>F         -g15<8,8,1>F    g33<8,8,1>F
-mul.sat(8)      g20<1>F         g17<8,8,1>F     g33<8,8,1>F
-mul.sat(8)      g24<1>F         -g17<8,8,1>F    g33<8,8,1>F
+mul(8)          g18<1>F         g15<8,8,1>F     g35<8,8,1>F
+mul(8)          g21<1>F         g17<8,8,1>F     g35<8,8,1>F
+mov.sat(8)      g19<1>F         g18<8,8,1>F
+mov.sat(8)      g24<1>F         -g18<8,8,1>F
+mov.sat(8)      g22<1>F         g21<8,8,1>F
+mov.sat(8)      g26<1>F         -g21<8,8,1>F

You'd rather have the 4x mul.sats, but that means finding a way of
preventing CSE from combining them which I think requires more
information than contained in the multiply instructions themselves.

Solving this would be good though -- Connor encountered a similar
problem (or maybe the same) with his NIR vectorizer series.

I've pushed the first four patches. I'm not sure if we should hold off
on this one or not. Solving the previously mentioned problem will be a
lot simpler with UD chains... hopefully soon.

Preferences?