[Mesa-dev] [PATCH 2/3] nir: combine fmul and fadd across ffma operations

Tue Nov 13 15:18:17 UTC 2018

The brw_nir_opt_peephole_ffma pass only does what the fuse_ffma option 
already does: it produces the same result, which is not optimal.

This is what I get:
    vec4 32 ssa_7 = fmul ssa_6, ssa_1.yyyy
    vec4 32 ssa_8 = ffma ssa_5, ssa_1.xxxx, ssa_7
    vec4 32 ssa_10 = ffma ssa_9, ssa_1.zzzz, ssa_8
    vec4 32 ssa_12 = fadd ssa_10, ssa_11
But it would be better optimized as (example with the fewest rearrangements):
    vec4 32 ssa_7 = ffma ssa_6, ssa_1.yyyy, ssa_11
    vec4 32 ssa_8 = ffma ssa_5, ssa_1.xxxx, ssa_7
    vec4 32 ssa_10 = ffma ssa_9, ssa_1.zzzz, ssa_8

Fusing the fmul and fadd in this case is not obvious. Could this patch 
be OK if it is behind the fuse_ffma option?
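
To make the rearrangement concrete, here is a minimal sketch of how the 
proposed rule pushes the fadd up the ffma chain until the existing 
fuse_ffma rule can fire on the fmul. The nested-tuple trees and the 
helpers rewrite_once, rewrite and ops are hypothetical illustrations, 
not NIR's actual data structures or its algebraic matcher:

```python
# Toy expression trees: (opcode, operand, ...); leaves are plain strings.
# Assumption: this only mimics the two search/replace rules, nothing else.

def rewrite_once(expr):
    """Apply one rule at the root if it matches; otherwise return expr."""
    if isinstance(expr, tuple) and expr[0] == 'fadd':
        _, lhs, rhs = expr
        # Rule from the patch: fadd(ffma(a, b, c), d) -> ffma(a, b, fadd(c, d))
        if isinstance(lhs, tuple) and lhs[0] == 'ffma':
            _, a, b, c = lhs
            return ('ffma', a, b, ('fadd', c, rhs))
        # Existing fuse rule: fadd(fmul(a, b), c) -> ffma(a, b, c)
        if isinstance(lhs, tuple) and lhs[0] == 'fmul':
            _, a, b = lhs
            return ('ffma', a, b, rhs)
    return expr

def rewrite(expr):
    """Rewrite the root and all operands until nothing changes."""
    if not isinstance(expr, tuple):
        return expr
    while True:
        new = rewrite_once(expr)
        new = (new[0],) + tuple(rewrite(e) for e in new[1:])
        if new == expr:
            return expr
        expr = new

def ops(expr):
    """List opcodes in evaluation (post-order) order."""
    if not isinstance(expr, tuple):
        return []
    out = []
    for operand in expr[1:]:
        out.extend(ops(operand))
    out.append(expr[0])
    return out

# The "what I get" sequence from above, written as one tree.
before = ('fadd',
          ('ffma', 'ssa_9', 'ssa_1.zzzz',
           ('ffma', 'ssa_5', 'ssa_1.xxxx',
            ('fmul', 'ssa_6', 'ssa_1.yyyy'))),
          'ssa_11')
after = rewrite(before)

print(ops(before))  # ['fmul', 'ffma', 'ffma', 'fadd']
print(ops(after))   # ['ffma', 'ffma', 'ffma']
```

In this toy model the result is the same tree as the "better optimized" 
listing: ssa_11 ends up as the addend of the innermost ffma. Note that 
both rules reassociate or fuse floating-point operations, so rounding can 
differ from the original sequence, which is presumably why the rules 
carry the '~' marker. Gating the new rule would, I assume, just mean 
adding 'options->fuse_ffma' as the third tuple element, as the 
neighbouring rules in the diff do.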

On 11/12/2018 02:30 PM, Jason Ekstrand wrote:
> In general, you're not supposed to mess around with the precision of fma...
> What we do in the Intel drivers is to leave fma split, apply operations,
> and then we have a special mul+add fusion pass we run at the end.  Leaving
> them split allows for exactly this kind of optimization without mixing up
> those FMAs that are supposed to be kept fused and those generated by
> mul+add fusion which can be split back apart and re-optimized.
> 
> On Mon, Nov 12, 2018 at 12:17 PM Jonathan Marek <jonathan at marek.ca> wrote:
> 
>> This works by moving the fadd up across the ffma operations, so that it
>> can eventually be combined with a fmul. I'm not sure it works in all
>> cases, but it works in all the common cases.
>>
>> Example:
>>      matrix * vec4(coord, 1.0)
>> is compiled as:
>>      fmul, ffma, ffma, fadd
>> and with this patch:
>>      ffma, ffma, ffma
>>
>> Signed-off-by: Jonathan Marek <jonathan at marek.ca>
>> ---
>>   src/compiler/nir/nir_opt_algebraic.py | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/src/compiler/nir/nir_opt_algebraic.py
>> b/src/compiler/nir/nir_opt_algebraic.py
>> index 8f4df891b8..82e10731a6 100644
>> --- a/src/compiler/nir/nir_opt_algebraic.py
>> +++ b/src/compiler/nir/nir_opt_algebraic.py
>> @@ -133,6 +133,7 @@ optimizations = [
>>      (('~fadd@64', a, ('fmul',         c , ('fadd', b, ('fneg', a)))),
>> ('flrp', a, b, c), '!options->lower_flrp64'),
>>      (('ffma', a, b, c), ('fadd', ('fmul', a, b), c),
>> 'options->lower_ffma'),
>>      (('~fadd', ('fmul', a, b), c), ('ffma', a, b, c),
>> 'options->fuse_ffma'),
>> +   (('~fadd', ('ffma', a, b, c), d), ('ffma', a, b, ('fadd', c, d))),
>>
>>      (('fdot4', ('vec4', a, b,   c,   1.0), d), ('fdph',  ('vec3', a, b,
>> c), d)),
>>      (('fdot4', ('vec4', a, 0.0, 0.0, 0.0), b), ('fmul', a, b)),
>> --
>> 2.17.1