[Mesa-dev] [PATCH 2/3] nir: combine fmul and fadd across ffma operations
Jonathan Marek
jonathan at marek.ca
Tue Nov 13 15:18:17 UTC 2018
The brw_nir_opt_peephole_ffma pass only does what the fuse_ffma option
already does, so it produces the same result, which is not optimal.
This is what I get:
vec4 32 ssa_7 = fmul ssa_6, ssa_1.yyyy
vec4 32 ssa_8 = ffma ssa_5, ssa_1.xxxx, ssa_7
vec4 32 ssa_10 = ffma ssa_9, ssa_1.zzzz, ssa_8
vec4 32 ssa_12 = fadd ssa_10, ssa_11
But it would be better optimized as (the example with the fewest rearrangements):
vec4 32 ssa_7 = ffma ssa_6, ssa_1.yyyy, ssa_11
vec4 32 ssa_8 = ffma ssa_5, ssa_1.xxxx, ssa_7
vec4 32 ssa_10 = ffma ssa_9, ssa_1.zzzz, ssa_8
Fusing the fmul and fadd in this case is not obvious. Could this patch
be OK if it is behind the fuse_ffma option?
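To make the intended rewrite concrete, here is a rough Python sketch of the two rules applied to the example above. It is only a toy model, not the nir_algebraic machinery: the nested-tuple encoding, the `rewrite` helper and the shortened operand names are made up for illustration, it only matches an ffma/fmul in the first operand of the fadd, and it ignores the inexact ('~') marker entirely.

```python
def rewrite(expr):
    """Apply the two rules bottom-up on a nested-tuple expression (toy model)."""
    if not isinstance(expr, tuple):
        return expr  # leaf: an SSA value name
    op, *args = expr
    args = [rewrite(a) for a in args]
    if op == "fadd":
        lhs, rhs = args
        # Rule added by the patch: fadd(ffma(a, b, c), d) -> ffma(a, b, fadd(c, d))
        if isinstance(lhs, tuple) and lhs[0] == "ffma":
            _, a, b, c = lhs
            # Re-run on the new inner fadd so it can keep sinking or fuse.
            return ("ffma", a, b, rewrite(("fadd", c, rhs)))
        # Existing fuse_ffma rule: fadd(fmul(a, b), c) -> ffma(a, b, c)
        if isinstance(lhs, tuple) and lhs[0] == "fmul":
            _, a, b = lhs
            return ("ffma", a, b, rhs)
    return (op, *args)


# What I get today: fmul, ffma, ffma, fadd
before = ("fadd",
          ("ffma", "ssa_9", "ssa_1.z",
           ("ffma", "ssa_5", "ssa_1.x",
            ("fmul", "ssa_6", "ssa_1.y"))),
          "ssa_11")

# What I would like: ffma, ffma, ffma
wanted = ("ffma", "ssa_9", "ssa_1.z",
          ("ffma", "ssa_5", "ssa_1.x",
           ("ffma", "ssa_6", "ssa_1.y", "ssa_11")))

after = rewrite(before)
```

In this sketch the fadd is pushed through each ffma in turn until it meets the fmul, where the existing fuse rule takes over.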
On 11/12/2018 02:30 PM, Jason Ekstrand wrote:
> In general, you're not supposed to mess around with the precision of fma...
> What we do in the Intel drivers is to leave fma split, apply operations,
> and then we have a special mul+add fusion pass we run at the end. Leaving
> them split allows for exactly this kind of optimization without mixing up
> those FMAs that are supposed to be kept fused and those generated by
> mul+add fusion which can be split back apart and re-optimized.
>
> On Mon, Nov 12, 2018 at 12:17 PM Jonathan Marek <jonathan at marek.ca> wrote:
>
>> This works by moving the fadd up across the ffma operations, so that it
>> can eventually be combined with a fmul. I'm not sure it works in all
>> cases, but it works in all the common cases.
>>
>> Example:
>> matrix * vec4(coord, 1.0)
>> is compiled as:
>> fmul, ffma, ffma, fadd
>> and with this patch:
>> ffma, ffma, ffma
>>
>> Signed-off-by: Jonathan Marek <jonathan at marek.ca>
>> ---
>> src/compiler/nir/nir_opt_algebraic.py | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/src/compiler/nir/nir_opt_algebraic.py
>> b/src/compiler/nir/nir_opt_algebraic.py
>> index 8f4df891b8..82e10731a6 100644
>> --- a/src/compiler/nir/nir_opt_algebraic.py
>> +++ b/src/compiler/nir/nir_opt_algebraic.py
>> @@ -133,6 +133,7 @@ optimizations = [
>> (('~fadd@64', a, ('fmul', c , ('fadd', b, ('fneg', a)))),
>> ('flrp', a, b, c), '!options->lower_flrp64'),
>> (('ffma', a, b, c), ('fadd', ('fmul', a, b), c),
>> 'options->lower_ffma'),
>> (('~fadd', ('fmul', a, b), c), ('ffma', a, b, c),
>> 'options->fuse_ffma'),
>> + (('~fadd', ('ffma', a, b, c), d), ('ffma', a, b, ('fadd', c, d))),
>>
>> (('fdot4', ('vec4', a, b, c, 1.0), d), ('fdph', ('vec3', a, b,
>> c), d)),
>> (('fdot4', ('vec4', a, 0.0, 0.0, 0.0), b), ('fmul', a, b)),
>> --
>> 2.17.1