[Mesa-dev] [RFC] nir/opt_peephole_ffma: Bypass fusion if any operand of fadd and fmul is a const

Eduardo Lima Mitev elima at igalia.com
Mon Sep 21 00:14:33 PDT 2015


On 09/19/2015 09:00 PM, Jason Ekstrand wrote:
> 
> On Sep 18, 2015 2:49 AM, "Eduardo Lima Mitev" <elima at igalia.com
> <mailto:elima at igalia.com>> wrote:
>>
>> When both fadd and fmul instructions have at least one operand that is a
>> constant and it is only used once, the total number of instructions can
>> be reduced from 3 (1 ffma + 2 load_const) to 2 (1 fmul + 1 fadd); because
>> the constants will be progagated as immediate operands of fmul and fadd.
> 
> Right... I'm pretty sure that I've written this patch once before.  At
> the time we didn't do this because we thought Matt's constant combining
> would take care of most of those issues and we could just emit the MAD. 
> However, that pass doesn't always combine things optimally and doesn't
> exist for vec4.
> 
>> This patch modifies opt_peephole_ffma pass to detect this situation and
>> bails-out fusing fmul+fadd into ffma.
>>
>> As shown in shader-db results below, it seems to help a good bunch.
> However,
>> there are some caveats:
>>
>> * It seems i965 specific, so I'm not sure if modifying the NIR pass
>> directly is desired, as opposed to moving this to the backend.
>>
>> * There are still a high number of HURTs, but these could be reduced
> by being
>> more specific in the conditions to bailout.
>>
>> total instructions in shared programs: 1683959 -> 1677447 (-0.39%)
>> instructions in affected programs:     604918 -> 598406 (-1.08%)
>> helped:                                4633
>> HURT:                                  804
>> GAINED:                                0
>> LOST:                                  0
> 
> What does that look like if you split that between vec4 and fs?
> 

The above is shader-db results filtered for VS only. Following is the
same results filtered for FS:

total instructions in shared programs: 4593761 -> 4590252 (-0.08%)
instructions in affected programs:     538690 -> 535181 (-0.65%)
helped:                                2981
HURT:                                  38
GAINED:                                4
LOST:                                  0

And this is the raw, unfiltered report:

total instructions in shared programs: 6277603 -> 6267582 (-0.16%)
instructions in affected programs:     1143656 -> 1133635 (-0.88%)
helped:                                7614
HURT:                                  842
GAINED:                                4
LOST:                                  0

For FS, the gain is more noticeable because vector instructions tend to
neutralize the benefits (constants cannot be propagated to
multi-component operands).

Eduardo

>> ---
>>  src/glsl/nir/nir_opt_peephole_ffma.c | 31 +++++++++++++++++++++++++++++++
>>  1 file changed, 31 insertions(+)
>>
>> diff --git a/src/glsl/nir/nir_opt_peephole_ffma.c
> b/src/glsl/nir/nir_opt_peephole_ffma.c
>> index 4f0f0da..da47f8f 100644
>> --- a/src/glsl/nir/nir_opt_peephole_ffma.c
>> +++ b/src/glsl/nir/nir_opt_peephole_ffma.c
>> @@ -133,6 +133,28 @@ get_mul_for_src(nir_alu_src *src, int num_components,
>>     return alu;
>>  }
>>
>> +/**
>> + * Given a list of (at least two) nir_alu_src's, tells if any of them
> is a
>> + * constant value and is used only once.
>> + */
>> +static bool
>> +any_alu_src_is_a_constant(nir_alu_src srcs[])
>> +{
>> +   for (unsigned i = 0; i < 2; i++) {
>> +      if (srcs[i].src.ssa->parent_instr->type ==
> nir_instr_type_load_const) {
>> +         nir_load_const_instr *load_const =
>> +            nir_instr_as_load_const (srcs[i].src.ssa->parent_instr);
>> +
>> +         if (list_length(&load_const->def.uses) == 1 &&
>> +             list_length(&load_const->def.if_uses) == 0) {
>> +            return true;
>> +         }
>> +      }
>> +   }
>> +
>> +   return false;
>> +}
>> +
>>  static bool
>>  nir_opt_peephole_ffma_block(nir_block *block, void *void_state)
>>  {
>> @@ -183,6 +205,15 @@ nir_opt_peephole_ffma_block(nir_block *block,
> void *void_state)
>>        mul_src[0] = mul->src[0].src.ssa;
>>        mul_src[1] = mul->src[1].src.ssa;
>>
>> +      /* If any of the operands of the fmul and any of the fadd is a
> constant,
>> +       * we bypass because it will be more efficient as the constants
> will be
>> +       * propagated as operands, potentially saving two load_const
> instructions.
>> +       */
>> +      if (any_alu_src_is_a_constant(mul->src) &&
>> +          any_alu_src_is_a_constant(add->src)) {
>> +         continue;
>> +      }
>> +
>>        if (abs) {
>>           for (unsigned i = 0; i < 2; i++) {
>>              nir_alu_instr *abs = nir_alu_instr_create(state->mem_ctx,
>> --
>> 2.4.6
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org <mailto:mesa-dev at lists.freedesktop.org>
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 



More information about the mesa-dev mailing list