[Mesa-dev] [PATCH] nir: Don't fuse fmul into ffma if used by more than 4 fadds.

Iago Toral itoral at igalia.com
Wed Sep 23 03:19:40 PDT 2015


On Tue, 2015-09-22 at 15:52 -0700, Matt Turner wrote:
> total instructions in shared programs: 6596689 -> 6595563 (-0.02%)
> instructions in affected programs:     103154 -> 102028 (-1.09%)
> helped:                                253
> HURT:                                  217
> 
> It's kind of a wash in terms of programs helped/hurt, but of the
> programs helped 169 are by more than 10%.

Yeah, that is a very significant gain for the helped cases. Out of
curiosity, how bad are hurt programs in comparison?

> ---
> I tried values of 2-6, and 4 seemed to be the best. I can provide
> full shader-db result files if other people want to investigate.

I was wondering if shader-db metrics are the best option for selecting
thresholds such as these, meaning that when you are replacing
instructions that have different latencies simply comparing instruction
counts may not be the best thing to do. However, if ffma operations are
more expensive that fadd, a higher instruction count should always lead
to worse performance, so that should be ok.

Reviewed-by: Iago Toral Quiroga <itoral at igalia.com>

>  src/glsl/nir/nir_opt_peephole_ffma.c | 20 ++++++++++++--------
>  1 file changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/src/glsl/nir/nir_opt_peephole_ffma.c b/src/glsl/nir/nir_opt_peephole_ffma.c
> index 4f0f0da..3e8a34f 100644
> --- a/src/glsl/nir/nir_opt_peephole_ffma.c
> +++ b/src/glsl/nir/nir_opt_peephole_ffma.c
> @@ -39,7 +39,7 @@ struct peephole_ffma_state {
>  };
>  
>  static inline bool
> -are_all_uses_fadd(nir_ssa_def *def)
> +are_all_uses_fadd(nir_ssa_def *def, unsigned *num_uses)
>  {
>     if (!list_empty(&def->if_uses))
>        return false;
> @@ -53,6 +53,7 @@ are_all_uses_fadd(nir_ssa_def *def)
>        nir_alu_instr *use_alu = nir_instr_as_alu(use_instr);
>        switch (use_alu->op) {
>        case nir_op_fadd:
> +         (*num_uses)++;
>           break; /* This one's ok */
>  
>        case nir_op_imov:
> @@ -60,7 +61,7 @@ are_all_uses_fadd(nir_ssa_def *def)
>        case nir_op_fneg:
>        case nir_op_fabs:
>           assert(use_alu->dest.dest.is_ssa);
> -         if (!are_all_uses_fadd(&use_alu->dest.dest.ssa))
> +         if (!are_all_uses_fadd(&use_alu->dest.dest.ssa, num_uses))
>              return false;
>           break;
>  
> @@ -101,15 +102,18 @@ get_mul_for_src(nir_alu_src *src, int num_components,
>        *abs = true;
>        break;
>  
> -   case nir_op_fmul:
> -      /* Only absorb a fmul into a ffma if the fmul is is only used in fadd
> -       * operations.  This prevents us from being too aggressive with our
> -       * fusing which can actually lead to more instructions.
> +   case nir_op_fmul: {
> +      /* Only fuse an fmul into an ffma if its result is used by not more than
> +       * four fadd operations. This prevents us from too aggressively fusing
> +       * operations which can actually lead to more instructions or many ffma
> +       * operations performing the same multiply.
>         */
> -      if (!are_all_uses_fadd(&alu->dest.dest.ssa))
> +
> +      unsigned num_uses = 0;
> +      if (!are_all_uses_fadd(&alu->dest.dest.ssa, &num_uses) || num_uses > 4)
>           return NULL;
>        break;
> -
> +   }
>     default:
>        return NULL;
>     }



More information about the mesa-dev mailing list