[Mesa-dev] [PATCH 7/9] nir/peephole_ffma: Be less agressive about fusing multiply-adds

Tue Mar 24 15:39:19 PDT 2015

I think this is being both too aggressive and not aggressive enough.
First, the too aggressive part. Imagine if we have something like:

a = mul b, c;
d = add a, 1.0;
e = add a, 2.0;
f = add a, 3.0;

In this case, we don't want to turn this into a series of 3 mad's
since the 3 load immediate instructions are going to hurt us more than
the elimination of the mul. When the only thing using the mul is a
bunch of add's, it's only profitable when there's at most one distinct
immediate that isn't used by other MAD's -- any more and we gain
instructions.

Also, I think that we might still want to fuse the multiply and the
add even if the mul is used somewhere else, since at least it gives
the scheduler some extra freedom to hide latency -- as long as we
don't have to emit any load immediate instructions. Of course, this is
all very backend-specific so at this point it would make sense to move
this to an i965 pass and leave in the general ffma peephole for
everyone else without these weird restrictions.

On Mon, Mar 23, 2015 at 11:13 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
> shader-db results for fragment shaders on Haswell:
> total instructions in shared programs: 4395688 -> 4389623 (-0.14%)
> instructions in affected programs:     355876 -> 349811 (-1.70%)
> helped:                                1455
> HURT:                                  14
> GAINED:                                5
> LOST:                                  0
> ---
>  src/glsl/nir/nir_opt_peephole_ffma.c | 41 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 41 insertions(+)
>
> diff --git a/src/glsl/nir/nir_opt_peephole_ffma.c b/src/glsl/nir/nir_opt_peephole_ffma.c
> index 4674505..b875bb6 100644
> --- a/src/glsl/nir/nir_opt_peephole_ffma.c
> +++ b/src/glsl/nir/nir_opt_peephole_ffma.c
> @@ -38,6 +38,41 @@ struct peephole_ffma_state {
>     bool progress;
>  };
>
> +static inline bool
> +are_all_uses_fadd(nir_ssa_def *def)
> +{
> +   if (def->if_uses->entries > 0)
> +      return false;
> +
> +   struct set_entry *use_iter;
> +   set_foreach(def->uses, use_iter) {
> +      nir_instr *use_instr = (nir_instr *)use_iter->key;
> +
> +      if (use_instr->type != nir_instr_type_alu)
> +         return false;
> +
> +      nir_alu_instr *use_alu = nir_instr_as_alu(use_instr);
> +      switch (use_alu->op) {
> +      case nir_op_fadd:
> +         break; /* This one's ok */
> +
> +      case nir_op_imov:
> +      case nir_op_fmov:
> +      case nir_op_fneg:
> +      case nir_op_fabs:
> +         assert(use_alu->dest.dest.is_ssa);
> +         if (!are_all_uses_fadd(&use_alu->dest.dest.ssa))
> +            return false;
> +         break;
> +
> +      default:
> +         return false;
> +      }
> +   }
> +
> +   return true;
> +}
> +
>  static inline nir_alu_instr *
>  get_mul_for_src(nir_alu_src *src, uint8_t swizzle[4], bool *negate, bool *abs)
>  {
> @@ -66,6 +101,12 @@ get_mul_for_src(nir_alu_src *src, uint8_t swizzle[4], bool *negate, bool *abs)
>        break;
>
>     case nir_op_fmul:
> +      /* Only absorbe a fmul into a ffma if the fmul is is only used in fadd
> +       * operations.  This prevents us from being too agressive with our
> +       * fusing which can actually lead to more instructions.
> +       */
> +      if (!are_all_uses_fadd(&alu->dest.dest.ssa))
> +         return NULL;
>        break;
>
>     default:
> --
> 2.3.3
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev