[Mesa-dev] [PATCH] i965/nir: do int64 lowering before optimization

Iago Toral itoral at igalia.com
Mon Feb 5 13:48:16 UTC 2018


On Sun, 2018-02-04 at 14:40 -0500, Connor Abbott wrote:
> > On Mon, Dec 11, 2017 at 11:01 AM, Jason Ekstrand <jason at jlekstrand.net> wrote:
> > On Mon, Dec 11, 2017 at 12:55 AM, Iago Toral <itoral at igalia.com>
> > wrote:
> > > 
> > > This didn't get any reviews yet. Any takers?
> > > 
> > > On Fri, 2017-12-01 at 13:46 +0100, Iago Toral Quiroga wrote:
> > > > Otherwise loop unrolling will fail to see the actual cost of
> > > > unrolling when the loop body contains 64-bit integer
> > > > instructions, especially when the divmod64 lowering applies,
> > > > since that lowering is quite expensive.
> > > > 
> > > > Without this change, some in-development CTS tests for int64
> > > > get stuck forever trying to register allocate a shader with
> > > > over 50K SSA values. The large number of SSA values is the
> > > > result
> > > > of NIR first unrolling multiple seemingly simple loops that
> > > > involve
> > > > int64 instructions, only to then lower these instructions to
> > > > produce
> > > > a massive pile of code (due to the divmod64 lowering in the
> > > > unrolled
> > > > instructions).
> > > > 
> > > > With this change, loop unrolling will see the loops with the
> > > > int64
> > > > code already lowered and will realize that it is too expensive
> > > > to
> > > > unroll.
> > 
> > 
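[Editor's note: the cost mismatch described in the commit message can be sketched with a toy model. This is not NIR code; `UNROLL_LIMIT` and `DIVMOD64_EXPANSION` are made-up numbers for illustration only.]

```c
/* Toy model (not NIR code): why unrolling before int64 lowering
 * misjudges cost.  The constants below are invented for illustration. */
#include <stdbool.h>

#define UNROLL_LIMIT       26  /* hypothetical budget for an unrolled body */
#define DIVMOD64_EXPANSION 60  /* hypothetical size of the divmod64 lowering */

static int unrolled_size(int body_instrs, int trip_count)
{
   /* fully unrolling replicates the body once per iteration */
   return body_instrs * trip_count;
}

static bool should_unroll(int body_instrs, int trip_count)
{
   return unrolled_size(body_instrs, trip_count) <= UNROLL_LIMIT;
}
```

Before lowering, a body with one 64-bit divide counts as a handful of NIR instructions and unrolls (e.g. 3 instructions * 8 iterations = 24, under budget); after lowering, the same body costs `2 + DIVMOD64_EXPANSION` per iteration and the unroller would refuse.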
> > Hrm... I'm not quite sure what I think of this.  I put it after
> > nir_optimize because I wanted opt_algebraic to be able to work its
> > magic and hopefully remove a bunch of int64 ops before we lower
> > them.  In particular, we have optimizations to remove integer
> > division and replace it with shifts.  However, loop unrolling does
> > need to happen before lower_indirect_derefs so that
> > lower_indirect_derefs will do as little work as possible.
> > 
> > This is a bit of a pickle...  I don't really want to add a third
> > brw_nir_optimize call.  It probably wouldn't be the end of the
> > world but it
> > does add compile time.
> > 
> > One crazy idea, which I don't think I like, would be to have a quick
> > pass that walks the IR and sees if there are any 64-bit SSA values.
> > If there are, we run brw_nir_optimize without loop unrolling, then
> > 64-bit lowering, and then we go into the normal brw_nir_optimize.
> > 
> > --Jason
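[Editor's note: the "quick pass" Jason describes could look roughly like the sketch below, written against a toy IR rather than the real NIR API. `struct toy_instr` and its `dest_bit_size` field are stand-ins; a real pass would walk the shader with NIR's block/instruction iterators and inspect each SSA destination's bit size.]

```c
/* Sketch on a toy IR (not the real NIR API): scan all instructions and
 * report whether any SSA destination is 64-bit. */
#include <stdbool.h>
#include <stddef.h>

struct toy_instr {
   unsigned dest_bit_size;   /* stand-in for an SSA destination's bit size */
};

static bool shader_has_int64(const struct toy_instr *instrs, size_t count)
{
   for (size_t i = 0; i < count; i++) {
      if (instrs[i].dest_bit_size == 64)
         return true;   /* found a 64-bit value; caller picks the slow path */
   }
   return false;
}
```

A scan like this is cheap (one linear walk), so it could gate the extra optimize/lower round on shaders that actually contain 64-bit values.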
> 
> Why don't we just add some sort of backend-specific code-size metric
> to the loop unrolling, rather than just counting NIR instructions?
> i.e. something like a num_assembly_instructions(nir_instr *) function
> pointer in nir_shader_compiler_options. The root of the problem is
> that different NIR instructions can turn into vastly different
> numbers of assembly instructions, but we really care about the
> latter, so the cutoff isn't doing its job of avoiding code-size
> blowup. As far as I'm aware, this is what most other compilers
> (e.g. LLVM) do to solve this problem.

Yeah, I think that makes sense and sounds like something we would like
to have at some point.

Iago
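[Editor's note: Connor's suggested callback could be sketched as below. This is a hypothetical API on toy types, not anything that exists in nir_shader_compiler_options; the per-op costs are invented to show the shape of the idea.]

```c
/* Sketch of a backend-provided code-size metric (hypothetical API, not
 * in NIR): the unroller sums a per-instruction assembly-size estimate
 * supplied by the backend instead of counting IR instructions. */
#include <stddef.h>

struct toy_instr { int op; };
#define OP_IADD   0
#define OP_IDIV64 1

struct toy_compiler_options {
   /* hypothetical hook, analogous to a field in nir_shader_compiler_options */
   unsigned (*num_assembly_instructions)(const struct toy_instr *instr);
};

static unsigned i965_instr_cost(const struct toy_instr *instr)
{
   /* a 64-bit divide lowers to a long sequence; invented estimate */
   return instr->op == OP_IDIV64 ? 60 : 1;
}

static unsigned loop_body_cost(const struct toy_compiler_options *opts,
                               const struct toy_instr *body, size_t count)
{
   unsigned cost = 0;
   for (size_t i = 0; i < count; i++)
      cost += opts->num_assembly_instructions(&body[i]);
   return cost;
}
```

With a hook like this, a three-instruction body containing one 64-bit divide is costed as dozens of assembly instructions rather than three, so the unroll cutoff sees something close to the real expansion.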

> > 
> > > 
> > > > ---
> > > >  src/intel/compiler/brw_nir.c | 8 ++++----
> > > >  1 file changed, 4 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
> > > > index 8f3f77f89a..ef12cdfff8 100644
> > > > --- a/src/intel/compiler/brw_nir.c
> > > > +++ b/src/intel/compiler/brw_nir.c
> > > > @@ -636,6 +636,10 @@ brw_preprocess_nir(const struct brw_compiler *compiler, nir_shader *nir)
> > > > 
> > > >     OPT(nir_split_var_copies);
> > > > 
> > > > +   nir_lower_int64(nir, nir_lower_imul64 |
> > > > +                        nir_lower_isign64 |
> > > > +                        nir_lower_divmod64);
> > > > +
> > > >     nir = brw_nir_optimize(nir, compiler, is_scalar);
> > > > 
> > > >     if (is_scalar) {
> > > > @@ -663,10 +667,6 @@ brw_preprocess_nir(const struct brw_compiler *compiler, nir_shader *nir)
> > > >        brw_nir_no_indirect_mask(compiler, nir->info.stage);
> > > >     nir_lower_indirect_derefs(nir, indirect_mask);
> > > > 
> > > > -   nir_lower_int64(nir, nir_lower_imul64 |
> > > > -                        nir_lower_isign64 |
> > > > -                        nir_lower_divmod64);
> > > > -
> > > >     /* Get rid of split copies */
> > > >     nir = brw_nir_optimize(nir, compiler, is_scalar);
> > > > 
> > > 
> > > _______________________________________________
> > > mesa-dev mailing list
> > > mesa-dev at lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> > 
> 
> 

