[Mesa-dev] [PATCH] i965/nir: do int64 lowering before optimization

Thu Dec 14 07:21:04 UTC 2017

On Tue, 2017-12-12 at 08:20 +0100, Iago Toral wrote:
> On Mon, 2017-12-11 at 08:01 -0800, Jason Ekstrand wrote:
> > On Mon, Dec 11, 2017 at 12:55 AM, Iago Toral <itoral at igalia.com>
> > wrote:
> > > This didn't get any reviews yet. Any takers?
> > > 
> > > 
> > > 
> > > On Fri, 2017-12-01 at 13:46 +0100, Iago Toral Quiroga wrote:
> > > 
> > > > Otherwise loop unrolling will fail to see the actual cost of
> > > 
> > > > the unrolling operations when the loop body contains 64-bit
> > > integer
> > > 
> > > > instructions, and very specially when the divmod64 lowering
> > > applies,
> > > 
> > > > since its lowering is quite expensive.
> > > 
> > > >
> > > 
> > > > Without this change, some in-development CTS tests for int64
> > > 
> > > > get stuck forever trying to register allocate a shader with
> > > 
> > > > over 50K SSA values. The large number of SSA values is the
> > > result
> > > 
> > > > of NIR first unrolling multiple seemingly simple loops that
> > > involve
> > > 
> > > > int64 instructions, only to then lower these instructions to
> > > produce
> > > 
> > > > a massive pile of code (due to the divmod64 lowering in the
> > > unrolled
> > > 
> > > > instructions).
> > > 
> > > >
> > > 
> > > > With this change, loop unrolling will see the loops with the
> > > int64
> > > 
> > > > code already lowered and will realize that it is too expensive
> > > to
> > > 
> > > > unroll.
> > > 
> > 
> > Hrm... I'm not quite sure what I think of this.  I put it after
> > nir_optimize because I wanted opt_algebraic to be able to work it's
> > magic and hopefully remove a bunch of int64 ops before we lower
> > them.  In particular, we have optimizations to remove integer
> > division and replace it with shifts.  However, loop unrolling does
> > need to happen before lower_indirect_derefs so that
> > lower_indirect_derefs will do as little work as possible.
> > 
> > This is a bit of a pickle...  I don't really want to add a third
> > brw_nir_optimize call.  It probably wouldn't be the end of the
> > world but it does add compile time.
> > 
> > One crazy idea which I don't think I like would be to have a quick
> > pass that walks the IR and sees if there are any 64-bit SSA
> > values.  If it does, we run brw_nir_optimize without loop unrolling
> > then 64-bit lowering and then we go into the normal
> > brw_nir_optimize.
> 
> With the constraints you mention above, I am not sure that we have
> many more options... what if we always run opt_algebraic first
> followed by int64 lowering before the first nir_optimize? That would
> only add an extra opt_algebraic instead of a full nir_optimize. Would
> that be better than adding that 64-bit SSA scan pre-pass?

We still need to make a decision for this, does my proposal sound
better than than the other options on the table? If not I guess we
should go with the 64-bit SSA scan pre-pass.
Iago
> >  
> > > > ---
> > > 
> > > >  src/intel/compiler/brw_nir.c | 8 ++++----
> > > 
> > > >  1 file changed, 4 insertions(+), 4 deletions(-)
> > > 
> > > >
> > > 
> > > > diff --git a/src/intel/compiler/brw_nir.c
> > > 
> > > > b/src/intel/compiler/brw_nir.c
> > > 
> > > > index 8f3f77f89a..ef12cdfff8 100644
> > > 
> > > > --- a/src/intel/compiler/brw_nir.c
> > > 
> > > > +++ b/src/intel/compiler/brw_nir.c
> > > 
> > > > @@ -636,6 +636,10 @@ brw_preprocess_nir(const struct
> > > brw_compiler
> > > 
> > > > *compiler, nir_shader *nir)
> > > 
> > > >  
> > > 
> > > >     OPT(nir_split_var_copies);
> > > 
> > > >  
> > > 
> > > > +   nir_lower_int64(nir, nir_lower_imul64 |
> > > 
> > > > +                        nir_lower_isign64 |
> > > 
> > > > +                        nir_lower_divmod64);
> > > 
> > > > +
> > > 
> > > >     nir = brw_nir_optimize(nir, compiler, is_scalar);
> > > 
> > > >  
> > > 
> > > >     if (is_scalar) {
> > > 
> > > > @@ -663,10 +667,6 @@ brw_preprocess_nir(const struct
> > > brw_compiler
> > > 
> > > > *compiler, nir_shader *nir)
> > > 
> > > >        brw_nir_no_indirect_mask(compiler, nir->info.stage);
> > > 
> > > >     nir_lower_indirect_derefs(nir, indirect_mask);
> > > 
> > > >  
> > > 
> > > > -   nir_lower_int64(nir, nir_lower_imul64 |
> > > 
> > > > -                        nir_lower_isign64 |
> > > 
> > > > -                        nir_lower_divmod64);
> > > 
> > > > -
> > > 
> > > >     /* Get rid of split copies */
> > > 
> > > >     nir = brw_nir_optimize(nir, compiler, is_scalar);
> > > 
> > > >  
> > > 
> > > _______________________________________________
> > > 
> > > mesa-dev mailing list
> > > 
> > > mesa-dev at lists.freedesktop.org
> > > 
> > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> > > 
> > > 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20171214/6b9815c3/attachment.html>