[Mesa-dev] [PATCH] i965/nir: do int64 lowering before optimization

Mon Feb 5 13:40:59 UTC 2018

On Sun, 2018-02-04 at 10:58 -0800, Matt Turner wrote:
> On Wed, Dec 13, 2017 at 11:21 PM, Iago Toral <itoral at igalia.com>
> wrote:
> > On Tue, 2017-12-12 at 08:20 +0100, Iago Toral wrote:
> > 
> > On Mon, 2017-12-11 at 08:01 -0800, Jason Ekstrand wrote:
> > 
> > On Mon, Dec 11, 2017 at 12:55 AM, Iago Toral <itoral at igalia.com>
> > wrote:
> > 
> > This didn't get any reviews yet. Any takers?
> > 
> > On Fri, 2017-12-01 at 13:46 +0100, Iago Toral Quiroga wrote:
> > > Otherwise loop unrolling will fail to see the actual cost of
> > > the unrolling operations when the loop body contains 64-bit
> > > integer instructions, especially when the divmod64 lowering
> > > applies, since that lowering is quite expensive.
> > > 
> > > Without this change, some in-development CTS tests for int64
> > > get stuck forever trying to register allocate a shader with
> > > over 50K SSA values. The large number of SSA values is the result
> > > of NIR first unrolling multiple seemingly simple loops that
> > > involve int64 instructions, only to then lower these instructions
> > > to produce a massive pile of code (due to the divmod64 lowering in
> > > the unrolled instructions).
> > > 
> > > With this change, loop unrolling will see the loops with the
> > > int64 code already lowered and will realize that it is too
> > > expensive to unroll.
> > 
> > 
> > Hrm... I'm not quite sure what I think of this.  I put it after
> > nir_optimize because I wanted opt_algebraic to be able to work its
> > magic and hopefully remove a bunch of int64 ops before we lower
> > them.  In particular, we have optimizations to remove integer
> > division and replace it with shifts.  However, loop unrolling does
> > need to happen before lower_indirect_derefs so that
> > lower_indirect_derefs will do as little work as possible.
> > 
> > This is a bit of a pickle...  I don't really want to add a third
> > brw_nir_optimize call.  It probably wouldn't be the end of the
> > world but it does add compile time.
> > 
> > One crazy idea which I don't think I like would be to have a quick
> > pass that walks the IR and sees if there are any 64-bit SSA values.
> > If there are, we run brw_nir_optimize without loop unrolling, then
> > 64-bit lowering, and then we go into the normal brw_nir_optimize.
> > 
> > 
> > With the constraints you mention above, I am not sure that we have
> > many more options... what if we always run opt_algebraic first
> > followed by int64 lowering before the first nir_optimize? That
> > would only add an extra opt_algebraic instead of a full
> > nir_optimize. Would that be better than adding that 64-bit SSA scan
> > pre-pass?
> > 
> > 
> > We still need to make a decision on this. Does my proposal sound
> > better than the other options on the table? If not, I guess we
> > should go with the 64-bit SSA scan pre-pass.
> 
> Realized I never responded to this -- sorry.
> 
> Yes, I think your proposal sounds good.

Thanks, just sent a v2.

Iago
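
For illustration only, the pass-ordering trade-off discussed in this
thread can be sketched with a toy model. This is not Mesa code: the
op names, the per-op lowering costs, the unroll limit, and the Loop
class are all hypothetical, and the real NIR heuristics are more
involved. It compares three orderings: unrolling before int64
lowering, lowering before unrolling, and opt_algebraic followed by
lowering and then unrolling.

```python
from dataclasses import dataclass

# Hypothetical number of 32-bit instructions each 64-bit op lowers to.
# "idiv64_pow2" stands for a 64-bit division by a power-of-two constant.
LOWERED_COST = {"idiv64": 200, "idiv64_pow2": 200, "ishr64": 4}
UNROLL_LIMIT = 64  # hypothetical cap on the unrolled instruction count


@dataclass
class Loop:
    trip_count: int
    body: list  # op names, one entry per instruction


def opt_algebraic(loop):
    # Toy algebraic rule: division by a power of two becomes a shift.
    body = ["ishr64" if op == "idiv64_pow2" else op for op in loop.body]
    return Loop(loop.trip_count, body)


def lower_int64(loop):
    # Expand every known 64-bit op into its 32-bit instruction sequence.
    body = []
    for op in loop.body:
        body.extend(["alu32"] * LOWERED_COST[op] if op in LOWERED_COST else [op])
    return Loop(loop.trip_count, body)


def unroll(loop):
    # Unroll only if the instruction count visible right now is small enough.
    if loop.trip_count * len(loop.body) <= UNROLL_LIMIT:
        return Loop(1, loop.body * loop.trip_count)
    return loop


PASSES = {"opt_algebraic": opt_algebraic,
          "lower_int64": lower_int64,
          "unroll": unroll}


def run(loop, order):
    # Apply the passes in the given order and return the final body size.
    for name in order:
        loop = PASSES[name](loop)
    return len(loop.body)


if __name__ == "__main__":
    div = Loop(16, ["idiv64"])
    pow2 = Loop(16, ["idiv64_pow2"])
    print(run(div, ["unroll", "lower_int64"]))
    print(run(div, ["lower_int64", "unroll"]))
    print(run(pow2, ["lower_int64", "unroll"]))
    print(run(pow2, ["opt_algebraic", "lower_int64", "unroll"]))
```

In this toy model, unrolling first hides the lowering cost and the
code size explodes, lowering first avoids that but misses the
division-to-shift simplification, and running opt_algebraic ahead of
the lowering keeps both properties.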