[Mesa-dev] [PATCH] i965/nir: do int64 lowering before optimization

Sun Feb 4 18:58:45 UTC 2018

On Wed, Dec 13, 2017 at 11:21 PM, Iago Toral <itoral at igalia.com> wrote:
> On Tue, 2017-12-12 at 08:20 +0100, Iago Toral wrote:
>
> On Mon, 2017-12-11 at 08:01 -0800, Jason Ekstrand wrote:
>
> On Mon, Dec 11, 2017 at 12:55 AM, Iago Toral <itoral at igalia.com> wrote:
>
> This didn't get any reviews yet. Any takers?
>
> On Fri, 2017-12-01 at 13:46 +0100, Iago Toral Quiroga wrote:
>> Otherwise loop unrolling will fail to see the actual cost of
>> the unrolling operations when the loop body contains 64-bit integer
>> instructions, and especially when the divmod64 lowering applies,
>> since that lowering is quite expensive.
>>
>> Without this change, some in-development CTS tests for int64
>> get stuck forever trying to register allocate a shader with
>> over 50K SSA values. The large number of SSA values is the result
>> of NIR first unrolling multiple seemingly simple loops that involve
>> int64 instructions, only to then lower these instructions to produce
>> a massive pile of code (due to the divmod64 lowering in the unrolled
>> instructions).
>>
>> With this change, loop unrolling will see the loops with the int64
>> code already lowered and will realize that it is too expensive to
>> unroll.
>
>
> Hrm... I'm not quite sure what I think of this.  I put it after nir_optimize
> because I wanted opt_algebraic to be able to work its magic and hopefully
> remove a bunch of int64 ops before we lower them.  In particular, we have
> optimizations to remove integer division and replace it with shifts.
> However, loop unrolling does need to happen before lower_indirect_derefs so
> that lower_indirect_derefs will do as little work as possible.
>
> This is a bit of a pickle...  I don't really want to add a third
> brw_nir_optimize call.  It probably wouldn't be the end of the world but it
> does add compile time.
>
> One crazy idea which I don't think I like would be to have a quick pass that
> walks the IR and sees if there are any 64-bit SSA values.  If there are, we
> run brw_nir_optimize without loop unrolling, then 64-bit lowering, and then we
> go into the normal brw_nir_optimize.
>
>
> With the constraints you mention above, I am not sure that we have many more
> options... what if we always run opt_algebraic first followed by int64
> lowering before the first nir_optimize? That would only add an extra
> opt_algebraic instead of a full nir_optimize. Would that be better than
> adding that 64-bit SSA scan pre-pass?
>
>
> We still need to make a decision on this. Does my proposal sound better
> than the other options on the table? If not, I guess we should go with
> the 64-bit SSA scan pre-pass.

Realized I never responded to this -- sorry.

Yes, I think your proposal sounds good.
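
For reference, the ordering trade-off discussed in this thread can be
illustrated with a toy cost model. This is only a sketch under invented
assumptions: the pass functions below merely borrow the names used in the
thread, they are not the Mesa/NIR implementations, and the cost constants
(DIVMOD64_LOWERED_COST, UNROLL_LIMIT) are made-up numbers chosen to make
the effect visible.

```python
from dataclasses import dataclass, replace

# Illustrative assumptions only; these are not values taken from Mesa.
DIVMOD64_LOWERED_COST = 200  # instructions produced by lowering one 64-bit divmod
UNROLL_LIMIT = 96            # max size of the fully unrolled loop we accept


@dataclass
class Loop:
    trip_count: int
    divmods_pow2: int      # 64-bit divmods by a power of two (removable algebraically)
    divmods_general: int   # 64-bit divmods that must be lowered
    other_instrs: int      # everything else in the loop body
    unrolled: bool = False


def body_cost(loop):
    # Before lowering, each 64-bit divmod looks like a single cheap instruction.
    return loop.divmods_pow2 + loop.divmods_general + loop.other_instrs


def opt_algebraic(loop):
    # Toy stand-in: a division by a power of two becomes one shift.
    loop.other_instrs += loop.divmods_pow2
    loop.divmods_pow2 = 0


def lower_int64(loop):
    # Toy stand-in: every remaining 64-bit divmod expands into many instructions.
    remaining = loop.divmods_pow2 + loop.divmods_general
    loop.other_instrs += remaining * DIVMOD64_LOWERED_COST
    loop.divmods_pow2 = 0
    loop.divmods_general = 0


def unroll(loop):
    # Unroll only if the cost visible right now fits the budget.
    if body_cost(loop) * loop.trip_count <= UNROLL_LIMIT:
        loop.unrolled = True


ORDERINGS = {
    # Lowering after optimization: unrolling cannot see the lowering cost.
    "lower_after_optimize": [opt_algebraic, unroll, lower_int64],
    # Lowering before optimization: algebraic simplification comes too late.
    "lower_first": [lower_int64, opt_algebraic, unroll],
    # Extra opt_algebraic, then lowering, then the normal optimization.
    "algebraic_then_lower": [opt_algebraic, lower_int64, opt_algebraic, unroll],
}


def compile_loop(loop, ordering):
    """Run the named pass ordering on a copy and return the final instruction count."""
    loop = replace(loop)
    for run_pass in ORDERINGS[ordering]:
        run_pass(loop)
    copies = loop.trip_count if loop.unrolled else 1
    return copies * body_cost(loop)


if __name__ == "__main__":
    example = Loop(trip_count=16, divmods_pow2=1, divmods_general=1, other_instrs=4)
    for name in ORDERINGS:
        print(name, compile_loop(example, name))
    # lower_after_optimize 3280
    # lower_first 404
    # algebraic_then_lower 205
```

In this toy model, lowering after unrolling multiplies the lowering cost by
the trip count, lowering first avoids the unroll but loses the algebraic
simplification, and running an extra opt_algebraic before the lowering keeps
both benefits.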