[Mesa-dev] [PATCH 07/12] intel/compiler: More peephole select

Mon Jul 2 23:24:58 UTC 2018

On 06/28/2018 03:25 PM, Caio Marcelo de Oliveira Filho wrote:
> Hi,
> 
>> diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
>> index 67c062d91f5..6a0d4090fa7 100644
>> --- a/src/intel/compiler/brw_nir.c
>> +++ b/src/intel/compiler/brw_nir.c
>> @@ -557,7 +557,22 @@ brw_nir_optimize(nir_shader *nir, const struct brw_compiler *compiler,
>>        OPT(nir_copy_prop);
>>        OPT(nir_opt_dce);
>>        OPT(nir_opt_cse);
>> +
>> +      /* Passing 0 to the peephole select pass causes it to convert
>> +       * if-statements that contain only move instructions in the branches
>> +       * regardless of the count.
>> +       *
>> +       * Passing 0 to the peephole select pass causes it to convert
> 
> Typo "Passing 1".

Ugh... I thought I fixed that. :(

>> +       * if-statements that contain at most a single ALU instruction (total)
>> +       * in both branches.  Before Gen6, some math instructions were
>> +       * prohibitively expensive and the results of compare operations need an
>> +       * extra resolve step.  For these reasons, this pass is more harmful
>> +       * than good on those platforms.
>> +       */
>>        OPT(nir_opt_peephole_select, 0);
>> +      if (compiler->devinfo->gen >= 6)
>> +         OPT(nir_opt_peephole_select, 1);
> 
> It is not clear to me why running the pass twice (with 0 and then 1)
> instead of using gen >= 6 to select either 0 or 1; or running both
> passes with 1 if gen >= 6 (since 1 covers 0).
> 
> I do understand the second execution can optimize more cases since
> blocks get simplified in the first execution, but was expecting to be
> sufficient to wait the next iteration of the main brw_nir_optimize
> loop.

That was the first thing that I tried, and it caught me by surprise as
well.  If you pass 1, a block like:

    if (condition) {
        a = x;
        b = y;
    }

will not get converted because it has 2 instructions.  If you pass 0, a
block that only has copies will get converted regardless of the count.
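
To make the difference concrete, here is a toy model of the two limits as
described above.  It is only a sketch: the enum and both function names
are made up for illustration and are not the real nir_opt_peephole_select
code, and it assumes the instructions of both branches are handed over as
one flat array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instruction kinds for this toy model; not NIR's real types. */
enum toy_instr_kind { TOY_MOVE, TOY_ALU };

/*
 * Toy model of the limit semantics as described in this thread (an
 * assumption, not the real pass):
 *   limit == 0: only moves are allowed, but any number of them.
 *   limit  > 0: any instruction is allowed, but every instruction
 *               (moves included) counts against the limit.
 * instrs holds the instructions of both branches combined.
 */
static bool
toy_can_flatten(const enum toy_instr_kind *instrs, size_t n, unsigned limit)
{
   if (limit == 0) {
      for (size_t i = 0; i < n; i++) {
         if (instrs[i] != TOY_MOVE)
            return false;
      }
      return true;
   }

   return n <= limit;
}

/* Models the patch: always try limit 0, and also limit 1 on Gen6+. */
static bool
toy_flattened_by_patch(const enum toy_instr_kind *instrs, size_t n, int gen)
{
   if (toy_can_flatten(instrs, n, 0))
      return true;

   return gen >= 6 && toy_can_flatten(instrs, n, 1);
}
```

In this model the two-copy block from the example is accepted with limit 0
and rejected with limit 1, while a single ALU instruction is the other way
around, so neither call covers the other.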

I think there's further experimentation to be done here (and I have some
other patches).  A hybrid that can handle an unlimited number of copies
plus N ALU operations would be a good thing to try.  We also have a similar pass in
the backend, which has more data about the relative cost of things.  It
will take some work to find the right balance between doing things in
NIR (which can benefit other NIR-based optimizations) and doing things
in the backend (which knows how expensive things actually are).
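
As a rough illustration of the hybrid idea, the check could count only
non-move ALU instructions and treat copies as free.  This is a sketch
under the same toy assumptions as before (made-up enum and function name,
both branches flattened into one array), not an existing Mesa interface.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instruction kinds; not NIR's real types. */
enum hyb_instr_kind { HYB_MOVE, HYB_ALU };

/*
 * Sketch of the hybrid heuristic: moves never count, and at most
 * max_alu non-move ALU instructions are allowed across both branches.
 */
static bool
hyb_can_flatten(const enum hyb_instr_kind *instrs, size_t n, unsigned max_alu)
{
   unsigned alu = 0;

   for (size_t i = 0; i < n; i++) {
      /* Only ALU instructions count against the budget. */
      if (instrs[i] == HYB_ALU && ++alu > max_alu)
         return false;
   }

   return true;
}
```

With max_alu set to 0 this degenerates to the copies-only behavior, so a
single call could stand in for running the pass twice in this model.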

> Thanks,
> Caio