[Mesa-dev] [PATCH 8/9] i965/nir: Run the ffma peephole after the rest of the optimizations

Jason Ekstrand jason at jlekstrand.net
Mon Mar 23 20:40:19 PDT 2015


On Mon, Mar 23, 2015 at 8:34 PM, Matt Turner <mattst88 at gmail.com> wrote:
> On Mon, Mar 23, 2015 at 8:13 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>> The idea here is that fusing multiply-add combinations too early can reduce
>> our ability to perform CSE and value-numbering.  Instead, we split ffma
>> opcodes up-front, hope CSE cleans up, and then fuse after-the-fact.
>> Unless an algebraic pass does something silly where it inserts something
>> between the multiply and the add, splitting and re-fusing should never
>> cause a problem.  We run the late algebraic optimizations after this so
>> that things like compare-with-zero don't hurt our ability to fuse things.
>>
>> shader-db results for fragment shaders on Haswell:
>> total instructions in shared programs: 4390538 -> 4379236 (-0.26%)
>> instructions in affected programs:     989359 -> 978057 (-1.14%)
>> helped:                                5308
>> HURT:                                  97
>> GAINED:                                78
>> LOST:                                  5
>>
>> This does, unfortunately, cause some substantial hurt to a shader in Kerbal
>> Space Program.  However, the damage is caused by changing a single
>> instruction from a ffma to an add.  This, in turn, *decreases* register
>> pressure in one part of the program causing it to fail to register allocate
>> and spill.  Given the overwhelmingly positive results in other shaders and
>> the fact that the NIR for the Kerbal shaders is actually better, this
>> should be considered a positive.
>> ---
>>  src/mesa/drivers/dri/i965/brw_context.c  |  1 +
>>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 10 ++++++++--
>>  2 files changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
>> index ed6fdff..7cab1c9 100644
>> --- a/src/mesa/drivers/dri/i965/brw_context.c
>> +++ b/src/mesa/drivers/dri/i965/brw_context.c
>> @@ -558,6 +558,7 @@ brw_initialize_context_constants(struct brw_context *brw)
>>
>>     static const nir_shader_compiler_options gen6_nir_options = {
>>        .native_integers = true,
>> +      .lower_ffma = true,
>
> I see what's going on, but this probably deserves a comment.

Sure.  How about:

> In order to help allow for better CSE at the NIR level we tell NIR to split all ffma instructions during opt_algebraic and we then re-combine them as a later step.

>>     };
>>
>>     /* We want the GLSL compiler to emit code that uses condition codes */
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> index cdf9b01..b0b86e3 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> @@ -51,8 +51,6 @@ nir_optimize(nir_shader *nir)
>>        nir_validate_shader(nir);
>>        progress |= nir_opt_algebraic(nir);
>>        nir_validate_shader(nir);
>> -      progress |= nir_opt_peephole_ffma(nir);
>> -      nir_validate_shader(nir);
>>        progress |= nir_opt_constant_folding(nir);
>>        nir_validate_shader(nir);
>>        progress |= nir_opt_remove_phis(nir);
>> @@ -131,6 +129,12 @@ fs_visitor::emit_nir_code()
>>
>>     nir_optimize(nir);
>>
>> +   if (brw->gen >= 6) {
>> +      /* Try and fuse multiply-adds */
>> +      nir_opt_peephole_ffma(nir);
>> +      nir_validate_shader(nir);
>> +   }
>> +
>>     nir_opt_algebraic_late(nir);
>>     nir_validate_shader(nir);
>>
>> @@ -141,6 +145,8 @@ fs_visitor::emit_nir_code()
>>     nir_validate_shader(nir);
>>     nir_copy_prop(nir);
>>     nir_validate_shader(nir);
>> +   nir_opt_dce(nir);
>> +   nir_validate_shader(nir);
>
> Just to confirm: this did help something?

We should always run dead code after copy prop and we weren't.
However, that should probably be its own patch.  I'll split it out if
you'd like.
--Jason

>>
>>     if (unlikely(debug_enabled)) {
>>        fprintf(stderr, "NIR (SSA form) for %s shader:\n", stage_name);
>> --
>> 2.3.3


More information about the mesa-dev mailing list