[Mesa-dev] [PATCH 8/9] i965/nir: Run the ffma peephole after the rest of the optimizations

Tue Mar 31 10:47:49 PDT 2015

On Mon, Mar 23, 2015 at 8:40 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
> On Mon, Mar 23, 2015 at 8:34 PM, Matt Turner <mattst88 at gmail.com> wrote:
>> On Mon, Mar 23, 2015 at 8:13 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>>> The idea here is that fusing multiply-add combinations too early can reduce
>>> our ability to perform CSE and value-numbering.  Instead, we split ffma
>>> opcodes up-front, hope CSE cleans up, and then fuse after-the-fact.
>>> Unless an algebraic pass does something silly where it inserts something
>>> between the multiply and the add, splitting and re-fusing should never
>>> cause a problem.  We run the late algebraic optimizations after this so
>>> that things like compare-with-zero don't hurt our ability to fuse things.
>>>
>>> shader-db results for fragment shaders on Haswell:
>>> total instructions in shared programs: 4390538 -> 4379236 (-0.26%)
>>> instructions in affected programs:     989359 -> 978057 (-1.14%)
>>> helped:                                5308
>>> HURT:                                  97
>>> GAINED:                                78
>>> LOST:                                  5
>>>
>>> This does, unfortunately, cause some substantial hurt to a shader in Kerbal
>>> Space Program.  However, the damage is caused by changing a single
>>> instruction from a ffma to an add.  This, in turn, *decreases* register
>>> pressure in one part of the program causing it to fail to register allocate
>>> and spill.  Given the overwhelmingly positive results in other shaders and
>>> the fact that the NIR for the Kerbal shaders is actually better, this
>>> should be considered a positive.
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_context.c  |  1 +
>>>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 10 ++++++++--
>>>  2 files changed, 9 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
>>> index ed6fdff..7cab1c9 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_context.c
>>> +++ b/src/mesa/drivers/dri/i965/brw_context.c
>>> @@ -558,6 +558,7 @@ brw_initialize_context_constants(struct brw_context *brw)
>>>
>>>     static const nir_shader_compiler_options gen6_nir_options = {
>>>        .native_integers = true,
>>> +      .lower_ffma = true,
>>
>> I see what's going on, but this probably deserves a comment.
>
> Sure.  How about:
>
>> In order to help allow for better CSE at the NIR level we tell NIR to split all ffma instructions during opt_algebraic and we then re-combine them as a later step.

Sounds good.

>
>>>     };
>>>
>>>     /* We want the GLSL compiler to emit code that uses condition codes */
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>> index cdf9b01..b0b86e3 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>> @@ -51,8 +51,6 @@ nir_optimize(nir_shader *nir)
>>>        nir_validate_shader(nir);
>>>        progress |= nir_opt_algebraic(nir);
>>>        nir_validate_shader(nir);
>>> -      progress |= nir_opt_peephole_ffma(nir);
>>> -      nir_validate_shader(nir);
>>>        progress |= nir_opt_constant_folding(nir);
>>>        nir_validate_shader(nir);
>>>        progress |= nir_opt_remove_phis(nir);
>>> @@ -131,6 +129,12 @@ fs_visitor::emit_nir_code()
>>>
>>>     nir_optimize(nir);
>>>
>>> +   if (brw->gen >= 6) {
>>> +      /* Try and fuse multiply-adds */
>>> +      nir_opt_peephole_ffma(nir);
>>> +      nir_validate_shader(nir);
>>> +   }
>>> +
>>>     nir_opt_algebraic_late(nir);
>>>     nir_validate_shader(nir);
>>>
>>> @@ -141,6 +145,8 @@ fs_visitor::emit_nir_code()
>>>     nir_validate_shader(nir);
>>>     nir_copy_prop(nir);
>>>     nir_validate_shader(nir);
>>> +   nir_opt_dce(nir);
>>> +   nir_validate_shader(nir);
>>
>> Just to confirm: this did help something?
>
> We should always run dead code after copy prop and we weren't.
> However, that should probably be its own patch.  I'll split it out if
> you'd like.

Seems like the right thing to do.