[Mesa-dev] [PATCH 8/9] i965/nir: Run the ffma peephole after the rest of the optimizations
Matt Turner
mattst88 at gmail.com
Tue Mar 31 10:47:49 PDT 2015
On Mon, Mar 23, 2015 at 8:40 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
> On Mon, Mar 23, 2015 at 8:34 PM, Matt Turner <mattst88 at gmail.com> wrote:
>> On Mon, Mar 23, 2015 at 8:13 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>>> The idea here is that fusing multiply-add combinations too early can reduce
>>> our ability to perform CSE and value-numbering. Instead, we split ffma
>>> opcodes up-front, hope CSE cleans up, and then fuse after-the-fact.
>>> Unless an algebraic pass does something silly where it inserts something
>>> between the multiply and the add, splitting and re-fusing should never
>>> cause a problem. We run the late algebraic optimizations after this so
>>> that things like compare-with-zero don't hurt our ability to fuse things.
>>>
>>> shader-db results for fragment shaders on Haswell:
>>> total instructions in shared programs: 4390538 -> 4379236 (-0.26%)
>>> instructions in affected programs: 989359 -> 978057 (-1.14%)
>>> helped: 5308
>>> HURT: 97
>>> GAINED: 78
>>> LOST: 5
>>>
>>> This does, unfortunately, cause some substantial hurt to a shader in Kerbal
>>> Space Program. However, the damage is caused by changing a single
>>> instruction from a ffma to an add. This, in turn, *decreases* register
>>> pressure in one part of the program causing it to fail to register allocate
>>> and spill. Given the overwhelmingly positive results in other shaders and
>>> the fact that the NIR for the Kerbal shaders is actually better, this
>>> should be considered a positive.
>>> ---
>>> src/mesa/drivers/dri/i965/brw_context.c | 1 +
>>> src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 10 ++++++++--
>>> 2 files changed, 9 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
>>> index ed6fdff..7cab1c9 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_context.c
>>> +++ b/src/mesa/drivers/dri/i965/brw_context.c
>>> @@ -558,6 +558,7 @@ brw_initialize_context_constants(struct brw_context *brw)
>>>
>>> static const nir_shader_compiler_options gen6_nir_options = {
>>> .native_integers = true,
>>> + .lower_ffma = true,
>>
>> I see what's going on, but this probably deserves a comment.
>
> Sure. How about:
>
>> In order to help allow for better CSE at the NIR level we tell NIR to split all ffma instructions during opt_algebraic and we then re-combine them as a later step.
Sounds good.
>
>>> };
>>>
>>> /* We want the GLSL compiler to emit code that uses condition codes */
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>> index cdf9b01..b0b86e3 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>> @@ -51,8 +51,6 @@ nir_optimize(nir_shader *nir)
>>> nir_validate_shader(nir);
>>> progress |= nir_opt_algebraic(nir);
>>> nir_validate_shader(nir);
>>> - progress |= nir_opt_peephole_ffma(nir);
>>> - nir_validate_shader(nir);
>>> progress |= nir_opt_constant_folding(nir);
>>> nir_validate_shader(nir);
>>> progress |= nir_opt_remove_phis(nir);
>>> @@ -131,6 +129,12 @@ fs_visitor::emit_nir_code()
>>>
>>> nir_optimize(nir);
>>>
>>> + if (brw->gen >= 6) {
>>> + /* Try and fuse multiply-adds */
>>> + nir_opt_peephole_ffma(nir);
>>> + nir_validate_shader(nir);
>>> + }
>>> +
>>> nir_opt_algebraic_late(nir);
>>> nir_validate_shader(nir);
>>>
>>> @@ -141,6 +145,8 @@ fs_visitor::emit_nir_code()
>>> nir_validate_shader(nir);
>>> nir_copy_prop(nir);
>>> nir_validate_shader(nir);
>>> + nir_opt_dce(nir);
>>> + nir_validate_shader(nir);
>>
>> Just to confirm: this did help something?
>
> We should always run dead code after copy prop and we weren't.
> However, that should probably be its own patch. I'll split it out if
> you'd like.
Seems like the right thing to do.
More information about the mesa-dev
mailing list