[Mesa-dev] [RFC] nir/opt_peephole_ffma: Bypass fusion if any operand of fadd and fmul is a const
Jason Ekstrand
jason at jlekstrand.net
Mon Sep 21 08:32:05 PDT 2015
On Mon, Sep 21, 2015 at 12:14 AM, Eduardo Lima Mitev <elima at igalia.com> wrote:
> On 09/19/2015 09:00 PM, Jason Ekstrand wrote:
>>
>> On Sep 18, 2015 2:49 AM, "Eduardo Lima Mitev" <elima at igalia.com
>> <mailto:elima at igalia.com>> wrote:
>>>
>>> When both fadd and fmul instructions have at least one operand that is a
>>> constant and it is only used once, the total number of instructions can
>>> be reduced from 3 (1 ffma + 2 load_const) to 2 (1 fmul + 1 fadd); because
>>> the constants will be progagated as immediate operands of fmul and fadd.
>>
>> Right... I'm pretty sure that I've written this patch once before. At
>> the time we didn't do this because we thought Matt's constant combining
>> would take care of most of those issues and we could just emit the MAD.
>> However, that pass doesn't always combine things optimally and doesn't
>> exist for vec4.
>>
>>> This patch modifies opt_peephole_ffma pass to detect this situation and
>>> bails-out fusing fmul+fadd into ffma.
>>>
>>> As shown in shader-db results below, it seems to help a good bunch.
>> However,
>>> there are some caveats:
>>>
>>> * It seems i965 specific, so I'm not sure if modifying the NIR pass
>>> directly is desired, as opposed to moving this to the backend.
>>>
>>> * There are still a high number of HURTs, but these could be reduced
>> by being
>>> more specific in the conditions to bailout.
>>>
>>> total instructions in shared programs: 1683959 -> 1677447 (-0.39%)
>>> instructions in affected programs: 604918 -> 598406 (-1.08%)
>>> helped: 4633
>>> HURT: 804
>>> GAINED: 0
>>> LOST: 0
>>
>> What does that look like if you split that between vec4 and fs?
>>
>
> The above is shader-db results filtered for VS only. Following is the
> same results filtered for FS:
>
> total instructions in shared programs: 4593761 -> 4590252 (-0.08%)
> instructions in affected programs: 538690 -> 535181 (-0.65%)
> helped: 2981
> HURT: 38
> GAINED: 4
> LOST: 0
>
> And this is the raw, unfiltered report:
>
> total instructions in shared programs: 6277603 -> 6267582 (-0.16%)
> instructions in affected programs: 1143656 -> 1133635 (-0.88%)
> helped: 7614
> HURT: 842
> GAINED: 4
> LOST: 0
>
> For FS, the gain is more noticeable because vector instructions tend to
> neutralize the benefits (constants cannot be propagated to
> multi-component operands).
In that case, it seems like this might be a good idea for the FS as
well. I was always a bit skeptical that constant combining would just
fix it entirely. That said, I would like Matt's take before I tell
you to push anything.
If you want to make it less i965-specific, you could always add an
"allow constants" boolean or someting. That said, i965 is the only
place that the ffma peephole is currently used, so we can let the next
user of it fix that. :-)
--Jason
> Eduardo
>
>>> ---
>>> src/glsl/nir/nir_opt_peephole_ffma.c | 31 +++++++++++++++++++++++++++++++
>>> 1 file changed, 31 insertions(+)
>>>
>>> diff --git a/src/glsl/nir/nir_opt_peephole_ffma.c
>> b/src/glsl/nir/nir_opt_peephole_ffma.c
>>> index 4f0f0da..da47f8f 100644
>>> --- a/src/glsl/nir/nir_opt_peephole_ffma.c
>>> +++ b/src/glsl/nir/nir_opt_peephole_ffma.c
>>> @@ -133,6 +133,28 @@ get_mul_for_src(nir_alu_src *src, int num_components,
>>> return alu;
>>> }
>>>
>>> +/**
>>> + * Given a list of (at least two) nir_alu_src's, tells if any of them
>> is a
>>> + * constant value and is used only once.
>>> + */
>>> +static bool
>>> +any_alu_src_is_a_constant(nir_alu_src srcs[])
>>> +{
>>> + for (unsigned i = 0; i < 2; i++) {
>>> + if (srcs[i].src.ssa->parent_instr->type ==
>> nir_instr_type_load_const) {
>>> + nir_load_const_instr *load_const =
>>> + nir_instr_as_load_const (srcs[i].src.ssa->parent_instr);
>>> +
>>> + if (list_length(&load_const->def.uses) == 1 &&
>>> + list_length(&load_const->def.if_uses) == 0) {
>>> + return true;
>>> + }
>>> + }
>>> + }
>>> +
>>> + return false;
>>> +}
>>> +
>>> static bool
>>> nir_opt_peephole_ffma_block(nir_block *block, void *void_state)
>>> {
>>> @@ -183,6 +205,15 @@ nir_opt_peephole_ffma_block(nir_block *block,
>> void *void_state)
>>> mul_src[0] = mul->src[0].src.ssa;
>>> mul_src[1] = mul->src[1].src.ssa;
>>>
>>> + /* If any of the operands of the fmul and any of the fadd is a
>> constant,
>>> + * we bypass because it will be more efficient as the constants
>> will be
>>> + * propagated as operands, potentially saving two load_const
>> instructions.
>>> + */
>>> + if (any_alu_src_is_a_constant(mul->src) &&
>>> + any_alu_src_is_a_constant(add->src)) {
>>> + continue;
>>> + }
>>> +
>>> if (abs) {
>>> for (unsigned i = 0; i < 2; i++) {
>>> nir_alu_instr *abs = nir_alu_instr_create(state->mem_ctx,
>>> --
>>> 2.4.6
>>>
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev at lists.freedesktop.org <mailto:mesa-dev at lists.freedesktop.org>
>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
>
More information about the mesa-dev
mailing list