[Mesa-dev] [RFC] nir/opt_peephole_ffma: Bypass fusion if any operand of fadd and fmul is a const

Jason Ekstrand jason at jlekstrand.net
Wed Sep 23 10:56:44 PDT 2015


On Tue, Sep 22, 2015 at 4:20 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
> On Tue, Sep 22, 2015 at 4:04 PM, Matt Turner <mattst88 at gmail.com> wrote:
>> On Fri, Sep 18, 2015 at 12:49 AM, Eduardo Lima Mitev <elima at igalia.com> wrote:
>>> When both the fadd and fmul instructions have at least one operand that is
>>> a constant used only once, the total number of instructions can be reduced
>>> from 3 (1 ffma + 2 load_const) to 2 (1 fmul + 1 fadd), because the
>>> constants will be propagated as immediate operands of fmul and fadd.
>>>
>>> This patch modifies the opt_peephole_ffma pass to detect this situation
>>> and bail out of fusing fmul+fadd into ffma.
>>>
>>> As shown in shader-db results below, it seems to help a good bunch. However,
>>> there are some caveats:
>>>
>>> * It seems i965 specific, so I'm not sure if modifying the NIR pass
>>> directly is desired, as opposed to moving this to the backend.
>>>
>>> * There is still a high number of HURTs, but these could be reduced by being
>>> more specific in the conditions to bail out.
>>>
>>> total instructions in shared programs: 1683959 -> 1677447 (-0.39%)
>>> instructions in affected programs:     604918 -> 598406 (-1.08%)
>>> helped:                                4633
>>> HURT:                                  804
>>> GAINED:                                0
>>> LOST:                                  0
>>> ---
>>
>> Interesting -- yeah, I've thought about doing this as well. It was
>> more difficult before because with GLSL IR (where I was trying to do
>> it) it wasn't possible to determine if the constant was used by
>> multiple 3-src instructions. Actually, your check could probably be
>> refined to consider only uses by 3-src instructions.
>>
>> But that's getting kind of hardware-specific.
>
> If we want to move this into the i965 driver we can.  I think we're
> the only users.  That would completely get rid of the
> hardware-specificity issues.
> --Jason

Thinking about this a bit more, my inclination is to just push both
patches and then add another that moves nir_opt_peephole_ffma to the
i965 driver and call it a day.  We're the only ones using it, and it
has proven tricky enough that we should just free ourselves to use
backend-specific heuristics.

Matt, thoughts?

--Jason
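
For readers following the thread, the bail-out condition from Eduardo's
patch description above can be sketched roughly as follows. This is an
illustration only, not the code from the patch: `struct value`, `enum op`
and `should_fuse_ffma` are hypothetical stand-ins rather than real NIR
types or functions, and it assumes "used once" applies to each constant
individually. The real pass checks more than this.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for NIR values (not the nir_* API). */
enum op { OP_LOAD_CONST, OP_FMUL, OP_FADD, OP_OTHER };

struct value {
    enum op op;
    unsigned num_uses;           /* number of instructions reading this value */
    const struct value *src[2];  /* operands; NULL when not applicable */
};

/* True if v is a constant that is read by exactly one instruction. */
static bool is_single_use_const(const struct value *v)
{
    return v != NULL && v->op == OP_LOAD_CONST && v->num_uses == 1;
}

/* True if either operand of alu is a single-use constant. */
static bool has_single_use_const_src(const struct value *alu)
{
    return is_single_use_const(alu->src[0]) ||
           is_single_use_const(alu->src[1]);
}

/*
 * Decide whether fadd(fmul(a, b), c) should become ffma(a, b, c).
 * Fusion is skipped when the fmul has a single-use constant operand and
 * the fadd's other operand is also a single-use constant: in that case
 * 1 fmul + 1 fadd with immediates (2 instructions) is cheaper than
 * 1 ffma + 2 load_const (3 instructions).
 */
static bool should_fuse_ffma(const struct value *fadd)
{
    if (fadd->op != OP_FADD)
        return false;

    for (int i = 0; i < 2; i++) {
        const struct value *mul = fadd->src[i];
        if (mul == NULL || mul->op != OP_FMUL)
            continue;

        /* Bail out: both constants could be folded in as immediates. */
        if (has_single_use_const_src(mul) &&
            is_single_use_const(fadd->src[1 - i]))
            return false;

        return true;
    }
    return false; /* no fmul operand, nothing to fuse */
}
```

With this sketch, fadd(fmul(x, c1), c2) is left unfused when c1 and c2
each have a single use, and is fused as soon as one of the constants is
shared with another instruction.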

>> Perhaps another approach would be to modify the
>> opt_combine_constants() pass to split MADs under some circumstances --
>> e.g., when a MAD accounts for the only use of a constant we would
>> otherwise have to promote. But of course we don't have that pass for
>> the vec4 backend.
>>
>> In the meantime, I've sent a related patch that may be of interest:
>> "[PATCH] nir: Don't fuse fmul into ffma if used by more than 4 fadds."
>>
>> This patch, applied on top of mine gives these results on Haswell:
>>
>> Total:
>> total instructions in shared programs: 6595563 -> 6584885 (-0.16%)
>> instructions in affected programs:     1183608 -> 1172930 (-0.90%)
>> helped:                                8074
>> HURT:                                  842
>> GAINED:                                4
>>
>> FS:
>> total instructions in shared programs: 4863484 -> 4859884 (-0.07%)
>> instructions in affected programs:     554042 -> 550442 (-0.65%)
>> helped:                                3072
>> HURT:                                  38
>> GAINED:                                4
>>
>> VS:
>> total instructions in shared programs: 1729224 -> 1722146 (-0.41%)
>> instructions in affected programs:     629566 -> 622488 (-1.12%)
>> total loops in shared programs:        221 -> 221 (0.00%)
>> helped:                                5002
>> HURT:                                  804
>>
>> Another thing to consider for the vec4 backend is that vec4 uniforms
>> have to be unpacked for use by 3-src instructions (see the
>> VEC4_OPCODE_UNPACK_UNIFORM opcode). We CSE the unpacking operations,
>> but they often do account for increases in instruction counts.
