[Mesa-dev] [RFC] nir/opt_peephole_ffma: Bypass fusion if any operand of fadd and fmul is a const
Matt Turner
mattst88 at gmail.com
Tue Sep 22 16:04:34 PDT 2015
On Fri, Sep 18, 2015 at 12:49 AM, Eduardo Lima Mitev <elima at igalia.com> wrote:
> When both fadd and fmul instructions have at least one operand that is a
> constant and it is only used once, the total number of instructions can
> be reduced from 3 (1 ffma + 2 load_const) to 2 (1 fmul + 1 fadd), because
> the constants will be propagated as immediate operands of fmul and fadd.
>
> This patch modifies the opt_peephole_ffma pass to detect this situation and
> bail out of fusing fmul+fadd into ffma.
>
> As the shader-db results below show, it seems to help a good number of
> shaders. However, there are some caveats:
>
> * It seems i965-specific, so I'm not sure whether modifying the NIR pass
> directly is desirable, as opposed to moving this to the backend.
>
> * There is still a high number of HURTs, but these could be reduced by
> making the bail-out conditions more specific.
>
> total instructions in shared programs: 1683959 -> 1677447 (-0.39%)
> instructions in affected programs: 604918 -> 598406 (-1.08%)
> helped: 4633
> HURT: 804
> GAINED: 0
> LOST: 0
> ---
Interesting. Yeah, I've thought about doing this as well. It was
more difficult before because with GLSL IR (where I was trying to do
it) it wasn't possible to determine whether the constant was used by
multiple 3-src instructions. Actually, your check could probably be
refined to consider only uses by 3-src instructions, but that's
getting kind of hardware-specific.
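The refinement suggested above can be sketched the same way. Assuming (as in the RFC's accounting) that on i965 only 3-source instructions such as ffma/MAD force a constant into a register, while 2-source instructions can take it as an immediate, the question becomes whether any existing 3-src instruction already uses the constant. If one does, the register load is needed anyway and fusing adds nothing. The opcode enum and helper names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcodes for the sketch. */
enum toy_op { TOY_FADD, TOY_FMUL, TOY_FFMA };

/* Hypothetical record of one use of a constant. */
struct toy_use {
   enum toy_op op;  /* opcode of the instruction reading the constant */
};

/* Assumption: only 3-src instructions (ffma -> MAD) cannot take
 * immediates, so only they require the constant in a register. */
static bool
is_three_src(enum toy_op op)
{
   return op == TOY_FFMA;
}

/* Count the uses of a constant by 3-src instructions only. */
static unsigned
count_three_src_uses(const struct toy_use *uses, size_t n)
{
   assert(uses != NULL || n == 0);
   unsigned count = 0;
   for (size_t i = 0; i < n; i++) {
      if (is_three_src(uses[i].op))
         count++;
   }
   return count;
}

/* Fusing would introduce a register load used only by the new ffma
 * if no existing 3-src instruction already reads this constant;
 * uses by 2-src fadd/fmul do not count. */
static bool
fusion_would_add_const_load(const struct toy_use *uses, size_t n)
{
   return count_three_src_uses(uses, n) == 0;
}
```

A constant read by one fmul and two fadds has three uses, so the RFC's "used only once" test would not fire, yet fusing would still add a load needed only by the new ffma; the refined test reports that case. A constant already read by an ffma does not trigger it.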
Perhaps another approach would be to modify the
opt_combine_constants() pass to split MADs under some circumstances,
e.g., when a MAD accounts for the only use of a constant that we would
otherwise have to promote to a register. But of course we don't have
that pass for the vec4 backend.
In the meantime, I've sent a related patch that may be of interest:
"[PATCH] nir: Don't fuse fmul into ffma if used by more than 4 fadds."
Your patch, applied on top of mine, gives these results on Haswell:
Total:
total instructions in shared programs: 6595563 -> 6584885 (-0.16%)
instructions in affected programs: 1183608 -> 1172930 (-0.90%)
helped: 8074
HURT: 842
GAINED: 4
FS:
total instructions in shared programs: 4863484 -> 4859884 (-0.07%)
instructions in affected programs: 554042 -> 550442 (-0.65%)
helped: 3072
HURT: 38
GAINED: 4
VS:
total instructions in shared programs: 1729224 -> 1722146 (-0.41%)
instructions in affected programs: 629566 -> 622488 (-1.12%)
total loops in shared programs: 221 -> 221 (0.00%)
helped: 5002
HURT: 804
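The heuristic named in the title of the related patch mentioned above can be sketched as a simple count over the users of the fmul result. This is only a reading of the patch title, not its code: the `TOY_*` names and `may_fuse_fmul()` are hypothetical, and the sketch assumes the limit applies to fadd users only, with exactly 4 fadds still allowing fusion ("more than 4").

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Threshold from the patch title; the real patch may compare
 * differently (assumption). */
#define TOY_MAX_FADD_USES 4

/* Hypothetical opcodes of the instructions using the fmul result. */
enum toy_user_op { TOY_USER_FADD, TOY_USER_OTHER };

/* Count how many users of the fmul result are fadds. */
static unsigned
count_fadd_users(const enum toy_user_op *users, size_t n)
{
   assert(users != NULL || n == 0);
   unsigned count = 0;
   for (size_t i = 0; i < n; i++) {
      if (users[i] == TOY_USER_FADD)
         count++;
   }
   return count;
}

/* Allow fusing the fmul into ffma only if it feeds at most
 * TOY_MAX_FADD_USES fadds. */
static bool
may_fuse_fmul(const enum toy_user_op *users, size_t n)
{
   return count_fadd_users(users, n) <= TOY_MAX_FADD_USES;
}
```

With four fadd users the sketch still fuses; with five (plus any number of non-fadd users) it leaves the fmul alone.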
Another thing to consider for the vec4 backend is that vec4 uniforms
have to be unpacked for use by 3-src instructions (see the
VEC4_OPCODE_UNPACK_UNIFORM opcode). We CSE the unpacking operations,
but they still often account for increases in instruction count.
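A rough way to estimate the vec4 unpacking cost described above: assume every uniform read by a 3-src instruction needs one VEC4_OPCODE_UNPACK_UNIFORM, and that CSE merges all repeated unpacks of the same uniform. The number of unpacks is then the number of distinct uniforms read by 3-src instructions. Both assumptions are simplifications (real CSE has a limited scope), and `count_unpacks_after_cse()` is a hypothetical helper, not backend code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Given the uniform indices read by 3-src instructions (one entry per
 * read), count the unpack instructions left after an idealized CSE
 * that merges every repeated unpack of the same uniform. */
static unsigned
count_unpacks_after_cse(const unsigned *uniform_ids, size_t n)
{
   assert(uniform_ids != NULL || n == 0);
   unsigned distinct = 0;
   for (size_t i = 0; i < n; i++) {
      bool seen = false;
      /* Was this uniform already unpacked by an earlier read? */
      for (size_t j = 0; j < i; j++) {
         if (uniform_ids[j] == uniform_ids[i]) {
            seen = true;
            break;
         }
      }
      if (!seen)
         distinct++;
   }
   return distinct;
}
```

For example, four 3-src reads of uniforms 3, 7, 3 and 3 leave two unpacks under this model, so each newly fused MAD that reads a not-yet-unpacked uniform can add one instruction.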