[Mesa-dev] [PATCH 0/6] i965/fs: Integer multiplication improvements

Fri May 15 14:02:03 PDT 2015

This series reworks how we do integer multiplication in the i965/fs
backend and significantly improves code generation for Broadwell's
scalar vertex shaders with NIR by allowing constant propagation into
the MUL instruction (wow that code was stupid, and it still kind of
is!).

Before this series, Jason said enabling NIR for scalar vertex shaders
had this result:

total instructions in shared programs: 2724483 -> 2711790 (-0.47%)
instructions in affected programs:     1860859 -> 1848166 (-0.68%)
helped:                                4387
HURT:                                  4758

After this series, the results are:

Broadwell, vertex shaders only, with and without NIR:
total instructions in shared programs: 2742062 -> 2681339 (-2.21%)
instructions in affected programs:     1514770 -> 1454047 (-4.01%)
helped:                                5813
HURT:                                  1120

Along the way, I move the point at which we split integer multiplication
into multiple instructions (on Gen < 8), implement SIMD16 support, and
reimplement integer multiplication on Gen < 8 without using the
accumulator.
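
For readers who want the arithmetic behind an accumulator-free multiply,
here is a minimal scalar sketch in C. It only illustrates the identity
that the low 32 bits of a 32x32 product can be assembled from two
32x16 products; it is not the instruction sequence from the patches,
and the helper names (mul_32x16, mul_lo32) are made up for this example.

```c
#include <stdint.h>

/* Hypothetical stand-in for a narrower multiply: 32-bit x 16-bit,
 * keeping only the low 32 bits of the result. */
static uint32_t mul_32x16(uint32_t a, uint16_t b)
{
    return a * (uint32_t)b;
}

/* Low 32 bits of a * b, built from two 32x16 multiplies.
 *
 *   a * b = a * lo16(b) + (a * hi16(b)) << 16      (mod 2^32)
 *
 * Only the low 16 bits of the "high" product survive the shift, so
 * they end up added into the high 16 bits of the "low" product. */
static uint32_t mul_lo32(uint32_t a, uint32_t b)
{
    uint32_t lo = mul_32x16(a, (uint16_t)(b & 0xffff));
    uint32_t hi = mul_32x16(a, (uint16_t)(b >> 16));

    return lo + (hi << 16);
}
```

Because unsigned arithmetic wraps modulo 2^32, the same low 32 bits are
produced whether the operands are interpreted as signed or unsigned.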

Patches 3 and 4 add SIMD16 support (for Haswell only, since pre-Gen7
already works, and IVB/BYT have a bug that prevents it from working).
I don't particularly care if those patches go in separately, or if 6/6
is squashed with 4/6, but I think they are instructive -- I'll follow
up with a piglit test that demonstrates that setting the wrong quarter
control indeed generates incorrect results.

I did not touch the imul_high opcode, though it could be lowered in a
similar way (and probably in 3 instructions using the same tricks as in 6/6).
That would be a nice follow-on project, and it would mean we'd never see
a MACH instruction during optimizations.
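
As a rough illustration of what such a lowering has to compute, the
sketch below derives the high 32 bits of an unsigned 32x32 product from
16-bit partial products in plain C. It is the textbook decomposition,
not a proposed instruction sequence: it says nothing about whether 3
instructions suffice, it covers only the unsigned case, and the name
umul_hi32 is hypothetical.

```c
#include <stdint.h>

/* High 32 bits of the 64-bit product a * b (unsigned), using only
 * 32-bit arithmetic on 16-bit halves of the operands. */
static uint32_t umul_hi32(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xffff, a_hi = a >> 16;
    uint32_t b_lo = b & 0xffff, b_hi = b >> 16;

    /* Four 16x16 -> 32 partial products. */
    uint32_t ll = a_lo * b_lo;
    uint32_t lh = a_lo * b_hi;
    uint32_t hl = a_hi * b_lo;
    uint32_t hh = a_hi * b_hi;

    /* Sum everything that lands in bits 16..31 of the full product;
     * whatever overflows past bit 31 is the carry into the high word. */
    uint32_t mid = (ll >> 16) + (lh & 0xffff) + (hl & 0xffff);

    return hh + (lh >> 16) + (hl >> 16) + (mid >> 16);
}
```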