[Mesa-dev] [PATCH v2 0/5] nv50/ir: Improve Performance of Integer Multiplication
Rhys Perry
pendingchaos02 at gmail.com
Wed Jul 18 17:05:38 UTC 2018
Changes in v2:
- rebase
- bring back constant folding for multiplication by power-of-twos for nv50
- remove TODO in nv50_ir_target_gm107.cpp
- document XMAD's flags
- change how XMAD's per-operand flags are represented
- move util/bitscan.h stuff into a new patch
- stylistic changes
This series improves the performance of integer multiplication by removing
much of the usage of the very slow IMAD and IMUL on Maxwell+ and improving
multiplication by immediates on Fermi+. It depends on the
SHLADD/IndirectPropagation patches.
The first and second patches add support for the XMAD instruction in codegen.
The third patch replaces most IMADs and IMULs with a sequence of XMADs on
Maxwell+. This is far faster but increases the total instructions in the
shader-db by 0.72%.
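To illustrate why several XMADs can stand in for one IMAD, here is a minimal
sketch of the underlying arithmetic. It assumes XMAD behaves as a 16x16-bit
multiply with a 32-bit add and ignores its flags and operand selection.
xmad16 and mad32_via_16bit are hypothetical names used only for this example,
and this is not the exact instruction sequence the patch emits.

```cpp
#include <cstdint>

// Hypothetical helper: 16x16-bit multiply plus a 32-bit add. This is the
// arithmetic XMAD is assumed to perform here; flags are left out.
static inline uint32_t xmad16(uint32_t a16, uint32_t b16, uint32_t c)
{
   return (a16 & 0xffffu) * (b16 & 0xffffu) + c;
}

// Low 32 bits of a * b + c, built only from 16x16-bit multiplies.
static inline uint32_t mad32_via_16bit(uint32_t a, uint32_t b, uint32_t c)
{
   const uint32_t a_lo = a & 0xffffu, a_hi = a >> 16;
   const uint32_t b_lo = b & 0xffffu, b_hi = b >> 16;

   // a_hi * b_hi would be shifted left by 32 bits, so it never reaches the
   // low 32 bits of the result and can be dropped.
   const uint32_t cross = xmad16(a_lo, b_hi, xmad16(a_hi, b_lo, 0));

   // Only the low 16 bits of the cross terms survive the shift.
   return xmad16(a_lo, b_lo, c) + (cross << 16);
}
```

Three 16-bit multiplies are enough because the high-by-high product falls
entirely outside the 32-bit result.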
This number is significantly lowered with the next patch. It replaces many
multiplications by immediates with instructions that should be as fast or
faster than the XMAD approach. They are also typically smaller and less
register-heavy, so they decrease the total instruction count by 0.50%.
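The general idea behind cheaper multiplication by immediates can be sketched
as a strength reduction into shifts, adds and subtracts. The cases below
(a power of two, two set bits, and a difference of two powers of two) are
assumptions picked for illustration, and mul_by_imm, is_pow2 and
lowest_set_bit are hypothetical names. The set of rewrites in the actual
patch may differ.

```cpp
#include <cstdint>

// Index of the lowest set bit; v must be non-zero.
static inline unsigned lowest_set_bit(uint32_t v)
{
   unsigned n = 0;
   while (!(v & 1u)) {
      v >>= 1;
      ++n;
   }
   return n;
}

static inline bool is_pow2(uint32_t v)
{
   return v != 0 && (v & (v - 1)) == 0;
}

// Hypothetical sketch: compute x * imm (mod 2^32) without a multiply when
// imm has one of a few simple bit patterns.
static inline uint32_t mul_by_imm(uint32_t x, uint32_t imm)
{
   if (imm == 0)
      return 0;

   const unsigned lo = lowest_set_bit(imm);
   if (is_pow2(imm))                      // 2^n: a single shift
      return x << lo;

   const uint32_t rest = imm & (imm - 1); // imm with its lowest bit cleared
   if (is_pow2(rest))                     // 2^n + 2^m: shift and add
      return (x << lowest_set_bit(rest)) + (x << lo);

   const uint32_t up = imm + (1u << lo);  // wraps to 0 for e.g. 0xffff0000
   if (is_pow2(up))                       // 2^n - 2^m: shift and subtract
      return (x << lowest_set_bit(up)) - (x << lo);

   return x * imm;                        // fall back to a real multiply
}
```

For example, mul_by_imm(x, 7) takes the shift-and-subtract path and computes
(x << 3) - x, while an immediate such as 11 matches none of the patterns and
falls back to the ordinary multiply.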
This series gives about a 50% speedup in fragment-heavy scenarios with
Dolphin 5.0 on my GTX 1060. All timings were made with interesting-looking
fifos from Dolphin's bugtracker:
Wind Waker: 18 FPS -> 26 FPS at 3x internal resolution
Wind Waker: 8 FPS -> 11 FPS at 5x internal resolution
Paper Mario?: 26 FPS -> 42 FPS at 5x internal resolution
SpongeBob Movie: 19 FPS -> 30 FPS at 5x internal resolution
Unigine Heaven and Unigine Valley seem to run the same at low quality with
no anti-aliasing and no tessellation. SuperTuxKart and 0 A.D. also show no
change.
It's possible these patches may break something. Piglit shows no functionality
regressions, though the patches should probably be tested for improvements or
breakage with actual applications.
These patches can also be found on my github:
https://github.com/pendingchaos/mesa/tree/nv-xmad-v2
The final changes in shader-db are as follows:
total instructions in shared programs : 5256901 -> 5268293 (0.22%)
total gprs used in shared programs : 624328 -> 624196 (-0.02%)
total shared used in shared programs : 360704 -> 360704 (0.00%)
total local used in shared programs : 20952 -> 20952 (0.00%)
         local  shared  gpr  inst  bytes
helped       0       0  255   680    680
hurt         0       0  128  1484   1484
Rhys Perry (5):
nv50/ir: add preliminary support for OP_XMAD
gm107/ir: add support for OP_XMAD on GM107+
nv50/ir: optimize imul/imad to xmads
util: Add u_bit_count64 and u_next_power_of_two
nv50/ir: further optimize multiplication by immediates
src/gallium/drivers/nouveau/codegen/nv50_ir.h | 23 +++
.../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 63 +++++++
.../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 203 +++++++++++++++++++--
.../drivers/nouveau/codegen/nv50_ir_print.cpp | 18 ++
.../drivers/nouveau/codegen/nv50_ir_target.cpp | 7 +-
.../nouveau/codegen/nv50_ir_target_gm107.cpp | 6 +-
.../nouveau/codegen/nv50_ir_target_nv50.cpp | 1 +
.../nouveau/codegen/nv50_ir_target_nvc0.cpp | 16 ++
src/util/bitscan.h | 28 +++
9 files changed, 345 insertions(+), 20 deletions(-)
--
2.14.4