[Mesa-dev] [PATCH 1/4] radeonsi: use v_mad_f32 for fma

Sun Jul 31 20:34:48 UTC 2016

From: Marek Olšák <marek.olsak at amd.com>

v_fma_f32 runs at FP64 rate (= slow). Alien Isolation and F1 2015 seem
to use fma for all d3d multiply-add instructions, which is silly.

This tries to restore performance for those games.

The main difference between v_mad_f32 and v_fma_f32 is that v_mad doesn't
support denormals, which we don't enable anyway, because they are slow too.

Also, there is code size reduction:
  Totals from affected shaders:
  VGPRS: 109796 -> 109808 (0.01 %)
  Spilled SGPRs: 29995 -> 30022 (0.09 %)
  Spilled VGPRs: 12 -> 13 (8.33 %) <-- it's just one shader going from 12 to 13
  Code Size: 6667596 -> 6476356 (-2.87 %) bytes
  Max Waves: 26931 -> 26899 (-0.12 %)

I've not actually tested real performance.
---
 src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index d382496..4c603fb 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -1769,8 +1769,8 @@ void radeon_llvm_context_init(struct radeon_llvm_context *ctx, const char *tripl
 		HAVE_LLVM >= 0x0308 ? "llvm.exp2.f32" : "llvm.AMDIL.exp.";
 	bld_base->op_actions[TGSI_OPCODE_FLR].emit = build_tgsi_intrinsic_nomem;
 	bld_base->op_actions[TGSI_OPCODE_FLR].intr_name = "llvm.floor.f32";
-	bld_base->op_actions[TGSI_OPCODE_FMA].emit = build_tgsi_intrinsic_nomem;
-	bld_base->op_actions[TGSI_OPCODE_FMA].intr_name = "llvm.fma.f32";
+	bld_base->op_actions[TGSI_OPCODE_FMA].emit =
+		bld_base->op_actions[TGSI_OPCODE_MAD].emit;
 	bld_base->op_actions[TGSI_OPCODE_FRC].emit = emit_frac;
 	bld_base->op_actions[TGSI_OPCODE_F2I].emit = emit_f2i;
 	bld_base->op_actions[TGSI_OPCODE_F2U].emit = emit_f2u;
-- 
2.7.4