[Mesa-dev] [PATCH 2/2] r600: use the clamped versions of rcp/rsq for eg/cayman.

Wed Nov 8 04:01:26 UTC 2017

From: Roland Scheidegger <sroland at vmware.com>

r600 already used the clamped versions, but for some reason this was
different to eg/cayman.
(Note that it has been different since essentially forever, 7 years, since
df62338c491f2cace1a48f99de78e83b5edd82fd in particular, which changed
this for r600 but not eg (cayman wasn't supported back then, but probably
copied this from the eg part later). The commit does not mention any reason
why this difference should exist.)
This seems a bit unfortunate, since it would be nice to use ieee arithmetic,
I have no idea what this could potentially break and no idea if it really
makes sense going back to legacy-style rcp/rsq...
This however prevents misrenderings in This War of Mine since using ieee
muls (ce7a045feeef8cad155f1c9aa07f166e146e3d00), albeit strictly speaking
only rcp_clamped is necessary for this. It seems likely the root cause is
some x * rcp(y) calculation where both x and y evaluate to 0. Albeit it
apparently works with other drivers, not sure what's up with that...

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103544
---
 src/gallium/drivers/r600/r600_shader.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index 6a755bb3fd..62fc4da901 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -9033,8 +9033,12 @@ static const struct r600_shader_tgsi_instruction eg_shader_tgsi_instruction[] =
 	[TGSI_OPCODE_ARL]	= { ALU_OP0_NOP, tgsi_eg_arl},
 	[TGSI_OPCODE_MOV]	= { ALU_OP1_MOV, tgsi_op2},
 	[TGSI_OPCODE_LIT]	= { ALU_OP0_NOP, tgsi_lit},
-	[TGSI_OPCODE_RCP]	= { ALU_OP1_RECIP_IEEE, tgsi_trans_srcx_replicate},
-	[TGSI_OPCODE_RSQ]	= { ALU_OP1_RECIPSQRT_IEEE, tgsi_rsq},
+	/* XXX:
+	 * For state trackers other than OpenGL, we'll want to use
+	 * _RECIP_IEEE/_RECIPSQRT_IEEE instead.
+	 */
+	[TGSI_OPCODE_RCP]	= { ALU_OP1_RECIP_CLAMPED, tgsi_trans_srcx_replicate},
+	[TGSI_OPCODE_RSQ]	= { ALU_OP1_RECIPSQRT_CLAMPED, tgsi_rsq},
 	[TGSI_OPCODE_EXP]	= { ALU_OP0_NOP, tgsi_exp},
 	[TGSI_OPCODE_LOG]	= { ALU_OP0_NOP, tgsi_log},
 	[TGSI_OPCODE_MUL]	= { ALU_OP2_MUL_IEEE, tgsi_op2},
@@ -9256,8 +9260,12 @@ static const struct r600_shader_tgsi_instruction cm_shader_tgsi_instruction[] =
 	[TGSI_OPCODE_ARL]	= { ALU_OP0_NOP, tgsi_eg_arl},
 	[TGSI_OPCODE_MOV]	= { ALU_OP1_MOV, tgsi_op2},
 	[TGSI_OPCODE_LIT]	= { ALU_OP0_NOP, tgsi_lit},
-	[TGSI_OPCODE_RCP]	= { ALU_OP1_RECIP_IEEE, cayman_emit_float_instr},
-	[TGSI_OPCODE_RSQ]	= { ALU_OP1_RECIPSQRT_IEEE, cayman_emit_float_instr},
+	/* XXX:
+	 * For state trackers other than OpenGL, we'll want to use
+	 * _RECIP_IEEE/_RECIPSQRT_IEEE instead.
+	 */
+	[TGSI_OPCODE_RCP]	= { ALU_OP1_RECIP_CLAMPED, cayman_emit_float_instr},
+	[TGSI_OPCODE_RSQ]	= { ALU_OP1_RECIPSQRT_CLAMPED, cayman_emit_float_instr},
 	[TGSI_OPCODE_EXP]	= { ALU_OP0_NOP, tgsi_exp},
 	[TGSI_OPCODE_LOG]	= { ALU_OP0_NOP, tgsi_log},
 	[TGSI_OPCODE_MUL]	= { ALU_OP2_MUL_IEEE, tgsi_op2},
-- 
2.12.3