[Mesa-dev] [PATCH 2/2] r600: use the clamped versions of rcp/rsq for eg/cayman.

Roland Scheidegger sroland at vmware.com
Thu Nov 9 03:02:03 UTC 2017


Am 08.11.2017 um 07:18 schrieb Ilia Mirkin:
> tgsi_rsq appears to ignore the passed-in op and always puts in
> ALU_OP1_RECIPSQRT_CLAMPED anyways. It also sticks an absolute value on
> the RSQ() argument. This only happens for eg, not cayman. (Probably
> why only the rcp_clamped change appeared to be necessary.)
Right, I noticed that afterwards, it is using the clamped version for
rsq for eg.

> 
> This is odd though, because there's no clamping like that in other
> drivers. The trace you made looks fine on both nvc0 and nv50.
Yes, it also looks fine on llvmpipe, which adheres to strict ieee rules
(or rather, strict dx10 rules for this, but they are mostly identical,
with min/max returning the non-nan, which I'd nearly bet nvc0/nv50 do too).

Roland


> 
> On Tue, Nov 7, 2017 at 11:01 PM,  <sroland at vmware.com> wrote:
>> From: Roland Scheidegger <sroland at vmware.com>
>>
>> r600 already used the clamped versions, but for some reason this was
>> different to eg/cayman.
>> (Note that it has been different since essentially forever, 7 years, since
>> df62338c491f2cace1a48f99de78e83b5edd82fd in particular, which changed
>> this for r600 but not eg (cayman wasn't supported back then, but probably
>> copied this from the eg part later). The commit does not mention any reason
>> why this difference should exist.)
>> This seems a bit unfortunate, since it would be nice to use ieee arithmetic,
>> I have no idea what this could potentially break and no idea if it really
>> makes sense going back to legacy-style rcp/rsq...
>> This however prevents misrenderings in This War of Mine since using ieee
>> muls (ce7a045feeef8cad155f1c9aa07f166e146e3d00), albeit strictly speaking
>> only rcp_clamped is necessary for this. It seems likely the root cause is
>> some x * rcp(y) calculation where both x and y evaluate to 0. Albeit it
>> apparently works with other drivers, not sure what's up with that...
>>
>> Bugzilla: https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.freedesktop.org_show-5Fbug.cgi-3Fid-3D103544&d=DwIBaQ&c=uilaK90D4TOVoH58JNXRgQ&r=_QIjpv-UJ77xEQY8fIYoQtr5qv8wKrPJc7v7_-CYAb0&m=O3sPiamq_x4GgPnGelOAW-6LG12mV9kyATKu7PI5o10&s=W1XEpaa39PmD6AFtLpI21D3QTrP7XYVwGWfFYCT0PRM&e=
>> ---
>>  src/gallium/drivers/r600/r600_shader.c | 16 ++++++++++++----
>>  1 file changed, 12 insertions(+), 4 deletions(-)
>>
>> diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
>> index 6a755bb3fd..62fc4da901 100644
>> --- a/src/gallium/drivers/r600/r600_shader.c
>> +++ b/src/gallium/drivers/r600/r600_shader.c
>> @@ -9033,8 +9033,12 @@ static const struct r600_shader_tgsi_instruction eg_shader_tgsi_instruction[] =
>>         [TGSI_OPCODE_ARL]       = { ALU_OP0_NOP, tgsi_eg_arl},
>>         [TGSI_OPCODE_MOV]       = { ALU_OP1_MOV, tgsi_op2},
>>         [TGSI_OPCODE_LIT]       = { ALU_OP0_NOP, tgsi_lit},
>> -       [TGSI_OPCODE_RCP]       = { ALU_OP1_RECIP_IEEE, tgsi_trans_srcx_replicate},
>> -       [TGSI_OPCODE_RSQ]       = { ALU_OP1_RECIPSQRT_IEEE, tgsi_rsq},
>> +       /* XXX:
>> +        * For state trackers other than OpenGL, we'll want to use
>> +        * _RECIP_IEEE/_RECIPSQRT_IEEE instead.
>> +        */
>> +       [TGSI_OPCODE_RCP]       = { ALU_OP1_RECIP_CLAMPED, tgsi_trans_srcx_replicate},
>> +       [TGSI_OPCODE_RSQ]       = { ALU_OP1_RECIPSQRT_CLAMPED, tgsi_rsq},
>>         [TGSI_OPCODE_EXP]       = { ALU_OP0_NOP, tgsi_exp},
>>         [TGSI_OPCODE_LOG]       = { ALU_OP0_NOP, tgsi_log},
>>         [TGSI_OPCODE_MUL]       = { ALU_OP2_MUL_IEEE, tgsi_op2},
>> @@ -9256,8 +9260,12 @@ static const struct r600_shader_tgsi_instruction cm_shader_tgsi_instruction[] =
>>         [TGSI_OPCODE_ARL]       = { ALU_OP0_NOP, tgsi_eg_arl},
>>         [TGSI_OPCODE_MOV]       = { ALU_OP1_MOV, tgsi_op2},
>>         [TGSI_OPCODE_LIT]       = { ALU_OP0_NOP, tgsi_lit},
>> -       [TGSI_OPCODE_RCP]       = { ALU_OP1_RECIP_IEEE, cayman_emit_float_instr},
>> -       [TGSI_OPCODE_RSQ]       = { ALU_OP1_RECIPSQRT_IEEE, cayman_emit_float_instr},
>> +       /* XXX:
>> +        * For state trackers other than OpenGL, we'll want to use
>> +        * _RECIP_IEEE/_RECIPSQRT_IEEE instead.
>> +        */
>> +       [TGSI_OPCODE_RCP]       = { ALU_OP1_RECIP_CLAMPED, cayman_emit_float_instr},
>> +       [TGSI_OPCODE_RSQ]       = { ALU_OP1_RECIPSQRT_CLAMPED, cayman_emit_float_instr},
>>         [TGSI_OPCODE_EXP]       = { ALU_OP0_NOP, tgsi_exp},
>>         [TGSI_OPCODE_LOG]       = { ALU_OP0_NOP, tgsi_log},
>>         [TGSI_OPCODE_MUL]       = { ALU_OP2_MUL_IEEE, tgsi_op2},
>> --
>> 2.12.3
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddev&d=DwIBaQ&c=uilaK90D4TOVoH58JNXRgQ&r=_QIjpv-UJ77xEQY8fIYoQtr5qv8wKrPJc7v7_-CYAb0&m=O3sPiamq_x4GgPnGelOAW-6LG12mV9kyATKu7PI5o10&s=5mjPUHmN5M2pr4cVX5DmE7_sMJfXmAVL27FxPc55SLo&e=



More information about the mesa-dev mailing list