[Mesa-dev] [PATCH 2/2] r600: use the clamped versions of rcp/rsq for eg/cayman.

Roland Scheidegger sroland at vmware.com
Thu Nov 9 03:07:02 UTC 2017


All on Juniper.

But anyway, I've got another solution, with the only drawback that I
don't really know what it does due to docs being lackluster/misleading
there :-).
But that would let us keep the ieee opcodes. And while I don't know what
it does, I suspect it's a better idea regardless ;-). hw supported by
nouveau drivers might just do the same without anyone even knowing with
some luck...

Roland



Am 08.11.2017 um 07:20 schrieb Ilia Mirkin:
> Actually cayman gets half of it - it gets the abs, but not clamped. I
> wonder what happens if you go the other way -- use the IEEE version of
> the op for RSQ() (presumably you're not testing this on cayman).
> 
> On Wed, Nov 8, 2017 at 1:18 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>> tgsi_rsq appears to ignore the passed-in op and always puts in
>> ALU_OP1_RECIPSQRT_CLAMPED anyways. It also sticks an absolute value on
>> the RSQ() argument. This only happens for eg, not cayman. (Probably
>> why only the rcp_clamped change appeared to be necessary.)
>>
>> This is odd though, because there's no clamping like that in other
>> drivers. The trace you made looks fine on both nvc0 and nv50.
>>
>> On Tue, Nov 7, 2017 at 11:01 PM,  <sroland at vmware.com> wrote:
>>> From: Roland Scheidegger <sroland at vmware.com>
>>>
>>> r600 already used the clamped versions, but for some reason this was
>>> different to eg/cayman.
>>> (Note that it has been different since essentially forever, 7 years, since
>>> df62338c491f2cace1a48f99de78e83b5edd82fd in particular, which changed
>>> this for r600 but not eg (cayman wasn't supported back then, but probably
>>> copied this from the eg part later). The commit does not mention any reason
>>> why this difference should exist.)
>>> This seems a bit unfortunate, since it would be nice to use ieee arithmetic,
>>> I have no idea what this could potentially break and no idea if it really
>>> makes sense going back to legacy-style rcp/rsq...
>>> This however prevents misrenderings in This War of Mine since using ieee
>>> muls (ce7a045feeef8cad155f1c9aa07f166e146e3d00), albeit strictly speaking
>>> only rcp_clamped is necessary for this. It seems likely the root cause is
>>> some x * rcp(y) calculation where both x and y evaluate to 0. Albeit it
>>> apparently works with other drivers, not sure what's up with that...
>>>
>>> Bugzilla: https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.freedesktop.org_show-5Fbug.cgi-3Fid-3D103544&d=DwIBaQ&c=uilaK90D4TOVoH58JNXRgQ&r=_QIjpv-UJ77xEQY8fIYoQtr5qv8wKrPJc7v7_-CYAb0&m=FYMO5RZgIVWRluOu5DMEzW2xYcV3A5ZOOioWlZcdonw&s=b2fwORC2yAK7ezpjud4xeebdQRW-632-SZjTJGI-JHk&e=
>>> ---
>>>  src/gallium/drivers/r600/r600_shader.c | 16 ++++++++++++----
>>>  1 file changed, 12 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
>>> index 6a755bb3fd..62fc4da901 100644
>>> --- a/src/gallium/drivers/r600/r600_shader.c
>>> +++ b/src/gallium/drivers/r600/r600_shader.c
>>> @@ -9033,8 +9033,12 @@ static const struct r600_shader_tgsi_instruction eg_shader_tgsi_instruction[] =
>>>         [TGSI_OPCODE_ARL]       = { ALU_OP0_NOP, tgsi_eg_arl},
>>>         [TGSI_OPCODE_MOV]       = { ALU_OP1_MOV, tgsi_op2},
>>>         [TGSI_OPCODE_LIT]       = { ALU_OP0_NOP, tgsi_lit},
>>> -       [TGSI_OPCODE_RCP]       = { ALU_OP1_RECIP_IEEE, tgsi_trans_srcx_replicate},
>>> -       [TGSI_OPCODE_RSQ]       = { ALU_OP1_RECIPSQRT_IEEE, tgsi_rsq},
>>> +       /* XXX:
>>> +        * For state trackers other than OpenGL, we'll want to use
>>> +        * _RECIP_IEEE/_RECIPSQRT_IEEE instead.
>>> +        */
>>> +       [TGSI_OPCODE_RCP]       = { ALU_OP1_RECIP_CLAMPED, tgsi_trans_srcx_replicate},
>>> +       [TGSI_OPCODE_RSQ]       = { ALU_OP1_RECIPSQRT_CLAMPED, tgsi_rsq},
>>>         [TGSI_OPCODE_EXP]       = { ALU_OP0_NOP, tgsi_exp},
>>>         [TGSI_OPCODE_LOG]       = { ALU_OP0_NOP, tgsi_log},
>>>         [TGSI_OPCODE_MUL]       = { ALU_OP2_MUL_IEEE, tgsi_op2},
>>> @@ -9256,8 +9260,12 @@ static const struct r600_shader_tgsi_instruction cm_shader_tgsi_instruction[] =
>>>         [TGSI_OPCODE_ARL]       = { ALU_OP0_NOP, tgsi_eg_arl},
>>>         [TGSI_OPCODE_MOV]       = { ALU_OP1_MOV, tgsi_op2},
>>>         [TGSI_OPCODE_LIT]       = { ALU_OP0_NOP, tgsi_lit},
>>> -       [TGSI_OPCODE_RCP]       = { ALU_OP1_RECIP_IEEE, cayman_emit_float_instr},
>>> -       [TGSI_OPCODE_RSQ]       = { ALU_OP1_RECIPSQRT_IEEE, cayman_emit_float_instr},
>>> +       /* XXX:
>>> +        * For state trackers other than OpenGL, we'll want to use
>>> +        * _RECIP_IEEE/_RECIPSQRT_IEEE instead.
>>> +        */
>>> +       [TGSI_OPCODE_RCP]       = { ALU_OP1_RECIP_CLAMPED, cayman_emit_float_instr},
>>> +       [TGSI_OPCODE_RSQ]       = { ALU_OP1_RECIPSQRT_CLAMPED, cayman_emit_float_instr},
>>>         [TGSI_OPCODE_EXP]       = { ALU_OP0_NOP, tgsi_exp},
>>>         [TGSI_OPCODE_LOG]       = { ALU_OP0_NOP, tgsi_log},
>>>         [TGSI_OPCODE_MUL]       = { ALU_OP2_MUL_IEEE, tgsi_op2},
>>> --
>>> 2.12.3
>>>
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev at lists.freedesktop.org
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddev&d=DwIBaQ&c=uilaK90D4TOVoH58JNXRgQ&r=_QIjpv-UJ77xEQY8fIYoQtr5qv8wKrPJc7v7_-CYAb0&m=FYMO5RZgIVWRluOu5DMEzW2xYcV3A5ZOOioWlZcdonw&s=BMC4YllECcl4gCLztCjAYp4GShGEolmDf7oSttV0cZA&e=



More information about the mesa-dev mailing list