[Mesa-dev] [PATCH 1/4] r600: use min_dx10/max_dx10 instead of min/max
Roland Scheidegger
sroland at vmware.com
Thu Nov 9 17:49:41 UTC 2017
Am 09.11.2017 um 18:43 schrieb Jan Vesely:
> On Thu, 2017-11-09 at 18:39 +0100, Nicolai Hähnle wrote:
>> On 09.11.2017 18:26, Roland Scheidegger wrote:
>>> Am 09.11.2017 um 18:19 schrieb Jan Vesely:
>>>> On Thu, 2017-11-09 at 03:58 +0100, sroland at vmware.com wrote:
>>>>> From: Roland Scheidegger <sroland at vmware.com>
>>>>>
>>>>> I believe this is the safe thing to do, especially ever since the driver
>>>>> actually generates NaNs for muls too.
>>>>> Albeit since the radeon ISA docs are inaccurate/wrong there, I'm not
>>>>> entirely sure what the non-dx10 versions do,
>>>>
>>>> non-dx10 version return nan if one of the operands is nan (tested on
>>>> Turks).
>>>
>>> Yes, I've modified a piglit test and came to the same conclusion. (I
>>> will put that up for review shortly.)
>>> I don't know why you'd ever want that, though (ieee fmin/fmax also
>>> should return non-nan).
>>
>> My guess is that DX9-level hardware had that behavior by virtue of just
>> not caring about NaN at all, and the hardware folks were just being
>> conservative in adding a new opcode rather than changing the behavior of
>> the old one. I don't think GCN has the old-style min/max.
>
> Looks like it.
> There's v_max_legacy_f32 at least on SI/CI. comment says "D.f =
> max(S0.f, S1.f) (DX9 rules for NaN)"
The problem with dx9 rules for NaN is that noone really seems to know
what they are exactly - in that sense even GL is an improvement since
it's at least obvious you can do whatever floats your boat :-).
Albeit generally, in d3d9 you should never generate a NaN in the first
place, hence how you promote it doesn't really matter.
Roland
>
> Jan
>>
>> Cheers,
>> Nicolai
>>
>>>
>>> Roland
>>>
>>>
>>>>
>>>> Jan
>>>>
>>>>> but (as required by dx10)
>>>>> the dx10 versions should pick a non-nan source over a nan source.
>>>>> Other drivers presumably do the same (radeonsi, llvmpipe).
>>>>> This was shown to make some difference for bug 103544, albeit it is not
>>>>> required to fix it.
>>>>> ---
>>>>> src/gallium/drivers/r600/r600_shader.c | 12 ++++++------
>>>>> src/gallium/drivers/r600/sb/sb_expr.cpp | 2 ++
>>>>> 2 files changed, 8 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
>>>>> index 188fbc9d47..6a755bb3fd 100644
>>>>> --- a/src/gallium/drivers/r600/r600_shader.c
>>>>> +++ b/src/gallium/drivers/r600/r600_shader.c
>>>>> @@ -8844,8 +8844,8 @@ static const struct r600_shader_tgsi_instruction r600_shader_tgsi_instruction[]
>>>>> [TGSI_OPCODE_DP3] = { ALU_OP2_DOT4_IEEE, tgsi_dp},
>>>>> [TGSI_OPCODE_DP4] = { ALU_OP2_DOT4_IEEE, tgsi_dp},
>>>>> [TGSI_OPCODE_DST] = { ALU_OP0_NOP, tgsi_opdst},
>>>>> - [TGSI_OPCODE_MIN] = { ALU_OP2_MIN, tgsi_op2},
>>>>> - [TGSI_OPCODE_MAX] = { ALU_OP2_MAX, tgsi_op2},
>>>>> + [TGSI_OPCODE_MIN] = { ALU_OP2_MIN_DX10, tgsi_op2},
>>>>> + [TGSI_OPCODE_MAX] = { ALU_OP2_MAX_DX10, tgsi_op2},
>>>>> [TGSI_OPCODE_SLT] = { ALU_OP2_SETGT, tgsi_op2_swap},
>>>>> [TGSI_OPCODE_SGE] = { ALU_OP2_SETGE, tgsi_op2},
>>>>> [TGSI_OPCODE_MAD] = { ALU_OP3_MULADD_IEEE, tgsi_op3},
>>>>> @@ -9042,8 +9042,8 @@ static const struct r600_shader_tgsi_instruction eg_shader_tgsi_instruction[] =
>>>>> [TGSI_OPCODE_DP3] = { ALU_OP2_DOT4_IEEE, tgsi_dp},
>>>>> [TGSI_OPCODE_DP4] = { ALU_OP2_DOT4_IEEE, tgsi_dp},
>>>>> [TGSI_OPCODE_DST] = { ALU_OP0_NOP, tgsi_opdst},
>>>>> - [TGSI_OPCODE_MIN] = { ALU_OP2_MIN, tgsi_op2},
>>>>> - [TGSI_OPCODE_MAX] = { ALU_OP2_MAX, tgsi_op2},
>>>>> + [TGSI_OPCODE_MIN] = { ALU_OP2_MIN_DX10, tgsi_op2},
>>>>> + [TGSI_OPCODE_MAX] = { ALU_OP2_MAX_DX10, tgsi_op2},
>>>>> [TGSI_OPCODE_SLT] = { ALU_OP2_SETGT, tgsi_op2_swap},
>>>>> [TGSI_OPCODE_SGE] = { ALU_OP2_SETGE, tgsi_op2},
>>>>> [TGSI_OPCODE_MAD] = { ALU_OP3_MULADD_IEEE, tgsi_op3},
>>>>> @@ -9265,8 +9265,8 @@ static const struct r600_shader_tgsi_instruction cm_shader_tgsi_instruction[] =
>>>>> [TGSI_OPCODE_DP3] = { ALU_OP2_DOT4_IEEE, tgsi_dp},
>>>>> [TGSI_OPCODE_DP4] = { ALU_OP2_DOT4_IEEE, tgsi_dp},
>>>>> [TGSI_OPCODE_DST] = { ALU_OP0_NOP, tgsi_opdst},
>>>>> - [TGSI_OPCODE_MIN] = { ALU_OP2_MIN, tgsi_op2},
>>>>> - [TGSI_OPCODE_MAX] = { ALU_OP2_MAX, tgsi_op2},
>>>>> + [TGSI_OPCODE_MIN] = { ALU_OP2_MIN_DX10, tgsi_op2},
>>>>> + [TGSI_OPCODE_MAX] = { ALU_OP2_MAX_DX10, tgsi_op2},
>>>>> [TGSI_OPCODE_SLT] = { ALU_OP2_SETGT, tgsi_op2_swap},
>>>>> [TGSI_OPCODE_SGE] = { ALU_OP2_SETGE, tgsi_op2},
>>>>> [TGSI_OPCODE_MAD] = { ALU_OP3_MULADD_IEEE, tgsi_op3},
>>>>> diff --git a/src/gallium/drivers/r600/sb/sb_expr.cpp b/src/gallium/drivers/r600/sb/sb_expr.cpp
>>>>> index 3dd3a4815b..7a5d62c8e8 100644
>>>>> --- a/src/gallium/drivers/r600/sb/sb_expr.cpp
>>>>> +++ b/src/gallium/drivers/r600/sb/sb_expr.cpp
>>>>> @@ -753,7 +753,9 @@ bool expr_handler::fold_alu_op2(alu_node& n) {
>>>>> n.bc.src[0].abs == n.bc.src[1].abs) {
>>>>> switch (n.bc.op) {
>>>>> case ALU_OP2_MIN: // (MIN x, x) => (MOV x)
>>>>> + case ALU_OP2_MIN_DX10:
>>>>> case ALU_OP2_MAX:
>>>>> + case ALU_OP2_MAX_DX10:
>>>>> convert_to_mov(n, v0, n.bc.src[0].neg, n.bc.src[0].abs);
>>>>> return fold_alu_op1(n);
>>>>> case ALU_OP2_ADD: // (ADD x, x) => (MUL x, 2)
>>>
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>>
>>
>>
More information about the mesa-dev
mailing list