[Mesa-dev] [PATCH] radeonsi: enable 32-bit denormals on VI+

Thu Jan 12 16:35:44 UTC 2017

On 12.01.2017 at 17:20, Ilia Mirkin wrote:
> On Thu, Jan 12, 2017 at 11:13 AM, Roland Scheidegger <sroland at vmware.com> wrote:
>> On 12.01.2017 at 02:12, Marek Olšák wrote:
>>> On Thu, Jan 12, 2017 at 12:33 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>>>> On Wed, Jan 11, 2017 at 4:00 PM, Roland Scheidegger <sroland at vmware.com> wrote:
>>>>> On 11.01.2017 at 21:08, Samuel Pitoiset wrote:
>>>>>>
>>>>>>
>>>>>> On 01/11/2017 07:00 PM, Roland Scheidegger wrote:
>>>>>>> I don't think there's any glsl, es or otherwise, specification which
>>>>>>> would require denorms (since obviously lots of hw can't do it, d3d10
>>>>>>> forbids them), with any precision qualifier. Hence these look like bugs
>>>>>>> of the test suite to me?
>>>>>>> (Irrespective of whether it's a good idea or not to enable denormals,
>>>>>>> which I don't really know.)
>>>>>>
>>>>>> That test works on NVIDIA hw (both with blob and nouveau) and IIRC it
>>>>>> also works on Intel hw. I don't think it's buggy there.
>>>>> The question then is why it needs denorms on radeons...
>>>>
>>>> I spent some time with Samuel looking at this. So, this is pretty
>>>> funny... (or at least feels that way after staring at floating point
>>>> for a while)
>>>>
>>>> dEQP is, in fact, feeding denorms to the min/max functions. But it's
>>>> smart enough to know that flushing denorms to 0 is OK, and so it
>>>> treats a 0 as a pass. (And obviously it treats the "right answer" as a
>>>> pass.) So that's why enabling denorm processing fixes it - that causes
>>>> the hw to return the correct answer and all is well.
>>>>
>>>> However the issue is that without denorm processing, the hw is
>>>> returning the *wrong* answer. At first I thought that max was being
>>>> lowered into something like
>>>>
>>>> if (a > b) { x = a; } else { x = b; }
>>>>
>>>> which would end up with potentially wrong results if a and b are being
>>>> flushed as inputs into the comparison but not into the assignments.
>>>> But that's not (explicitly) what's happening - the v_max_f32_e32
>>>> instruction is being used. Perhaps that's what it does internally? If
>>>> so, that means that results of affected float functions in LLVM need
>>>> explicit flushing before being stored into results.
>>>>
>>>> FWIW the specific values triggering the issue are:
>>>>
>>>> in0=-0x0.000002p-126, in1=-0x0.fffffep-126, out0=-0x0.fffffep-126 -> FAIL
>>>>
>>>> With denorm processing, it correctly reports out0=-0x0.000002p-126,
>>>> while nouveau with denorm flushing enabled reports out0=0.0 which also
>>>> passes.
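
A small software model can make the suspected behaviour concrete. The
sketch below is only an illustration, not the hardware's or LLVM's actual
implementation: the names `flush`, `max_exact`, `max_flush_compare_only`
and `max_flush_everywhere` are hypothetical, and Python doubles stand in
for 32-bit floats (both test values are exactly representable either way).
It assumes the lowering quoted above, with the inputs flushed for the
comparison only.

```python
import math

# Smallest positive normal 32-bit float; anything nonzero below it is a denorm.
FLT_MIN_NORMAL = float.fromhex("0x1p-126")

# The two inputs reported in the thread.
IN0 = float.fromhex("-0x0.000002p-126")  # smallest-magnitude negative denorm
IN1 = float.fromhex("-0x0.fffffep-126")  # largest-magnitude negative denorm


def flush(x):
    """Flush a float32 denorm to a zero of the same sign."""
    if x != 0.0 and abs(x) < FLT_MIN_NORMAL:
        return math.copysign(0.0, x)
    return x


def max_exact(a, b):
    """max with full denorm support."""
    return a if a > b else b


def max_flush_compare_only(a, b):
    """Assumed model: inputs are flushed for the comparison,
    but the selected operand is passed through unflushed."""
    return a if flush(a) > flush(b) else b


def max_flush_everywhere(a, b):
    """Model of consistent flushing of inputs and output."""
    return flush(max_exact(flush(a), flush(b)))
```

Under these assumptions `max_flush_compare_only(IN0, IN1)` yields IN1,
the value that fails, while `max_exact` yields IN0 and
`max_flush_everywhere` yields a zero, the two values that pass.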
>>>
>>> The denorm configuration has 2 bits:
>>> - flush (0) or allow (1) input denorms
>>> - flush (0) or allow (1) output denorms
>>>
>>> In the case of v_max, it looks like output denorms are not flushed and
>>> it behaves almost like you said:
>>>
>>> if (a >= b) { x = a; } else { x = b; }
>>>
>>> Marek
>>>
>>
>> I think this makes perfect sense. Output denorms are usually flushed,
>> but this often doesn't affect "mov-like" operations, only arithmetic
>> ones. And in this case you have a comparison plus a cmov.
>> FWIW I believe you'd get the same result on x86 SSE if denormals-are-zero
>> (DAZ) is set and flush-to-zero (FTZ) is enabled for outputs, such as with
>> llvmpipe. The comparison will think both values are the same, but since
>> there's no arithmetic involved in the selection there will be no denorm
>> flush. This isn't obvious from the manual, since it doesn't actually
>> mention whether DAZ is honored, but I'd think it is, just like with all
>> float operations: "IF ((SRC1 == 0.0) and (SRC2 == 0.0)) THEN DEST =
>> SRC2;". That behavior may even be alright with d3d10, though that's far
>> from obvious either. In any case, I didn't see problems due to that
>> with conformance tests ("Min or max operations flush denorms for
>> comparison, but the result may or may not be denorm flushed." -
>> https://msdn.microsoft.com/en-us/library/windows/desktop/cc308050(v=vs.85).aspx
>> )
>>
>> Doesn't the hw do denorm flushing on shader export though or does the
>> test get back the result via some other means?
> 
> In this case it's written to a SSBO (UAV in D3D11-speak I believe), so
> it's just a straight up untyped store.
> 
>>
>> In any case, probably no big deal - since the hw won't generate denorms
>> on its own, they can only be passed in, and apps are unlikely to do that
>> (and even if they do, a wrong answer probably shouldn't really hurt,
>> since it's just a wrong denorm which generally won't matter).
> 
> Yeah, I think for most situations it should be fine. This is just one
> of the weird ones.
> 
> Given your comments, I think the dEQP floating point testing logic
> should be modified to accept *either* answer if both are denorms for
> functions like min/max.

Yes, that looks like the right solution to me. Actually min is defined
as "Returns y if y < x; otherwise it returns x" (and max analogously,
with x < y) without any further comments - and you'd get exactly the
same issue if you did that sequence as such without denorm support (at
least I'm quite sure a wrong denorm is acceptable there).
So, as I said initially - error in test suite :-).
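
To illustrate the relaxed check discussed above, here is a rough sketch.
It is not dEQP's actual code or API: `max_result_ok` and its `relaxed`
parameter are hypothetical names, NaN handling is left out, and Python
doubles stand in for 32-bit floats. It assumes the existing check accepts
the exact result or its flushed form, and that the relaxed check
additionally accepts either input when both inputs are denorms.

```python
import math

# Smallest positive normal 32-bit float.
FLT_MIN_NORMAL = float.fromhex("0x1p-126")


def is_denorm(x):
    """True for nonzero values below the smallest normal float32."""
    return x != 0.0 and abs(x) < FLT_MIN_NORMAL


def flush(x):
    """Flush a denorm to a zero of the same sign."""
    return math.copysign(0.0, x) if is_denorm(x) else x


def max_result_ok(a, b, out, relaxed=True):
    """Hypothetical acceptance check for out = max(a, b)."""
    exact = a if a > b else b
    fa, fb = flush(a), flush(b)
    flushed_inputs = fa if fa > fb else fb

    # Assumed existing rule: exact answer, or any flush-to-zero variant.
    if out in (exact, flush(exact), flushed_inputs):
        return True

    # Proposed relaxation: with two denorm inputs, an implementation that
    # flushes only for the comparison may return either operand.
    if relaxed and is_denorm(a) and is_denorm(b):
        return out in (a, b)
    return False
```

With the values from this thread, the strict form (`relaxed=False`)
rejects out0=-0x0.fffffep-126 while the relaxed form accepts it; both
forms still accept the exact result and a zero.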

Roland

> Cheers,
> 
>   -ilia
>