[Mesa-dev] [PATCH] radeonsi: enable 32-bit denormals on VI+

Thu Jan 12 16:13:55 UTC 2017

On 12.01.2017 at 02:12, Marek Olšák wrote:
> On Thu, Jan 12, 2017 at 12:33 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>> On Wed, Jan 11, 2017 at 4:00 PM, Roland Scheidegger <sroland at vmware.com> wrote:
>>> On 11.01.2017 at 21:08, Samuel Pitoiset wrote:
>>>>
>>>>
>>>> On 01/11/2017 07:00 PM, Roland Scheidegger wrote:
>>>>> I don't think there's any glsl, es or otherwise, specification which
>>>>> would require denorms (since obviously lots of hw can't do it, d3d10
>>>>> forbids them), with any precision qualifier. Hence these look like bugs
>>>>> of the test suite to me?
>>>>> (Irrespective if it's a good idea or not to enable denormals, which I
>>>>> don't really know.)
>>>>
>>>> That test works on NVIDIA hw (both with blob and nouveau) and IIRC it
>>>> also works on Intel hw. I don't think it's buggy there.
>>> The question then is why it needs denorms on radeons...
>>
>> I spent some time with Samuel looking at this. So, this is pretty
>> funny... (or at least feels that way after staring at floating point
>> for a while)
>>
>> dEQP is, in fact, feeding denorms to the min/max functions. But it's
>> smart enough to know that flushing denorms to 0 is OK, and so it
>> treats a 0 as a pass. (And obviously it treats the "right answer" as a
>> pass.) So that's why enabling denorm processing fixes it - that causes
>> the hw to return the proper correct answer and all is well.
>>
>> However the issue is that without denorm processing, the hw is
>> returning the *wrong* answer. At first I thought that max was being
>> lowered into something like
>>
>> if (a > b) { x = a; } else { x = b; }
>>
>> which would end up with potentially wrong results if a and b are being
>> flushed as inputs into the comparison but not into the assignments.
>> But that's not (explicitly) what's happening - the v_max_f32_e32
>> instruction is being used. Perhaps that's what it does internally? If
>> so, that means that results of affected float functions in LLVM need
>> explicit flushing before being stored into results.
>>
>> FWIW the specific values triggering the issue are:
>>
>> in0=-0x0.000002p-126, in1=-0x0.fffffep-126, out0=-0x0.fffffep-126 -> FAIL
>>
>> With denorm processing, it correctly reports out0=-0x0.000002p-126,
>> while nouveau with denorm flushing enabled reports out0=0.0 which also
>> passes.
> 
> The denorm configuration has 2 bits:
> - flush (0) or allow (1) input denorms
> - flush (0) or allow (1) output denorms
> 
> In the case of v_max, it looks like output denorms are not flushed and
> it behaves almost like you said:
> 
> if (a >= b) { x = a; } else { x = b; }
> 
> Marek
> 

I think this makes perfect sense. Output denorms are normally flushed,
but that usually affects only arithmetic operations, not "mov-like"
ones. And in this case you have a comparison plus a cmov.
FWIW I believe you'd get the same result on x86 SSE if both
denormals-are-zero (DAZ) and flush-to-zero (FTZ) are set, such as with
llvmpipe. The comparison will think both values are the same, but since
the selection involves no arithmetic, the result will not be flushed.
This isn't obvious from the manual, which doesn't actually say whether
DAZ is honored, but I'd think it is, just like with all other float
operations: "IF ((SRC1 == 0.0) and (SRC2 == 0.0)) THEN DEST = SRC2;"
That behavior may even be alright with d3d10, though that is far from
obvious either. In any case, I didn't see problems due to it with the
conformance tests ("Min or max operations flush denorms for
comparison, but the result may or may not be denorm flushed." -
https://msdn.microsoft.com/en-us/library/windows/desktop/cc308050(v=vs.85).aspx)
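
To make the comparison-plus-select behavior concrete, here is a minimal
bit-level sketch in C. It is only a model under stated assumptions, not
the actual v_max_f32 or SSE implementation: the names flush_denorm,
order_key and model_max are made up for illustration, NaNs are ignored,
and the tie rule (return the second operand when the compared values are
equal) is borrowed from the SSE pseudocode quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flush a denormal (exponent field == 0) to a zero of the same sign.
 * Operates on raw IEEE-754 binary32 bit patterns. */
static uint32_t flush_denorm(uint32_t bits)
{
    if ((bits & 0x7f800000u) == 0)
        return bits & 0x80000000u;
    return bits;
}

/* Map a non-NaN bit pattern to an integer that orders like the float.
 * +0.0 and -0.0 both map to 0, so they compare equal. */
static int32_t order_key(uint32_t bits)
{
    int32_t magnitude = (int32_t)(bits & 0x7fffffffu);
    return (bits & 0x80000000u) ? -magnitude : magnitude;
}

/* Hypothetical model of max as "compare, then select":
 *  - flush_inputs: flush denormals only for the comparison
 *  - flush_result: additionally flush the selected value
 * On a tie the second operand is returned (assumed, SSE-like). */
static uint32_t model_max(uint32_t a, uint32_t b,
                          bool flush_inputs, bool flush_result)
{
    uint32_t ca = flush_inputs ? flush_denorm(a) : a;
    uint32_t cb = flush_inputs ? flush_denorm(b) : b;

    /* The select returns the ORIGINAL operand, not the flushed copy. */
    uint32_t r = (order_key(ca) > order_key(cb)) ? a : b;

    return flush_result ? flush_denorm(r) : r;
}
```

With in0 = 0x80000001 (-0x0.000002p-126) and in1 = 0x807fffff
(-0x0.fffffep-126), model_max(in0, in1, false, false) selects in0, the
right answer. model_max(in0, in1, true, false) sees a tie and returns
in1, which is the failing value from the test. model_max(in0, in1,
true, true) returns 0x80000000 (-0.0), which the test accepts.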

Doesn't the hw do denorm flushing on shader export though, or does the
test get back the result via some other means?

In any case, probably no big deal - since the hw won't generate denorms
on its own, they can only be passed in, and apps are unlikely to do that
(and even if they do, a wrong answer probably shouldn't really hurt,
since it's just a wrong denorm which generally won't matter).

Roland