[Mesa-dev] NaN behavior in GLSL (was Re: [PATCH] glsl: always do sqrt(abs()) and inversesqrt(abs()))

Matt Turner mattst88 at gmail.com
Fri Jan 13 00:46:15 UTC 2017


On Thu, Jan 12, 2017 at 3:20 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> On Thu, Jan 12, 2017 at 6:04 PM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>> On 12.01.2017 23:46, Ilia Mirkin wrote:
>>>
>>> On Thu, Jan 12, 2017 at 4:03 PM, Matteo Bruni <matteo.mystral at gmail.com>
>>> wrote:
>>>>
>>>> So, what would be really nice to have is a GLSL extension for some
>>>> kind of switch to select the requested behavior WRT NaN. For example a
>>>> three-way option with "don't generate NaN in arithmetic operations",
>>>> "do generate NaN" and "don't care". It could also be a GL state if
>>>> that's easier to implement with the existing hardware, since an
>>>> individual application isn't supposed to require different behavior
>>>> from one shader to the next.
>>>>
>>>> Is anyone interested in / favorable to something like this? It would
>>>> solve the issue with defining NaN behavior in GLSL while making things
>>>> a bit more compatible with "other API a lot of games are ported from
>>>> which happens to be supported by all the desktop GPUs".
>>>
>>>
>>> Not that I'm biased, but on the NVIDIA Tesla series (G80-GT21x), this
>>> enable is handled via a global flag, not in the shader binary, so this
>>> is all-or-nothing for a whole pipeline. On GF100+, I believe there is
>>> also an enable via a global flag, but there are also a FMUL.FMZ (and
>>> FFMA.FMZ) flag, which I *think* has the same effect. So for GF100+ hw,
>>> this could be done at the instruction level.
>>
>>
>> Well, I would also have advocated for what is effectively a
>> per-program/pipeline flag anyway, even though GCN hardware can theoretically
>> do it per-instruction. Tracking a per-instruction bit in the compiler
>> quickly becomes fragile (e.g. there's no good way for us to model this
>> information per-instruction in LLVM IR). Per-shader isn't any better than
>> per-instruction due to linking, and per-shader-stage is awkward if we ever
>> want to do fancier cross-stage optimizations.
>>
>> It's really quite simple. Introduce an extension with a name like
>> MESA_shader_float_dx9. The behavior I'd suggest is:
>>
>> Enabling/requiring the extension in a shader causes various semantics
>> changes to bring floating point behavior in line with DX9 in that shader's
>> code:
>>
>> - 0*x = 0
>
> Yes. But only for fp32, not for fp64.
>
>> - sqrt/rsqrt are guaranteed to take the absolute value of their argument
>
> Is that necessary? If the software knows about the ext, it also knows
> to stick the abs() in.

Is there a compelling reason to make the extension offer just one of
these many behavior differences?

FWIW, i965 has IEEE and "ALT" floating-point modes. ALT, I think,
corresponds to d3d9 behavior, and its description says:

A floating-point execution mode that maps +/- inf to +/- fmax, +/-
denorm to +/-0, and NaN to +0 at the FPU inputs and never produces
infinities, denormals, or NaN values as outputs.

Also: Extended mathematics functions of log(), rsq() and sqrt() take
the absolute value of the sources before computation to avoid
generating INF and NaN results.

If those two behaviors correspond to d3d9 behavior, I wouldn't want an
extension that offered only the "zero wins" behavior and expected
applications to insert abs().

Other things to note: ALT mode is not supported with half or double
precision, and

