[Mesa-dev] NaN behavior in GLSL (was Re: [PATCH] glsl: always do sqrt(abs()) and inversesqrt(abs()))

Ilia Mirkin imirkin at alum.mit.edu
Fri Jan 13 00:56:22 UTC 2017


On Thu, Jan 12, 2017 at 7:46 PM, Matt Turner <mattst88 at gmail.com> wrote:
> On Thu, Jan 12, 2017 at 3:20 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>> On Thu, Jan 12, 2017 at 6:04 PM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>>> On 12.01.2017 23:46, Ilia Mirkin wrote:
>>>>
>>>> On Thu, Jan 12, 2017 at 4:03 PM, Matteo Bruni <matteo.mystral at gmail.com>
>>>> wrote:
>>>>>
>>>>> So, what would be really nice to have is a GLSL extension for some
>>>>> kind of switch to select the requested behavior WRT NaN. For example a
>>>>> three-way option with "don't generate NaN in arithmetic operations",
>>>>> "do generate NaN" and "don't care". It could also be a GL state if
>>>>> that's easier to implement with the existing hardware, since an
>>>>> individual application isn't supposed to require different behavior
>>>>> from one shader to the next.
>>>>>
>>>>> Is anyone interested in / favorable to something like this? It would
>>>>> solve the issue with defining NaN behavior in GLSL while making things
>>>>> a bit more compatible with "other API a lot of games are ported from
>>>>> which happens to be supported by all the desktop GPUs".
>>>>
>>>>
>>>> Not that I'm biased, but on the NVIDIA Tesla series (G80-GT21x), this
>>>> enable is handled via a global flag, not in the shader binary, so this
>>>> is all-or-nothing for a whole pipeline. On GF100+, I believe there is
>>>> also an enable via a global flag, but there are also FMUL.FMZ (and
>>>> FFMA.FMZ) flags, which I *think* have the same effect. So for GF100+ hw,
>>>> this could be done at the instruction level.
>>>
>>>
>>> Well, I would also have advocated for what is effectively a
>>> per-program/pipeline flag anyway, even though GCN hardware can theoretically
>>> do it per-instruction. Tracking a per-instruction bit in the compiler
>>> quickly becomes fragile (e.g. there's no good way for us to model this
>>> information per-instruction in LLVM IR). Per-shader isn't any better than
>>> per-instruction due to linking, and per-shader-stage is awkward if we ever
>>> want to do fancier cross-stage optimizations.
>>>
>>> It's really quite simple. Introduce an extension with a name like
>>> MESA_shader_float_dx9. The behavior I'd suggest is:
>>>
>>> Enabling/requiring the extension in a shader causes various semantics
>>> changes to bring floating point behavior in line with DX9 in that shader's
>>> code:
>>>
>>> - 0*x = 0
>>
>> Yes. But only for fp32, not for fp64.
>>
>>> - sqrt/rsqrt are guaranteed to take the absolute value of their argument
>>
>> Is that necessary? If the software knows about the ext, it also knows
>> to stick the abs() in.
>
> Is there a compelling reason to make the extension offer just one of
> these many behavior differences?
>
> FWIW, i965 has IEEE and "ALT" floating-point modes. ALT, I think,
> corresponds to d3d9 behavior, and its description says
>
> A floating-point execution mode that maps +/- inf to +/- fmax, +/-
> denorm to +/-0, and NaN to +0 at the FPU inputs and never produces
> infinities, denormals, or NaN values as outputs.

Interesting. I believe on NVIDIA hardware, it's just float multiply
that's affected.
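
To make the difference concrete, here is a rough C sketch of the two
behaviors, not taken from any driver. alt_sanitize_input() follows the
input mapping in the ALT description quoted above; mul_zero_wins() is
the multiply-only "zero wins" rule. Both function names are
hypothetical, the sketch ignores the output side of ALT mode, and
returning +0 regardless of operand signs is an assumption.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper: input mapping as worded in the quoted ALT
 * description (applied at the FPU inputs only; outputs not modeled). */
float alt_sanitize_input(float x)
{
    if (isnan(x))
        return 0.0f;                    /* NaN -> +0 */
    if (isinf(x))
        return copysignf(FLT_MAX, x);   /* +/-inf -> +/-fmax */
    if (fpclassify(x) == FP_SUBNORMAL)
        return copysignf(0.0f, x);      /* +/-denorm -> +/-0 */
    return x;
}

/* Hypothetical helper: fp32 multiply where a zero operand always wins,
 * even against inf or NaN. Sign of the zero result is an assumption. */
float mul_zero_wins(float a, float b)
{
    if (a == 0.0f || b == 0.0f)
        return 0.0f;
    return a * b;
}
```

With plain IEEE multiply, 0 * inf is NaN; mul_zero_wins(0, inf) gives 0
without touching any other operation.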

>
> Also: Extended mathematics functions of log(), rsq() and sqrt() take
> the absolute value of the sources before computation to avoid
> generating INF and NaN results.
>
> If those two behaviors correspond to d3d9 behavior, I wouldn't want an
> extension that offered only the "zero wins" behavior and expected
> applications to insert abs().

Really? That creates ARB_gpu_shader5-style extensions which do 75
different things and that you can't expose if you can only do 74 of
them. I think in the past we've avoided things like having "d3d9 mode"
in gallium APIs - it's nice for these things to be individually
enumerated. I like the direction that e.g. ARB_clip_control went in -
make it all configurable individually instead of bundling unrelated
things together. This has allowed e.g. dolphin to do things in OpenGL
that are impossible on DX. And whether 0 * x = 0 or not seems rather
unrelated to whether rsq takes abs of its args.

  -ilia

