[Mesa-dev] NaN behavior in GLSL (was Re: [PATCH] glsl: always do sqrt(abs()) and inversesqrt(abs()))

Fri Jan 13 02:13:47 UTC 2017

On Jan 12, 2017 4:56 PM, "Ilia Mirkin" <imirkin at alum.mit.edu> wrote:

On Thu, Jan 12, 2017 at 7:46 PM, Matt Turner <mattst88 at gmail.com> wrote:
> On Thu, Jan 12, 2017 at 3:20 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>> On Thu, Jan 12, 2017 at 6:04 PM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>>> On 12.01.2017 23:46, Ilia Mirkin wrote:
>>>>
>>>> On Thu, Jan 12, 2017 at 4:03 PM, Matteo Bruni <matteo.mystral at gmail.com>
>>>> wrote:
>>>>>
>>>>> So, what would be really nice to have is a GLSL extension for some
>>>>> kind of switch to select the requested behavior WRT NaN. For example a
>>>>> three-way option with "don't generate NaN in arithmetic operations",
>>>>> "do generate NaN" and "don't care". It could also be a GL state if
>>>>> that's easier to implement with the existing hardware, since an
>>>>> individual application isn't supposed to require different behavior
>>>>> from one shader to the next.
>>>>>
>>>>> Is anyone interested in / favorable to something like this? It would
>>>>> solve the issue with defining NaN behavior in GLSL while making things
>>>>> a bit more compatible with "other API a lot of games are ported from
>>>>> which happens to be supported by all the desktop GPUs".
>>>>
>>>>
>>>> Not that I'm biased, but on the NVIDIA Tesla series (G80-GT21x), this
>>>> enable is handled via a global flag, not in the shader binary, so this
>>>> is all-or-nothing for a whole pipeline. On GF100+, I believe there is
>>>> also an enable via a global flag, but there is also an FMUL.FMZ (and
>>>> FFMA.FMZ) flag, which I *think* has the same effect. So for GF100+ hw,
>>>> this could be done at the instruction level.
>>>
>>>
>>> Well, I would also have advocated for what is effectively a
>>> per-program/pipeline flag anyway, even though GCN hardware can
>>> theoretically do it per-instruction. Tracking a per-instruction bit in
>>> the compiler quickly becomes fragile (e.g. there's no good way for us to
>>> model this information per-instruction in LLVM IR). Per-shader isn't any
>>> better than per-instruction due to linking, and per-shader-stage is
>>> awkward if we ever want to do fancier cross-stage optimizations.
>>>
>>> It's really quite simple. Introduce an extension with a name like
>>> MESA_shader_float_dx9. The behavior I'd suggest is:
>>>
>>> Enabling/requiring the extension in a shader causes various semantics
>>> changes to bring floating point behavior in line with DX9 in that
>>> shader's code:
>>>
>>> - 0*x = 0
>>
>> Yes. But only for fp32, not for fp64.
>>
>>> - sqrt/rsqrt are guaranteed to take the absolute value of their argument
>>
>> Is that necessary? If the software knows about the ext, it also knows
>> to stick the abs() in.
>
> Is there a compelling reason to make the extension offer just one of
> these many behavior differences?
>
> FWIW, i965 has IEEE and "ALT" floating-point modes. ALT, I think
> corresponds to d3d9 behavior, and its description says
>
> A floating-point execution mode that maps +/- inf to +/- fmax, +/-
> denorm to +/-0, and NaN to +0 at the FPU inputs and never produces
> infinities, denormals, or NaN values as outputs.

Interesting. I believe on NVIDIA hardware, it's just float multiply
that's affected.

>
> Also: Extended mathematics functions of log(), rsq() and sqrt() take
> the absolute value of the sources before computation to avoid
> generating INF and NaN results.
>
> If those two behaviors correspond to d3d9 behavior, I wouldn't want an
> extension that offered only the "zero wins" behavior and expected
> applications to insert abs().

Really? That creates ARB_gpu_shader5-style extensions which do 75
different things and that you can't expose if you can only do 74 of
them.

I understand your concern, but what hardware are we planning to expose this
on that can't do d3d9? It seems like there are three things here: inf/NaN
handling, denorm flushing, and abs() on special functions. I suppose I'd
be OK with separating things out a bit, but we need to have a single enable
or else our hardware is going to have serious problems with it.

I think in the past we've avoided things like having "d3d9 mode"
in gallium APIs - it's nice for these things to be individually
enumerated. I like the direction that e.g. ARB_clip_control went in -
make it all configurable individually instead of bundling unrelated
things together. This has allowed e.g. Dolphin to do things in OpenGL
that are impossible on DX. And whether 0 * x = 0 or not seems rather
unrelated to whether rsq takes abs of its args.

Unless, of course, it's controlled by the same hardware bit... Clearly, we
can give you abs on rsq without denorm flushing (easy shader hacks) but not
the other way around.