[Mesa-dev] NaN behavior in GLSL (was Re: [PATCH] glsl: always do sqrt(abs()) and inversesqrt(abs()))

Fri Jan 13 01:21:13 UTC 2017

On Thu, Jan 12, 2017 at 4:56 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> On Thu, Jan 12, 2017 at 7:46 PM, Matt Turner <mattst88 at gmail.com> wrote:
>> On Thu, Jan 12, 2017 at 3:20 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>>> On Thu, Jan 12, 2017 at 6:04 PM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>>>> On 12.01.2017 23:46, Ilia Mirkin wrote:
>>>>>
>>>>> On Thu, Jan 12, 2017 at 4:03 PM, Matteo Bruni <matteo.mystral at gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> So, what would be really nice to have is a GLSL extension for some
>>>>>> kind of switch to select the requested behavior WRT NaN. For example a
>>>>>> three-way option with "don't generate NaN in arithmetic operations",
>>>>>> "do generate NaN" and "don't care". It could also be a GL state if
>>>>>> that's easier to implement with the existing hardware, since an
>>>>>> individual application isn't supposed to require different behavior
>>>>>> from one shader to the next.
>>>>>>
>>>>>> Is anyone interested in / favorable to something like this? It would
>>>>>> solve the issue with defining NaN behavior in GLSL while making things
>>>>>> a bit more compatible with "other API a lot of games are ported from
>>>>>> which happens to be supported by all the desktop GPUs".
>>>>>
>>>>>
>>>>> Not that I'm biased, but on the NVIDIA Tesla series (G80-GT21x), this
>>>>> enable is handled via a global flag, not in the shader binary, so this
>>>>> is all-or-nothing for a whole pipeline. On GF100+, I believe there is
>>>>> also an enable via a global flag, but there is also an FMUL.FMZ (and
>>>>> FFMA.FMZ) flag, which I *think* has the same effect. So for GF100+ hw,
>>>>> this could be done at the instruction level.
>>>>
>>>>
>>>> Well, I would also have advocated for what is effectively a
>>>> per-program/pipeline flag anyway, even though GCN hardware can theoretically
>>>> do it per-instruction. Tracking a per-instruction bit in the compiler
>>>> quickly becomes fragile (e.g. there's no good way for us to model this
>>>> information per-instruction in LLVM IR). Per-shader isn't any better than
>>>> per-instruction due to linking, and per-shader-stage is awkward if we ever
>>>> want to do fancier cross-stage optimizations.
>>>>
>>>> It's really quite simple. Introduce an extension with a name like
>>>> MESA_shader_float_dx9. The behavior I'd suggest is:
>>>>
>>>> Enabling/requiring the extension in a shader causes various semantics
>>>> changes to bring floating point behavior in line with DX9 in that shader's
>>>> code:
>>>>
>>>> - 0*x = 0
>>>
>>> Yes. But only for fp32, not for fp64.
>>>
>>>> - sqrt/rsqrt are guaranteed to take the absolute value of their argument
>>>
>>> Is that necessary? If the software knows about the ext, it also knows
>>> to stick the abs() in.
>>
>> Is there a compelling reason to make the extension offer just one of
>> these many behavior differences?
>>
>> FWIW, i965 has IEEE and "ALT" floating-point modes. ALT, I think,
>> corresponds to d3d9 behavior, and its description says
>>
>> A floating-point execution mode that maps +/- inf to +/- fmax, +/-
>> denorm to +/-0, and NaN to +0 at the FPU inputs and never produces
>> infinities, denormals, or NaN values as outputs.
>
> Interesting. I believe on NVIDIA hardware, it's just float multiply
> that's affected.
>
>>
>> Also: Extended mathematics functions of log(), rsq() and sqrt() take
>> the absolute value of the sources before computation to avoid
>> generating INF and NaN results.
>>
>> If those two behaviors correspond to d3d9 behavior, I wouldn't want an
>> extension that offered only the "zero wins" behavior and expected
>> applications to insert abs().
>
> Really? That creates ARB_gpu_shader5-style extensions which do 75
> different things and that you can't expose if you can only do 74 of
> them. I think in the past we've avoided things like having "d3d9 mode"
> in gallium APIs - it's nice for these things to be individually
> enumerated. I like the direction that e.g. ARB_clip_control went in -
> make it all configurable individually instead of bundling unrelated
> things together. This has allowed e.g. dolphin to do things in OpenGL
> that are impossible on DX. And whether 0 * x = 0 or not seems rather
> unrelated to whether rsq takes abs of its args.

Definitely agree. Sorry about i965 :)

I think we should figure out what behaviors D3D9 actually wants.
i965's ALT mode maps ±inf to ±fmax on input. If D3D9 wants that... we
should probably include it in the spec.

Also, if the extension is written in a way that isn't doable on i965, I
think we're just wasting time. i965 is the only driver that cannot use
st/nine. :)