[Mesa-dev] NaN behavior in GLSL (was Re: [PATCH] glsl: always do sqrt(abs()) and inversesqrt(abs()))

Thu Jan 12 23:43:59 UTC 2017

On 13.01.2017 00:20, Ilia Mirkin wrote:
> On Thu, Jan 12, 2017 at 6:04 PM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>> On 12.01.2017 23:46, Ilia Mirkin wrote:
>>>
>>> On Thu, Jan 12, 2017 at 4:03 PM, Matteo Bruni <matteo.mystral at gmail.com>
>>> wrote:
>>>>
>>>> So, what would be really nice to have is a GLSL extension for some
>>>> kind of switch to select the requested behavior WRT NaN. For example a
>>>> three-way option with "don't generate NaN in arithmetic operations",
>>>> "do generate NaN" and "don't care". It could also be a GL state if
>>>> that's easier to implement with the existing hardware, since an
>>>> individual application isn't supposed to require different behavior
>>>> from one shader to the next.
>>>>
>>>> Is anyone interested in / favorable to something like this? It would
>>>> solve the issue with defining NaN behavior in GLSL while making things
>>>> a bit more compatible with "other API a lot of games are ported from
>>>> which happens to be supported by all the desktop GPUs".
>>>
>>>
>>> Not that I'm biased, but on the NVIDIA Tesla series (G80-GT21x), this
>>> enable is handled via a global flag, not in the shader binary, so this
>>> is all-or-nothing for a whole pipeline. On GF100+, I believe there is
>>> also an enable via a global flag, but there is also an FMUL.FMZ (and
>>> FFMA.FMZ) flag, which I *think* has the same effect. So for GF100+ hw,
>>> this could be done at the instruction level.
>>
>>
>> Well, I would also have advocated for what is effectively a
>> per-program/pipeline flag anyway, even though GCN hardware can theoretically
>> do it per-instruction. Tracking a per-instruction bit in the compiler
>> quickly becomes fragile (e.g. there's no good way for us to model this
>> information per-instruction in LLVM IR). Per-shader isn't any better than
>> per-instruction due to linking, and per-shader-stage is awkward if we ever
>> want to do fancier cross-stage optimizations.
>>
>> It's really quite simple. Introduce an extension with a name like
>> MESA_shader_float_dx9. The behavior I'd suggest is:
>>
>> Enabling/requiring the extension in a shader causes various semantics
>> changes to bring floating point behavior in line with DX9 in that shader's
>> code:
>>
>> - 0*x = 0
>
> Yes. But only for fp32, not for fp64.
>
>> - sqrt/rsqrt are guaranteed to take the absolute value of their argument
>
> Is that necessary? If the software knows about the ext, it also knows
> to stick the abs() in.
>
>> - anything else?
>
> I'd say MESA_shader_float_zero_wins or something, if we don't stick
> the sqrt stuff into it.

Well, I don't know the intricacies of DX9. I agree that apps can do the 
abs() themselves, so if the 0*x behavior is really the only other 
difference, then zero_wins is a fine name as well.

Cheers,
Nicolai

>
> Here is a software model of the Tesla-era shader execution created by
> Marcin (mwk):
>
> https://github.com/envytools/envytools/blob/master/nvhw/fp.c#L168
>
> The bit in question is "zero_wins", so just look at what that
> modifies. So if you have a*b and a == 0 || b == 0 (i.e. either is +0 or -0)
> then +0 is returned.
>
>>
>> It is a link error to link a program in which some shaders have the
>> extension and others don't.
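
A rough sketch of that link-time rule, assuming each shader carries a
single boolean saying whether it enabled the extension. The struct and
function names are hypothetical and do not correspond to actual Mesa
linker code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-shader record; only the flag matters for this check. */
struct shader_info {
    bool uses_zero_wins;
};

/* Returns true if all shaders agree on the extension, false if the
 * program mixes shaders with and without it (which would be a link
 * error under the proposed rule). An empty program links trivially. */
bool link_check_zero_wins(const struct shader_info *shaders, size_t count)
{
    for (size_t i = 1; i < count; i++) {
        if (shaders[i].uses_zero_wins != shaders[0].uses_zero_wins)
            return false;
    }
    return true;
}
```

Because the check makes the flag uniform across the program, a driver
could then treat it as a single per-program (or per-pipeline) setting.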
>>
>> There are funny interactions like having compute shaders with dx9 float
>> semantics, but that's pretty much it :)
>>
>> Somebody just needs to write up a draft, but it only makes sense if we can
>> at least get all the Mesa drivers and Wine behind this. If there's a simple
>> global flag on NVIDIA hardware, then it should be easy to provide an initial
>> implementation for nouveau *hint* *hint* ;)
>>
>> For radeonsi, more work is required (like support in LLVM).
>
> All sounds good to me. Should be relatively straightforward for nouveau.
>