[Mesa-dev] NaN behavior in GLSL (was Re: [PATCH] glsl: always do sqrt(abs()) and inversesqrt(abs()))

Marek Olšák maraeo at gmail.com
Fri Jan 13 00:11:16 UTC 2017


On Fri, Jan 13, 2017 at 12:43 AM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
> On 13.01.2017 00:20, Ilia Mirkin wrote:
>>
>> On Thu, Jan 12, 2017 at 6:04 PM, Nicolai Hähnle <nhaehnle at gmail.com>
>> wrote:
>>>
>>> On 12.01.2017 23:46, Ilia Mirkin wrote:
>>>>
>>>>
>>>> On Thu, Jan 12, 2017 at 4:03 PM, Matteo Bruni <matteo.mystral at gmail.com>
>>>> wrote:
>>>>>
>>>>>
>>>>> So, what would be really nice to have is a GLSL extension for some
>>>>> kind of switch to select the requested behavior WRT NaN. For example a
>>>>> three-way option with "don't generate NaN in arithmetic operations",
>>>>> "do generate NaN" and "don't care". It could also be a GL state if
>>>>> that's easier to implement with the existing hardware, since an
>>>>> individual application isn't supposed to require different behavior
>>>>> from one shader to the next.
>>>>>
>>>>> Is anyone interested in / favorable to something like this? It would
>>>>> solve the issue with defining NaN behavior in GLSL while making things
>>>>> a bit more compatible with "other API a lot of games are ported from
>>>>> which happens to be supported by all the desktop GPUs".
>>>>
>>>>
>>>>
>>>> Not that I'm biased, but on the NVIDIA Tesla series (G80-GT21x), this
>>>> enable is handled via a global flag, not in the shader binary, so this
>>>> is all-or-nothing for a whole pipeline. On GF100+, I believe there is
>>>> also an enable via a global flag, but there is also an FMUL.FMZ (and
>>>> FFMA.FMZ) flag, which I *think* has the same effect. So for GF100+ hw,
>>>> this could be done at the instruction level.
>>>
>>>
>>>
>>> Well, I would also have advocated for what is effectively a
>>> per-program/pipeline flag anyway, even though GCN hardware can
>>> theoretically do it per-instruction. Tracking a per-instruction bit in
>>> the compiler quickly becomes fragile (e.g. there's no good way for us to
>>> model this information per-instruction in LLVM IR). Per-shader isn't any
>>> better than per-instruction due to linking, and per-shader-stage is
>>> awkward if we ever want to do fancier cross-stage optimizations.
>>>
>>> It's really quite simple. Introduce an extension with a name like
>>> MESA_shader_float_dx9. The behavior I'd suggest is:
>>>
>>> Enabling/requiring the extension in a shader causes various semantic
>>> changes to bring floating point behavior in line with DX9 in that
>>> shader's code:
>>>
>>> - 0*x = 0
>>
>>
>> Yes. But only for fp32, not for fp64.
>>
>>> - sqrt/rsqrt are guaranteed to take the absolute value of their argument
>>
>>
>> Is that necessary? If the software knows about the ext, it also knows
>> to stick the abs() in.
>>
>>> - anything else?
>>
>>
>> I'd say MESA_shader_float_zero_wins or something, if we don't stick
>> the sqrt stuff into it.
>
>
> Well, I don't know the intricacies of DX9. I agree that apps can do the
> abs() themselves, so if the 0*x behavior is really the only other
> difference, then zero_wins is a fine name as well.
>
> Cheers,
> Nicolai
>
>
>>
>> Here is a software model of the Tesla-era shader execution created by
>> Marcin (mwk):
>>
>> https://github.com/envytools/envytools/blob/master/nvhw/fp.c#L168
>>
>> The bit in question is "zero_wins", so just look at what that
>> modifies. If you have a*b and either a or b is zero (i.e. +0 or -0),
>> then +0 is returned.
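
To make the "zero wins" rule above concrete, here is a minimal CPU-side
sketch in C. It is not taken from envytools or Mesa; the helper name
mul_zero_wins is hypothetical, and applying the rule even when the other
operand is inf or NaN is an assumption based on the description above
(it is what gives 0*x = 0 for every x).

```c
#include <math.h>

/* Hypothetical helper (not a Mesa or envytools function): fp32 multiply
 * with "zero wins" semantics as described in this thread. If either
 * operand is +0 or -0, the result is +0. Assumption: this also holds
 * when the other operand is inf or NaN. */
static float mul_zero_wins(float a, float b)
{
    /* Comparing against 0.0f is true for both +0 and -0,
     * and false for NaN. */
    if (a == 0.0f || b == 0.0f)
        return 0.0f;    /* always +0, the sign is discarded */

    return a * b;       /* ordinary IEEE multiply otherwise */
}
```

With plain IEEE rules 0 * inf is NaN, while mul_zero_wins(0.0f, INFINITY)
is +0, which is the difference the proposed extension would expose.
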
>>
>>>
>>> It is a link error to link a program in which some shaders have the
>>> extension and others don't.
>>>
>>> There are funny interactions like having compute shaders with dx9 float
>>> semantics, but that's pretty much it :)
>>>
>>> Somebody just needs to write up a draft, but it only makes sense if we
>>> can at least get all the Mesa drivers and Wine behind this. If there's a
>>> simple global flag on NVIDIA hardware, then it should be easy to provide
>>> an initial implementation for nouveau *hint* *hint* ;)
>>>
>>> For radeonsi, more work is required (like support in LLVM).
>>
>>
>> All sounds good to me. Should be relatively straightforward for nouveau.

Wine can also do what Nine does. Nine's RSQ implementation is:
min(FLT_MAX, rsq(abs(x)))

That min() expression gets rid of both +inf and NaNs, because on
radeonsi min() returns the non-NaN operand (here FLT_MAX) when the
other operand is NaN.

Marek

