[Mesa-dev] NaN behavior in GLSL (was Re: [PATCH] glsl: always do sqrt(abs()) and inversesqrt(abs()))

Sat Jan 14 22:47:33 UTC 2017

On 13/01/2017 19:06, Nicolai Hähnle wrote:
> On 13.01.2017 18:53, Jason Ekstrand wrote:
>> On Fri, Jan 13, 2017 at 8:43 AM, Marek Olšák <maraeo at gmail.com> wrote:
>>
>>     On Fri, Jan 13, 2017 at 5:25 PM, Jason Ekstrand
>>     <jason at jlekstrand.net> wrote:
>>     > On Fri, Jan 13, 2017 at 4:05 AM, Marek Olšák
>>     > <maraeo at gmail.com> wrote:
>>     >>
>>     >> On Fri, Jan 13, 2017 at 3:37 AM, Ilia Mirkin
>>     >> <imirkin at alum.mit.edu> wrote:
>>     >> > On Thu, Jan 12, 2017 at 9:13 PM, Jason Ekstrand
>>     >> > <jason at jlekstrand.net> wrote:
>>     >> >> Unless, of course, it's controlled by the same hardware bit...
>>     >> >> Clearly, we can give you abs on rsq without denorm flushing
>>     >> >> (easy shader hacks) but not the other way around.
>>     >> >
>>     >> > OK, so somehow I missed that earlier. However there's an
>>     >> > interesting section in the PRM:
>>     >> >
>>     >> >
>>     >> >
>> https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-skl-vol07-3d_media_gpgpu.pdf
>>     >> >
>>     >> > on PDF page 854, "Dismissed Legacy Behaviors" which has a list of
>>     >> > suggested IEEE 754 deviations for DX9. One of them is indeed that
>>     >> > 0 * x = 0, but another is that input NaNs be propagated with
>>     >> > certain exceptions. Also they suggest that RCP(0)/RSQ(0) = fmax.
>>     >> > Interesting.
>>     >> >
>>     >> > So at this point, the zero_wins thing is pretty much blown. i965
>>     >> > appears to have an all-or-nothing approach, and additionally that
>>     >> > approach doesn't match up exactly to what NVIDIA does (or at
>>     >> > least I'm not aware of a clamp-everything mode).
>>     >> >
>>     >> > This will take some thought to figure out how something can be
>>     >> > specified so that a single spec works for both i965 and nv/amd.
>>     >> > OTOH we could have two different specs that just expose different
>>     >> > things - e.g. i965 could expose a MESA_shader_float_alt_mode or
>>     >> > whatever which is spec'd to do the things that the PRM says, and
>>     >> > nv/amd have the MESA_shader_float_zero_wins ext which does what
>>     >> > we were talking about earlier.
>>     >> >
>>     >> > I'm open to other suggestions too.
>>     >>
>>     >> There is also the "small" problem that it would take a non-trivial
>>     >> effort for us on the LLVM side. You guys can flip a switch. We can't.
>>     >
>>     >
>>     > Don't you have to expend that effort for ARB programs anyway?  I
>>     > thought they weren't supposed to generate NaN either.
>>
>>     No, we don't, because st/mesa adds abs before RSQ and the driver
>>     implements POW as log+mul+exp, where mul follows the rule
>>     0*anything=0. I don't think any other opcode follows that rule
>>     though.
>>
>>
>> Ah.  That makes sense.  Do you also implement DIV as MUL+RCP?
>
> For single-precision, yes. For double-precision, it seems we need to 
> move away from that due to precision issues (which is itself a bit 
> odd, since you don't seem to have encountered that?).
>
> Nicolai
>
>>  If so,
>> the two of those should take care of NaN getting generated in the
>> shader.  We'd still have to do something about inf and maybe denorms.
>

I ran some tests on Ivy Bridge and an AMD 7730M on Windows 10.

======= The tests ========

With a sm3 pixel shader (writing to a fp32 render target and reading the 
result):

Intel: Things seem to match the ALT mode (log, rcp and rsq are clamped;
no NaN is generated unless a NaN constant is used as input).

AMD: log is clamped, rcp and rsq do produce INF. NaN is propagated.

Matteo ran the test on NVIDIA: log, rcp and rsq do produce INF. NaN is
propagated.

Common to all cards:

0 * x = 0 for x = NaN, Inf or -Inf

nrm(inf, inf, inf) = (0, 0, 0) (probably comes from 0*anything = 0)
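
A minimal sketch of how the nrm result could follow from 0*anything = 0.
It assumes nrm is expanded as v * rsq(dot(v, v)), which is a guess about
the lowering and not something the tests confirm. The helper names
(mul_zero_wins, ieee_mul, rsq, nrm) are made up for illustration, and
Python doubles stand in for fp32.

```python
import math

def mul_zero_wins(a, b):
    """Multiply with the legacy rule 0 * anything = 0 (NaN and +-Inf included)."""
    if a == 0.0 or b == 0.0:
        return 0.0
    return a * b

def ieee_mul(a, b):
    """Plain IEEE 754 multiply, for comparison (0 * Inf = NaN)."""
    return a * b

def rsq(x):
    """Unclamped reciprocal square root: rsq(0) = +Inf, rsq(+Inf) = 0."""
    if x == 0.0:
        return math.inf
    return 1.0 / math.sqrt(x)

def nrm(v, mul):
    """Assumed expansion of nrm: v * rsq(dot(v, v)), using the given multiply."""
    dot = sum(mul(c, c) for c in v)
    scale = rsq(dot)
    return tuple(mul(c, scale) for c in v)

inf3 = (math.inf, math.inf, math.inf)
print(nrm(inf3, mul_zero_wins))  # dot = Inf, rsq(Inf) = 0, Inf * 0 -> 0
print(nrm(inf3, ieee_mul))       # same steps, but Inf * 0 -> NaN
```

Under this assumed expansion, the (0, 0, 0) result needs nothing beyond
the multiply rule: the dot product overflows to Inf, rsq turns that into
0, and the final Inf * 0 is forced to 0.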

I tested the same thing with an sm2 pixel shader, and the results were
unchanged on Intel and AMD.

I also adapted wine's initial fp_special_test to test what happens in
vertex shaders. The output is not written to an fp32 render target;
instead 3 unorm values are produced to try to guess what happens, so the
results are harder to interpret and the following may contain some errors.

Intel: log, rcp and rsq are clamped. One of the unorm values changes 
when using vs/ps 3 instead of vs/ps 2 for the part of the test where the 
vs outputs a shader constant containing NaN or Inf, thus perhaps there 
is a slight change there in the rasterizer behaviour. The results are 
not enough to deduce whether the ALT mode is used or not.

AMD: log is clamped, rcp and rsq are clamped when using vs 2 and produce
INF when using vs 3. 0 * rcp(0) = 0 * rsq(0) = 0.

The NVIDIA results already filled in for the vs 2 version of the test say
log, rcp and rsq are not clamped.

The fact that 0*inf = 0 instead of NaN contradicts the r500 docs and the
geforce 6 docs.

The Intel, r500 and geforce 6 docs seem to indicate there are also
specific NaN behaviours with CMP, MIN and MAX.

For MIN and MAX, they say that if one of the two terms is NaN, the
second term is always returned (whereas apparently for dx10 it is the
non-NaN term). I haven't done tests for MIN/MAX.
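
A minimal sketch of the two MIN rules described above (MAX would be
analogous). The legacy variant is written as the select "a < b ? a : b",
which returns the second term whenever either input is NaN because any
comparison with NaN is false. That hardware implements it this way is an
assumption, and the names min_legacy and min_dx10 are hypothetical.

```python
import math

def min_legacy(a, b):
    """Legacy rule from the docs: if either term is NaN, return the second term."""
    # Any comparison involving NaN is false, so the select falls through to b.
    return a if a < b else b

def min_dx10(a, b):
    """DX10-style rule: if exactly one term is NaN, return the non-NaN term."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a < b else b

print(min_legacy(math.nan, 1.0), min_legacy(1.0, math.nan))
print(min_dx10(math.nan, 1.0), min_dx10(1.0, math.nan))
```

The two rules only differ when the NaN is in the second operand: the
legacy select then returns NaN, while the DX10-style rule returns the
other term.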

====== Conclusion ======

There seems to be a lot of variation between what vendors do. It seems
that having access to either the ALT mode or the 0*anything=0 behaviour
would make wine and nine happy.

The ALT mode sounds like something that can be emulated (basically by
clamping everything that can produce inf), but I think some apps are not
happy with that (Matteo, please confirm?). Perhaps the unknown things
Intel seems to do differently for vs2 and vs3, or the other specific
things the ALT mode does, are what fix these apps. It is probably still
interesting for wine and nine to have both (ALT and 0*inf = 0) as
extensions.
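
A minimal sketch of what "clamping everything that can produce inf"
could look like. It assumes rcp(0) and rsq(0) return fmax as the PRM
suggests, that log(0) returns -fmax, and that rsq and log take abs() of
their input; the last two are assumptions, as are the function names.
It does not try to cover the other things the ALT mode does.

```python
import math

FLT_MAX = 3.4028234663852886e38  # largest finite fp32 value

def clamp_finite(x):
    """Clamp +-Inf to +-FLT_MAX; NaN is passed through unchanged."""
    if math.isnan(x):
        return x
    return max(-FLT_MAX, min(FLT_MAX, x))

def rcp_clamped(x):
    """Reciprocal that never produces Inf."""
    if x == 0.0:
        return math.copysign(FLT_MAX, x)  # keeps the sign of -0.0
    return clamp_finite(1.0 / x)

def rsq_clamped(x):
    """Reciprocal square root of abs(x) that never produces Inf."""
    x = abs(x)  # assumption: same abs() that st/mesa adds before RSQ
    if x == 0.0:
        return FLT_MAX
    return clamp_finite(1.0 / math.sqrt(x))

def log2_clamped(x):
    """log2 of abs(x) that never produces +-Inf."""
    x = abs(x)  # assumption: log is taken on the absolute value
    if x == 0.0:
        return -FLT_MAX
    return clamp_finite(math.log2(x))

print(rcp_clamped(0.0), rsq_clamped(0.0), log2_clamped(0.0))
```

With results clamped like this, 0 * rcp(0) is already 0 under a plain
IEEE multiply, since no inf is produced in the first place.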

It is probably OK not to specify the behaviour around NaN.
With 0*anything = 0, the only ways to get NaN are either to feed NaN via
the vertex inputs or the constants (something which the Intel spec says
is forbidden in dx9), or to have inf in a vertex shader output (it
becomes NaN as a pixel shader input). If some games are hit by a problem
due to that, it can probably be fixed by an app workaround that does
more clamping.
That said, apps that require 0*inf = 0 could also be fixed by an app
workaround that does more clamping and avoids generating inf.

Since it seems easy to have an Intel ALT mode extension and a
0*(+-inf)=0 extension, I think it is a good idea to have them, but
what the tests show is that they may not be required.

Axel