[Mesa-dev] NaN behavior in GLSL (was Re: [PATCH] glsl: always do sqrt(abs()) and inversesqrt(abs()))

Fri Jan 13 18:50:07 UTC 2017

2017-01-13 3:37 GMT+01:00 Ilia Mirkin <imirkin at alum.mit.edu>:
> On Thu, Jan 12, 2017 at 9:13 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>> Unless, of course, it's controlled by the same hardware bit... Clearly, we
>> can give you abs on rsq without denorm flushing (easy shader hacks) but
>> not the other way around.
>
> OK, so somehow I missed that earlier. However there's an interesting
> section in the PRM:
>
> https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-skl-vol07-3d_media_gpgpu.pdf
>
> on PDF page 854, "Dismissed Legacy Behaviors" which has a list of
> suggested IEEE 754 deviations for DX9. One of them is indeed that 0 *
> x = 0, but another is that input NaNs be propagated with certain
> exceptions. Also they suggest that RCP(0)/RSQ(0) = fmax. Interesting.
>
> So at this point, the zero_wins thing is pretty much blown. i965
> appears to have an all-or-nothing approach, and additionally that
> approach doesn't match up exactly to what NVIDIA does (or at least I'm
> not aware of a clamp-everything mode).
>
> This will take some thought to figure out how something can be
> specified so that a single spec works for both i965 and nv/amd. OTOH
> we could have two different specs that just expose different things -
> e.g. i965 could expose a MESA_shader_float_alt_mode or whatever which
> is spec'd to do the things that the PRM says, and nv/amd have the
> MESA_shader_float_zero_wins ext which does what we were talking about
> earlier.
>
> I'm open to other suggestions too.

Maybe we can go back to the original idea and have the extension
require that no NaNs can be generated by GLSL mathematical operators
and built-in functions (if no operand is a NaN?). It's possible that's
not exactly it, but in any case the idea is to just specify expected
results without requiring a specific route to get there. The
extension could introduce undefined behavior where necessary, e.g.
allowing (but not requiring) INF results to always be flushed to fmax
when enabled.

For Intel that would work trivially. For AMD it should be a matter of
using the special instructions where necessary and "being careful" in a
few places (in the same vein as the RSQ and POW opcodes of ARB
programs Marek mentioned). Not sure about nouveau; I guess it should
end up similar to AMD.

Would that be too messy? Am I completely missing the point?