[Mesa-dev] [RFC PATCH] Add GL_MESA_ieee_fp_alu_mode specification draft

Mon Feb 24 18:10:30 UTC 2020

On 2/23/20 5:57 PM, Ilia Mirkin wrote:
> ---
> 
> We talked about something like this a while back, but the end result
> was inconclusive. I added a TGSI MUL_ZERO_WINS shader property for nine.
> But it'd be nice for wine to be able to control this too.
> 
> I couldn't actually find any evidence of the discussion from 2017 or so,
> so ... let's have another one.
> 
>  docs/specs/MESA_ieee_fp_alu_mode.spec | 136 ++++++++++++++++++++++++++
>  1 file changed, 136 insertions(+)
>  create mode 100644 docs/specs/MESA_ieee_fp_alu_mode.spec
> 
> diff --git a/docs/specs/MESA_ieee_fp_alu_mode.spec b/docs/specs/MESA_ieee_fp_alu_mode.spec
> new file mode 100644
> index 00000000000..cb274f06571
> --- /dev/null
> +++ b/docs/specs/MESA_ieee_fp_alu_mode.spec
> @@ -0,0 +1,136 @@
> +Name
> +
> +    MESA_ieee_fp_alu_mode
> +
> +Name Strings
> +
> +    GL_MESA_ieee_fp_alu_mode
> +
> +Contact
> +
> +    Ilia Mirkin, ilia 'at' x.org
> +
> +IP Status
> +
> +    No known IP issues.
> +
> +Status
> +
> +    Proposed
> +
> +Version
> +
> +Number
> +
> +    TBD
> +
> +Dependencies
> +
> +    OpenGL 3.0 or OpenGL ES 3.0 is required.
> +
> +    The extension is written against the OpenGL 3.0 and OpenGL ES 3.0
> +    specifications.
> +
> +Overview
> +
> +    Pre-GL3 hardware did not generally have full IEEE floating point operation
> +    support. Among other things, 0 * Infinity would work out to 0, and NaN's
> +    might not be generated, or otherwise be treated improperly. GL3-class and
> +    later hardware introduced full IEEE FP support, including NaN, Infinity,
> +    and the proper generation of these.
> +
> +    Some software targeted at older hardware makes assumptions about how the
> +    shader ALU works. And to accommodate these, GL3-class hardware has a way to
> +    change how the shader ALU behaves. There are no standards around this, and
> +    different hardware has different ways of dealing with it. However these
> +    modes were designed specifically with such older software in mind.
> +
> +    This extension introduces a way to configure a context to be in non-IEEE
> +    ALU mode. This extension does not specify precisely what this means, as
> +    each vendor has something different. Generally it means non-IEEE compliant
> +    handling of multiplication, as well as any other unspecified changes.

I think many of the other things are specified.  They're the non-IEEE
behaviors of GL_ARB_vertex_program and GL_ARB_fragment_program, and
those mimic the required behavior of early DX shader models.  There are
a bunch of cases that specify that zero is generated when IEEE would
require NaN.

If there's just a small handful of things like this, we'd probably be
better off adding a couple of new built-in functions to do the job.  The
problem on Intel hardware is... we really, really don't want to switch
to non-IEEE mode because it changes how a bunch of things work, and we
haven't tested any of that in many years.  I'd much rather put in some
kind of work-arounds for things that don't want multiplication or pow()
to generate NaN.
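
As a rough illustration of the kind of work-around meant here, the
"zero wins" multiply could be written as a plain function.  This is
only a sketch in C (not GLSL) so that it can be compiled and checked on
its own.  The name mul_zero_wins is made up for this example and is not
part of any extension, and the sketch assumes that the sign of the
resulting zero does not matter.

```c
#include <math.h>

/* Hypothetical helper (name invented for this sketch): multiplication
 * where a zero operand always wins, so 0 * Infinity and 0 * NaN give 0
 * instead of NaN.  Always returns +0.0 in that case; the sign of zero
 * is an assumption, not something taken from a spec. */
static float mul_zero_wins(float a, float b)
{
    if (a == 0.0f || b == 0.0f)
        return 0.0f;

    /* Neither operand is zero: ordinary IEEE multiplication. */
    return a * b;
}
```

With this, mul_zero_wins(0.0f, INFINITY) is 0, while non-zero operands
still follow IEEE rules.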

As for the mechanism, I'm very strongly in favor of something that would
be locked-in when the shader is compiled.  I really want to avoid any
potential that an external glEnable could trigger a recompile.

The more I think about it... having an extension that adds a handful of
built-in functions that give old shader model behavior would be a good
idea.  We could even test it. :)  I've looked at a lot of shaders, and I've
seen a lot of not-quite-what-they-wanted methods for avoiding NaN
behavior in a bunch of these functions.  Having a special version of
inversesqrt() that returns FLT_MAX for 0 would be useful to a lot of
users.  As part of the spec we could even provide canonical versions of
the functions so that users could copy-and-paste:

#ifndef GL_MESA_foo

float inversesqrt_nonIEEE(float x)
{
    ...
}

#endif

> +
> +New Tokens
> +
> +    Accepted by the <cap> parameter of Enable, Disable, and IsEnabled, by
> +    the <pname> parameter of GetBooleanv, GetIntegerv, GetFloatv, and
> +    GetDoublev:
> +
> +        IEEE_FP_ALU_MODE_MESA                              0x????
> +
> +
> +Changes to GLSL Section 4.1.4 Floats:
> +
> +    Add the following paragraph:
> +
> +    In case that the shader is being executed in a context with
> +    IEEE_FP_ALU_MODE_MESA disabled, multiplication shall produce the following
> +    (non-IEEE-compliant) result:
> +
> +       float a = 0;
> +       float b = Infinity;
> +       float c = a * b; // c == 0
> +
> +    There may be other implications from this mode being enabled, including
> +    clamping of non-finite values, or anything else the hardware mode happens
> +    to enable to achieve compatibility.
> +
> +New State
> +
> +    (add to table 6.52, Miscellaneous, p.392)
> +
> +                                               Initial
> +    Get Value              Type   Get Command   Value     Description       Sec.   Attribute
> +    ---------------------  -----  -----------  ------- ------------------  ------  ---------
> +    IEEE_FP_ALU_MODE_MESA    B     IsEnabled    TRUE   Whether shader ALU           enable
> +                                                       is in IEEE FP mode
> +
> +
> +Issues
> +
> +    (1) This specification does not precisely specify what non-IEEE FP mode is.
> +
> +        RESOLVED. Shipping hardware has different ways of dealing with it. For
> +        example, Intel clamps all values. NVIDIA Tesla series has a
> +        context-wide mode for controlling whether zero wins in multiplication
> +        or follows IEEE rules. NVIDIA Fermi+ series as well as ATI/AMD Radeon
> +        R600+ has separate opcodes which control this (but again, a different
> +        set of operations are covered).
> +
> +        A single extension which is going to be easy to use for emulation
> +        software is thus much harder to write if it's to precisely specify
> +        this.
> +
> +        The applications that want these have already been written and tested
> +        against these approaches, so we know they all work with whatever the
> +        hardware has to offer.
> +
> +    (2) Why use an Enable instead of a shader layout token?
> +
> +        RESOLVED. Because some hardware implementations don't allow
> +        controlling this on a per-stage level. While one could come up with
> +        rules requiring linked program stages to have the same setting, this
> +        is going to be extra validation for the implementations to
> +        implement. Furthermore, one would want these rules to also apply to
> +        fixed-function-generated shaders equally. Instead a simple mode should
> +        be able to flip this on and off.
> +
> +    (3) What about FP denorms?
> +
> +        RESOLVED. The same hardware tends to also have a way to control
> +        whether denorm FP values are flushed to zero. GLSL does not specify
> +        this explicitly, but some software relies on denorms being
> +        flushed. Should there be a desire to allow denorms to work, this can
> +        be done by another extension.
> +
> +    (4) What is the expected usage for this?
> +
> +        RESOLVED. Software which enables older games to operate,
> +        e.g. emulators, will now be able to do shader translation without
> +        copious checks for these "error" conditions.
> +
> +
> +Revision History
> +
> +    Revision 1, ilia, 2020-02-23
> +      - Initial draft
>