[Mesa-dev] [RFC PATCH] Add GL_MESA_ieee_fp_alu_mode specification draft

Mon Feb 24 19:21:20 UTC 2020

On Mon, Feb 24, 2020 at 1:10 PM Ian Romanick <idr at freedesktop.org> wrote:
>
> On 2/23/20 5:57 PM, Ilia Mirkin wrote:
> > ---
> >
> > We talked about something like this a while back, but the end result
> > was inconclusive. I added a TGSI MUL_ZERO_WINS shader property for nine.
> > But it'd be nice for wine to be able to control this too.
> >
> > I couldn't actually find any evidence of the discussion from 2017 or so,
> > so ... let's have another one.
> >
> >  docs/specs/MESA_ieee_fp_alu_mode.spec | 136 ++++++++++++++++++++++++++
> >  1 file changed, 136 insertions(+)
> >  create mode 100644 docs/specs/MESA_ieee_fp_alu_mode.spec
> >
> > diff --git a/docs/specs/MESA_ieee_fp_alu_mode.spec b/docs/specs/MESA_ieee_fp_alu_mode.spec
> > new file mode 100644
> > index 00000000000..cb274f06571
> > --- /dev/null
> > +++ b/docs/specs/MESA_ieee_fp_alu_mode.spec
> > @@ -0,0 +1,136 @@
> > +Name
> > +
> > +    MESA_ieee_fp_alu_mode
> > +
> > +Name Strings
> > +
> > +    GL_MESA_ieee_fp_alu_mode
> > +
> > +Contact
> > +
> > +    Ilia Mirkin, ilia 'at' x.org
> > +
> > +IP Status
> > +
> > +    No known IP issues.
> > +
> > +Status
> > +
> > +    Proposed
> > +
> > +Version
> > +
> > +Number
> > +
> > +    TBD
> > +
> > +Dependencies
> > +
> > +    OpenGL 3.0 or OpenGL ES 3.0 is required.
> > +
> > +    The extension is written against the OpenGL 3.0 and OpenGL ES 3.0
> > +    specifications.
> > +
> > +Overview
> > +
> > +    Pre-GL3 hardware did not generally have full IEEE floating point operation
> > +    support. Among other things, 0 * Infinity would work out to 0, and NaNs
> > +    might not be generated, or otherwise be treated improperly. GL3-class and
> > +    later hardware introduced full IEEE FP support, including NaN, Infinity,
> > +    and the proper generation of these.
> > +
> > +    Some software targeted at older hardware makes assumptions about how the
> > +    shader ALU works. And to accommodate these, GL3-class hardware has a way to
> > +    change how the shader ALU behaves. There are no standards around this, and
> > +    different hardware has different ways of dealing with it. However, these
> > +    modes were designed specifically with such older software in mind.
> > +
> > +    This extension introduces a way to configure a context to be in non-IEEE
> > +    ALU mode. This extension does not specify precisely what this means, as
> > +    each vendor has something different. Generally it means non-IEEE compliant
> > +    handling of multiplication, as well as any other unspecified changes.
>
> I think many of the other things are specified.  They're the non-IEEE
> behaviors of GL_ARB_vertex_program and GL_ARB_fragment_program, and
> those mimic the required behavior of early DX shader models.  There are
> a bunch of cases that specify that zero is generated when IEEE would
> require NaN.
>
> If there's just a small handful of things like this, we'd probably be
> better adding a couple new built-in functions to do the job.  The
> problem on Intel hardware is... we really, really don't want to switch
> to non-IEEE mode because it changes how a bunch of things work, and we
> haven't tested any of that in many years.  I'd much rather put in some
> kind of work-arounds for things that don't want multiplication or pow()
> to generate NaN.

So basically anything that ever involves multiplication needs to have
these variants. Things like dot, the various crazy ops of days past
whose names escape me but involve complex calculations, etc. Things
like pow are questionable (depends on if they get decomposed or not),
and things like rcp/rsq unquestionably produce NaNs (or Infinity,
sorry not 100% sure but easily checked) on NVIDIA irrespective of that
mode being enabled.
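
To make the scope concrete, here is a rough C sketch of the "zero wins" rule and of a dot product built on top of it. The helper names are made up for illustration and are not part of the proposed extension. I'm also assuming that zero wins over NaN as well as over Infinity, and that the result is always +0, which may not match every piece of hardware.

```c
#include <math.h>

/* Hypothetical helper: legacy "zero wins" multiply.
 * Assumption: if either operand is zero, the result is +0, even when the
 * other operand is Infinity or NaN. Otherwise it is the plain IEEE product. */
float mul_zero_wins(float a, float b)
{
    if (a == 0.0f || b == 0.0f)
        return 0.0f;
    return a * b;
}

/* Hypothetical 3-component dot product using the same rule for each
 * per-component multiply. */
float dot3_zero_wins(const float a[3], const float b[3])
{
    return mul_zero_wins(a[0], b[0]) +
           mul_zero_wins(a[1], b[1]) +
           mul_zero_wins(a[2], b[2]);
}
```

With these, mul_zero_wins(0.0f, INFINITY) gives 0 where the plain IEEE product gives NaN, and every op that decomposes into multiplies would need the same treatment.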

Also on Intel hardware, as you mention, the "non-ieee" mode is ...
interesting, so to allow for that, I didn't want to say anything other
than the positive cases. If you have no interest in exposing this, I
could rewrite this in a NVIDIA/AMD-friendly manner.

>
> As for the mechanism, I'm very strongly in favor of something that would
> be locked-in when the shader is compiled.  I really want to avoid any
> potential that an external glEnable could trigger a recompile.

Stefan Dösinger suggested a context flag on IRC. I'd be fine with that
too, even if I have to go create 2 exts due to GLX/EGL.

>
> The more I think about it... having an extension that adds a handful of
> built-in functions that give old shader model behavior would be a good
> idea.  We could even test it. :)  I've looked at a lot of shaders, and I've
> seen a lot of not-quite-what-they-wanted methods for avoiding NaN
> behavior in a bunch of these functions.  Having a special version of
> inversesqrt() that returns FLT_MAX for 0 would be useful to a lot of
> users.  As part of the spec we could even provide canonical versions of
> the functions so that users could copy-and-paste.

That would preclude nv50-series hardware from benefiting, since it's
enabled/disabled context-wide there (see
https://github.com/NVIDIA/open-gpu-doc/blob/master/classes/compute/cl50c0.h#L563
for compute, but same deal for graphics). If the plan for Intel is to
not implement this with a hardware flag change / separate instruction
encodings, probably best to leave the whole ext unimplemented. The
idea is to expose the hardware features that were used for DX9
compatibility, not helper functions which software like WINE could
very well implement on their own already (and apparently does in some
wine-staging patches).

>
> #ifndef GL_MESA_foo
>
> float inversesqrt_nonIEEE(float x)
> {
>     ...
> }
>
> #endif
>
> > +
> > +New Tokens
> > +
> > +    Accepted by the <cap> parameter of Enable, Disable, and IsEnabled, by
> > +    the <pname> parameter of GetBooleanv, GetIntegerv, GetFloatv, and
> > +    GetDoublev:
> > +
> > +        IEEE_FP_ALU_MODE_MESA                              0x????
> > +
> > +
> > +Changes to GLSL Section 4.1.4 Floats:
> > +
> > +    Add the following paragraph:
> > +
> > +    In case that the shader is being executed in a context with
> > +    IEEE_FP_ALU_MODE_MESA disabled, multiplication shall produce the following
> > +    (non-IEEE-compliant) result:
> > +
> > +       float a = 0;
> > +       float b = Infinity;
> > +       float c = a * b; // c == 0
> > +
> > +    There may be other implications from this mode being enabled, including
> > +    clamping of non-finite values, or anything else the hardware mode happens
> > +    to enable to achieve compatibility.
> > +
> > +New State
> > +
> > +    (add to table 6.52, Miscellaneous, p.392)
> > +
> > +                                               Initial
> > +    Get Value              Type   Get Command   Value     Description       Sec.   Attribute
> > +    ---------------------  -----  -----------  ------- ------------------  ------  ---------
> > +    IEEE_FP_ALU_MODE_MESA    B     IsEnabled    TRUE   Whether shader ALU           enable
> > +                                                       is in IEEE FP mode
> > +
> > +
> > +Issues
> > +
> > +    (1) This specification does not precisely specify what non-IEEE FP mode is.
> > +
> > +        RESOLVED. Shipping hardware has different ways of dealing with it. For
> > +        example, Intel clamps all values. NVIDIA Tesla series has a
> > +        context-wide mode for controlling whether zero wins in multiplication
> > +        or follows IEEE rules. NVIDIA Fermi+ series as well as ATI/AMD Radeon
> > +        R600+ have separate opcodes which control this (but again, a different
> > +        set of operations are covered).
> > +
> > +        A single extension which is going to be easy to use for emulation
> > +        software is thus much harder to write if it's to precisely specify
> > +        this.
> > +
> > +        The applications that want these have already been written and tested
> > +        against these approaches, so we know they all work with whatever the
> > +        hardware has to offer.
> > +
> > +    (2) Why use an Enable instead of a shader layout token?
> > +
> > +        RESOLVED. Because some hardware implementations don't allow
> > +        controlling this on a per-stage level. While one could come up with
> > +        rules requiring linked program stages to have the same setting, this
> > +        is going to be extra validation for the implementations to
> > +        implement. Furthermore, one would want these rules to also apply to
> > +        fixed-function-generated shaders equally. Instead a simple mode should
> > +        be able to flip this on and off.
> > +
> > +    (3) What about FP denorms?
> > +
> > +        RESOLVED. The same hardware tends to also have a way to control
> > +        whether denorm FP values are flushed to zero. GLSL does not specify
> > +        this explicitly, but some software relies on denorms being
> > +        flushed. Should there be a desire to allow denorms to work, this can
> > +        be done by another extension.
> > +
> > +    (4) What is the expected usage for this?
> > +
> > +        RESOLVED. Software which enables older games to operate,
> > +        e.g. emulators, will now be able to do shader translation without
> > +        copious checks for these "error" conditions.
> > +
> > +
> > +Revision History
> > +
> > +    Revision 1, ilia, 2020-02-23
> > +      - Initial draft
> >
>