<div dir="auto"><div><div class="gmail_extra"><div class="gmail_quote">On Jan 12, 2017 4:56 PM, "Ilia Mirkin" <<a href="mailto:imirkin@alum.mit.edu">imirkin@alum.mit.edu</a>> wrote:<br type="attribution"><blockquote class="quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="elided-text">On Thu, Jan 12, 2017 at 7:46 PM, Matt Turner <<a href="mailto:mattst88@gmail.com">mattst88@gmail.com</a>> wrote:<br> > On Thu, Jan 12, 2017 at 3:20 PM, Ilia Mirkin <<a href="mailto:imirkin@alum.mit.edu">imirkin@alum.mit.edu</a>> wrote:<br> >> On Thu, Jan 12, 2017 at 6:04 PM, Nicolai Hähnle <<a href="mailto:nhaehnle@gmail.com">nhaehnle@gmail.com</a>> wrote:<br> >>> On 12.01.2017 23:46, Ilia Mirkin wrote:<br> >>>><br> >>>> On Thu, Jan 12, 2017 at 4:03 PM, Matteo Bruni <<a href="mailto:matteo.mystral@gmail.com">matteo.mystral@gmail.com</a>><br> >>>> wrote:<br> >>>>><br> >>>>> So, what would be really nice to have is a GLSL extension for some<br> >>>>> kind of switch to select the requested behavior WRT NaN. For example a<br> >>>>> three-way option with "don't generate NaN in arithmetic operations",<br> >>>>> "do generate NaN" and "don't care". It could also be a GL state if<br> >>>>> that's easier to implement with the existing hardware, since an<br> >>>>> individual application isn't supposed to require different behavior<br> >>>>> from one shader to the next.<br> >>>>><br> >>>>> Is anyone interested in / favorable to something like this? It would<br> >>>>> solve the issue with defining NaN behavior in GLSL while making things<br> >>>>> a bit more compatible with "other API a lot of games are ported from<br> >>>>> which happens to be supported by all the desktop GPUs".<br> >>>><br> >>>><br> >>>> Not that I'm biased, but on the NVIDIA Tesla series (G80-GT21x), this<br> >>>> enable is handled via a global flag, not in the shader binary, so this<br> >>>> is all-or-nothing for a whole pipeline. 
On GF100+, I believe there is<br> >>>> also an enable via a global flag, but there are also a FMUL.FMZ (and<br> >>>> FFMA.FMZ) flag, which I *think* has the same effect. So for GF100+ hw,<br> >>>> this could be done at the instruction level.<br> >>><br> >>><br> >>> Well, I would also have advocated for what is effectively a<br> >>> per-program/pipeline flag anyway, even though GCN hardware can theoretically<br> >>> do it per-instruction. Tracking a per-instruction bit in the compiler<br> >>> quickly becomes fragile (e.g. there's no good way for us to model this<br> >>> information per-instruction in LLVM IR). Per-shader isn't any better than<br> >>> per-instruction due to linking, and per-shader-stage is awkward if we ever<br> >>> want to do fancier cross-stage optimizations.<br> >>><br> >>> It's really quite simple. Introduce an extension with a name like<br> >>> MESA_shader_float_dx9. The behavior I'd suggest is:<br> >>><br> >>> Enabling/requiring the extension in a shader causes various semantics<br> >>> changes to bring floating point behavior in line with DX9 in that shader's<br> >>> code:<br> >>><br> >>> - 0*x = 0<br> >><br> >> Yes. But only for fp32, not for fp64.<br> >><br> >>> - sqrt/rsqrt are guaranteed to take the absolute value of their argument<br> >><br> >> Is that necessary? If the software knows about the ext, it also knows<br> >> to stick the abs() in.<br> ><br> > Is there a compelling reason to make the extension offer just one of<br> > these many behavior differences?<br> ><br> > FWIW, i965 has IEEE and "ALT" floating-point modes. ALT, I think<br> > corresponds to d3d9 behavior, and its description says<br> ><br> > A floating-point execution mode that maps +/- inf to +/- fmax, +/-<br> > denorm to +/-0, and NaN to +0 at the FPU inputs and never produces<br> > infinities, denormals, or NaN values as outputs.<br> <br> </div>Interesting. 
I believe on NVIDIA hardware, it's just float multiply<br> that's affected.<br> <div class="quoted-text"><br> ><br> > Also: Extended mathematics functions of log(), rsq() and sqrt() take<br> > the absolute value of the sources before computation to avoid<br> > generating INF and NaN results.<br> ><br> > If those two behaviors correspond to d3d9 behavior, I wouldn't want an<br> > extension that offered only the "zero wins" behavior and expected<br> > applications to insert abs().<br> <br> </div>Really? That creates ARB_gpu_shader5-style extensions which do 75<br> different things and that you can't expose if you can only do 74 of<br> them.</blockquote></div></div></div><div dir="auto"><br></div><div dir="auto">I understand your concern, but what hardware are we planning to expose this on that can't do d3d9? It seems like there are three things here: inf/NaN handling, denorm flushing, and abs() on special functions. I suppose I'd be OK with separating things out a bit, but we need to have a single enable or else our hardware is going to have serious problems with it.</div><div dir="auto"><br></div><div dir="auto"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> I think in the past we've avoided things like having "d3d9 mode"<br> in gallium API's - it's nice for these things to be individually<br> enumerated. I like the direction that e.g. ARB_clip_control went in -<br> make it all configurable individually instead of bundling unrelated<br> things together. This has allowed e.g. dolphin to do things in OpenGL<br> that are impossible on DX. And whether 0 * x = 0 or not seems rather<br> unrelated from whether rsq takes abs of its args.</blockquote></div></div></div><div dir="auto"><div class="gmail_extra"><br></div><div class="gmail_extra" dir="auto">Unless, of course, it's controlled by the same hardware bit... 
Clearly, we can give you abs on rsq without denorm flushing <span style="font-family:sans-serif">(easy shader hacks) but </span>not the other way around.</div></div></div>