<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Fri, Jan 13, 2017 at 8:43 AM, Marek Olšák <span dir="ltr"><<a href="mailto:maraeo@gmail.com" target="_blank">maraeo@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On Fri, Jan 13, 2017 at 5:25 PM, Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>> wrote:<br>
> On Fri, Jan 13, 2017 at 4:05 AM, Marek Olšák <<a href="mailto:maraeo@gmail.com">maraeo@gmail.com</a>> wrote:<br>
>><br>
>> On Fri, Jan 13, 2017 at 3:37 AM, Ilia Mirkin <<a href="mailto:imirkin@alum.mit.edu">imirkin@alum.mit.edu</a>> wrote:<br>
>> > On Thu, Jan 12, 2017 at 9:13 PM, Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>><br>
>> > wrote:<br>
>> >> Unless, of course, it's controlled by the same hardware bit... Clearly,<br>
>> >> we<br>
>> >> can give you abs on rsq without denorm flushing (easy shader hacks)<br>
>> >> but<br>
>> >> not the other way around.<br>
>> ><br>
>> > OK, so somehow I missed that earlier. However there's an interesting<br>
>> > section in the PRM:<br>
>> ><br>
>> ><br>
>> > <a href="https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-skl-vol07-3d_media_gpgpu.pdf" rel="noreferrer" target="_blank">https://01.org/sites/default/<wbr>files/documentation/intel-gfx-<wbr>prm-osrc-skl-vol07-3d_media_<wbr>gpgpu.pdf</a><br>
>> ><br>
>> > on PDF page 854, "Dismissed Legacy Behaviors" which has a list of<br>
>> > suggested IEEE 754 deviations for DX9. One of them is indeed that 0 *<br>
>> > x = 0, but another is that input NaNs be propagated with certain<br>
>> > exceptions. Also they suggest that RCP(0)/RSQ(0) = fmax. Interesting.<br>
>> ><br>
>> > So at this point, the zero_wins thing is pretty much blown. i965<br>
>> > appears to have an all-or-nothing approach, and additionally that<br>
>> > approach doesn't match up exactly to what NVIDIA does (or at least I'm<br>
>> > not aware of a clamp-everything mode).<br>
>> ><br>
>> > This will take some thought to figure out how something can be<br>
>> > specified so that a single spec works for both i965 and nv/amd. OTOH<br>
>> > we could have two different specs that just expose different things -<br>
>> > e.g. i965 could expose a MESA_shader_float_alt_mode or whatever which<br>
>> > is spec'd to do the things that the PRM says, and nv/amd have the<br>
>> > MESA_shader_float_zero_wins ext which does what we were talking about<br>
>> > earlier.<br>
>> ><br>
>> > I'm open to other suggestions too.<br>
>><br>
>> There is also the "small" problem that it would take a non-trivial<br>
>> effort for us on the LLVM side. You guys can flip a switch. We can't.<br>
><br>
><br>
> Don't you have to expend that effort for ARB programs anyway? I thought<br>
> they weren't supposed to generate NaN either.<br>
<br>
</div></div>No, we don't, because st/mesa adds abs before RSQ and the driver<br>
implements POW as log+mul+exp, where mul follows the rule<br>
0*anything=0. I don't think any other opcode follows that rule though.<span class="HOEnZb"><font color="#888888"><br></font></span></blockquote><div><br></div><div>Ah. That makes sense. Do you also implement DIV as MUL+RCP? If so, those two lowerings together should keep NaN from being generated in the shader. We'd still have to do something about inf and maybe denorms.<br></div></div><br></div></div>