<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jan 13, 2016 at 2:01 AM, Ian Romanick <span dir="ltr"><<a href="mailto:idr@freedesktop.org" target="_blank">idr@freedesktop.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>On 01/12/2016 05:41 PM, Matt Turner wrote:<br> > On Tue, Jan 12, 2016 at 4:10 PM, Jason Ekstrand <<a href="mailto:jason@jlekstrand.net" target="_blank">jason@jlekstrand.net</a>> wrote:<br> >> On Tue, Jan 12, 2016 at 3:52 PM, Matt Turner <<a href="mailto:mattst88@gmail.com" target="_blank">mattst88@gmail.com</a>> wrote:<br> >>><br> >>> On Tue, Jan 12, 2016 at 3:35 PM, Jason Ekstrand <<a href="mailto:jason@jlekstrand.net" target="_blank">jason@jlekstrand.net</a>><br> >>> wrote:<br> >>>> This opcode simply takes a 32-bit floating-point value and reduces its<br> >>>> effective precision to 16 bits.<br> >>>> ---<br> >>><br> >>> What's it supposed to do for values not representable in half-precision?<br> >><br> >><br> >> If they're in-range, round. If they're out-of-range, the appropriate<br> >> infinity.<br> ><br> > Are you sure that's the behavior hardware has? And by "are you sure" I<br> > mean "have you tested it"<br> ><br> > The conversion table in the f32to16 documentation in the IVB PRM says:<br> ><br> > single precision -> half precision<br> > ------------------------------------<br> > -finite -> -finite/-denorm/-0<br> > +finite -> +finite/+denorm/+0<br> ><br> >> <a href="https://www.khronos.org/registry/spir-v/specs/1.0/SPIRV.html#OpQuantizeToF16" rel="noreferrer" target="_blank">https://www.khronos.org/registry/spir-v/specs/1.0/SPIRV.html#OpQuantizeToF16</a><br> ><br> >> Quantize a floating-point value to what is expressible by a 16-bit floating-point value.<br> ><br> > Erf, anyway,<br> ><br> > ... 
and the "convert too-large values to inf" isn't the behavior of<br> > other languages like C [1] (and I don't think GLSL either, but I can't<br> > find anything on the matter in the spec) or OpenCL C [2].<br> <br> </span>Some background may either clarify or further muddy things.<br> <br> Right now applications sprinkle mediump and lowp all over the place in<br> GLSL ES shaders. Many vertex shader implementations, even on mobile<br> devices, do everything in single precision. Many devices will only use<br> f16 part of the time because some instructions may not have f16<br> versions. When we finally implement f16 in the i965 driver, we'll be in<br> this boat too.<br> <br> As a result, people think that their mediump-decorated code is fine...<br> until it actually runs on a device that really does mediump. Then they<br> report a bug to the vendor of that hardware. Sound like a familiar<br> situation?<br> <br> From this problem the OpQuantizeToF16 SPIR-V instruction was born. The<br> intention is that people could compile their code in a way that mediump<br> gives you mediump precision on every device. While you probably<br> wouldn't want to ship such code, this at least makes it possible to test<br> it without having to find a device that will really do native mediump<br> calculations all the time.<br> <br> IIRC, GLSL doesn't require Inf in mediump. I don't recall what SPIR-V<br> says. I believe that GLSL allows saturating to the maximum magnitude<br> representable value. What we want is for an expression tree like<br> <br> OpQuantizeToF16(OpQuantizeToF16(x) + OpQuantizeToF16(y))<br> <br> to produce the same value that 'x + y' would produce in "real" f16 mediump.<br></blockquote><div><br></div><div>Right. 
This is exactly why the opcode was created.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> <br> The SPIR-V +/-Inf requirement doesn't completely jibe with my<br> recollection of the discussions... but there was a lot of<br> back-and-forth, and it was quite a few months ago at this point. I<br> think we may have picked just one possible answer instead of allowing<br> both choices just for consistency. I don't have any memory whether<br> anyone strongly wanted the +/-Inf behavior or if it was just a coin toss.<br></blockquote><div><br></div><div>For OpQuantizeToF16, the spec does currently <br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> <span><br> > Section 8.3.2 of the OpenCL C 2.0 spec is also relevant, but doesn't<br> > touch directly on the issue at hand.<br> ><br> > I'm worried that what is specified is not implementable via a round<br> > trip through half-precision, because it's not the behavior other<br> > languages implement.<br> ><br> > If I had to guess, given the table in the IVB PRM and section 8.3.2,<br> > out-of-range single-precision floats are converted to the<br> > half-precision value with the largest magnitude.<br> <br> </span>You are correct, we should test it to be sure what the hardware really<br> does. This is not intended to be a performance operation. If we need to<br> use a different, more expensive expansion to meet the requirements, we<br> shouldn't lose any sleep over it.<br></blockquote><div><br></div><div>I haven't looked at it in bit-for-bit detail, but I did run it through a set of tests which explicitly hit denorms and the out-of-bounds cases in both directions. The tests seem to indicate that the hardware does what the opcode claims.<br><br></div><div>--Jason <br></div></div><br></div></div>
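<div>For reference, here is a minimal sketch of the quantization semantics discussed in this thread: round in-range values to the nearest half-precision value, and map out-of-range values to the appropriately signed infinity. It is only an illustration, not the driver's implementation and not a statement of what any hardware does. The helper name quantize_to_f16 is hypothetical, and it works on Python floats (doubles), so it matches a 32-bit input only when the argument is already exactly representable in single precision.</div>

```python
import math
import struct


def quantize_to_f16(x: float) -> float:
    """Hypothetical helper: reduce x to half precision and widen it back.

    Assumed semantics (from the thread, not verified against hardware):
    in-range values round to the nearest f16 value, out-of-range values
    become the infinity with the same sign.
    """
    if math.isnan(x) or math.isinf(x):
        return x
    try:
        # The "e" format is IEEE 754 binary16; packing rounds to nearest even.
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses values that round past the largest finite f16
        # (65504), so the overflow case is mapped to +/-Inf by hand.
        return math.copysign(math.inf, x)


if __name__ == "__main__":
    x, y = 60000.0, 10000.0
    # The expression tree from the thread: quantize the operands, add,
    # then quantize the result. The sum 70000 exceeds the f16 range.
    print(quantize_to_f16(quantize_to_f16(x) + quantize_to_f16(y)))
    # Values below the smallest f16 denorm flush to zero.
    print(quantize_to_f16(1e-10))
```

<div>A saturating variant, as mentioned for GLSL mediump, would return math.copysign(65504.0, x) in the overflow branch instead of an infinity.</div>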