<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jan 13, 2016 at 2:01 AM, Ian Romanick <span dir="ltr"><<a href="mailto:idr@freedesktop.org" target="_blank">idr@freedesktop.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>On 01/12/2016 05:41 PM, Matt Turner wrote:<br>
> On Tue, Jan 12, 2016 at 4:10 PM, Jason Ekstrand <<a href="mailto:jason@jlekstrand.net" target="_blank">jason@jlekstrand.net</a>> wrote:<br>
>> On Tue, Jan 12, 2016 at 3:52 PM, Matt Turner <<a href="mailto:mattst88@gmail.com" target="_blank">mattst88@gmail.com</a>> wrote:<br>
>>><br>
>>> On Tue, Jan 12, 2016 at 3:35 PM, Jason Ekstrand <<a href="mailto:jason@jlekstrand.net" target="_blank">jason@jlekstrand.net</a>><br>
>>> wrote:<br>
>>>> This opcode simply takes a 32-bit floating-point value and reduces its<br>
>>>> effective precision to 16 bits.<br>
>>>> ---<br>
>>><br>
>>> What's it supposed to do for values not representable in half-precision?<br>
>><br>
>><br>
>> If they're in-range, round. If they're out-of-range, the appropriate<br>
>> infinity.<br>
><br>
> Are you sure that's the behavior hardware has? And by "are you sure" I<br>
> mean "have you tested it"?<br>
><br>
> The conversion table in the f32to16 documentation in the IVB PRM says:<br>
><br>
> single precision -> half precision<br>
> ------------------------------------<br>
> -finite -> -finite/-denorm/-0<br>
> +finite -> +finite/+denorm/+0<br>
><br>
>> <a href="https://www.khronos.org/registry/spir-v/specs/1.0/SPIRV.html#OpQuantizeToF16" rel="noreferrer" target="_blank">https://www.khronos.org/registry/spir-v/specs/1.0/SPIRV.html#OpQuantizeToF16</a><br>
><br>
>> Quantize a floating-point value to what is expressible by a 16-bit floating-point value.<br>
><br>
> Erf, anyway,<br>
><br>
> ... and the "convert too-large values to inf" isn't the behavior of<br>
> other languages like C [1] (and I don't think GLSL either, but I can't<br>
> find anything on the matter in the spec) or OpenCL C [2].<br>
<br>
</span>Some background may either clarify or further muddy things.<br>
<br>
Right now applications sprinkle mediump and lowp all over the place in<br>
GLSL ES shaders. Many vertex shader implementations, even on mobile<br>
devices, do everything in single precision. Many devices will only use<br>
f16 part of the time because some instructions may not have f16<br>
versions. When we finally implement f16 in the i965 driver, we'll be in<br>
this boat too.<br>
<br>
As a result, people think that their mediump-decorated code is fine...<br>
until it actually runs on a device that really does mediump. Then they<br>
report a bug to the vendor of that hardware. Sound like a familiar<br>
situation?<br>
<br>
From this problem the OpQuantizeToF16 SPIR-V instruction was born. The<br>
intention is that people could compile their code in a way that mediump<br>
gives you mediump precision on every device. While you probably<br>
wouldn't want to ship such code, this at least makes it possible to test<br>
it without having to find a device that will really do native mediump<br>
calculations all the time.<br>
<br>
IIRC, GLSL doesn't require Inf in mediump. I don't recall what SPIR-V<br>
says. I believe that GLSL allows saturating to the maximum magnitude<br>
representable value. What we want is for an expression tree like<br>
<br>
OpQuantizeToF16(OpQuantizeToF16(x) + OpQuantizeToF16(y))<br>
<br>
to produce the same value that 'x + y' would produce in "real" f16 mediump.<br></blockquote><div><br></div><div>Right. This is exactly why the opcode was created.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
The SPIR-V +/-Inf requirement doesn't completely jibe with my<br>
recollection of the discussions... but there was a lot of<br>
back-and-forth, and it was quite a few months ago at this point. I<br>
think we may have picked a single answer, rather than allowing both<br>
choices, just for consistency. I don't have any memory of whether<br>
anyone strongly wanted the +/-Inf behavior or if it was just a coin toss.</blockquote><div><br></div><div>For OpQuantizeToF16, the spec does currently require the +/-Inf behavior.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<span><br>
> Section 8.3.2 of the OpenCL C 2.0 spec is also relevant, but doesn't<br>
> touch directly on the issue at hand.<br>
><br>
> I'm worried that what is specified is not implementable via a round<br>
> trip through half-precision, because it's not the behavior other<br>
> languages implement.<br>
><br>
> If I had to guess, given the table in the IVB PRM and section 8.3.2,<br>
> out-of-range single-precision floats are converted to the<br>
> half-precision value with the largest magnitude.<br>
<br>
</span>You are correct; we should test it to be sure what the hardware really<br>
does. This is not intended to be a performance-critical operation. If we need to<br>
use a different, more expensive expansion to meet the requirements, we<br>
shouldn't lose any sleep over it.<br></blockquote><div><br></div><div>I haven't looked at it in bit-for-bit detail, but I did run it through a set of tests that explicitly hit denorms and the out-of-bounds cases in both directions. The tests seem to indicate that the hardware does what the opcode claims.<br><br></div><div>--Jason <br></div></div><br></div></div>
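The two out-of-range policies debated above can be sketched in Python, purely as an illustration (the function names are hypothetical; this is not the i965/NIR implementation). Python's `struct` `'e'` format encodes IEEE 754 binary16 and raises `OverflowError` for finite values too large for half precision, which lets us pick either overflow policy explicitly:

```python
import math
import struct

F16_MAX = 65504.0  # largest finite IEEE 754 binary16 value

def _round_trip_f16(x):
    # Round x to the nearest binary16 value and widen back to float.
    # struct raises OverflowError for finite values too large for f16.
    return struct.unpack('<e', struct.pack('<e', x))[0]

def quantize_to_f16_inf(x):
    """OpQuantizeToF16-style semantics: out-of-range values become
    the appropriately signed infinity."""
    try:
        return _round_trip_f16(x)
    except OverflowError:
        return math.copysign(math.inf, x)

def quantize_to_f16_sat(x):
    """The behavior Matt guesses from the IVB PRM table: out-of-range
    values saturate to the largest-magnitude finite f16."""
    try:
        return _round_trip_f16(x)
    except OverflowError:
        return math.copysign(F16_MAX, x)

def mediump_add(x, y):
    """Ian's expression tree: quantize the operands, add in f32,
    quantize the result -- emulating 'x + y' in real f16 mediump."""
    return quantize_to_f16_inf(quantize_to_f16_inf(x) + quantize_to_f16_inf(y))
```

Under the first policy `quantize_to_f16_inf(1e6)` yields +Inf, while the second yields 65504.0; in-range values such as 1.5 round-trip exactly under both, and denorms survive the round trip.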