[Mesa-dev] [PATCH 1/2] nir: Add a fquantize2f16 opcode

Tue Jan 12 17:41:43 PST 2016

On Tue, Jan 12, 2016 at 4:10 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
> On Tue, Jan 12, 2016 at 3:52 PM, Matt Turner <mattst88 at gmail.com> wrote:
>>
>> On Tue, Jan 12, 2016 at 3:35 PM, Jason Ekstrand <jason at jlekstrand.net>
>> wrote:
>> > This opcode simply takes a 32-bit floating-point value and reduces its
>> > effective precision to 16 bits.
>> > ---
>>
>> What's it supposed to do for values not representable in half-precision?
>
>
> If they're in-range, round.  If they're out-of-range, the appropriate
> infinity.

Are you sure that's the behavior hardware has? And by "are you sure" I
mean "have you tested it?"

The conversion table in the f32to16 documentation in the IVB PRM says:

single precision -> half precision
------------------------------------
-finite -> -finite/-denorm/-0
+finite -> +finite/+denorm/+0

> https://www.khronos.org/registry/spir-v/specs/1.0/SPIRV.html#OpQuantizeToF16

> Quantize a floating-point value to what is expressible by a 16-bit floating-point value.

Erf, anyway,

... and the "convert too-large values to inf" isn't the behavior of
other languages like C [1] (and I don't think GLSL either, but I can't
find anything on the matter in the spec) or OpenCL C [2].

Section 8.3.2 of the OpenCL C 2.0 spec is also relevant, but doesn't
touch directly on the issue at hand.

I'm worried that what is specified is not implementable via a round
trip through half-precision, because it's not the behavior other
languages implement.

If I had to guess, given the table in the IVB PRM and section 8.3.2,
out-of-range single-precision floats are converted to the
half-precision value with the largest magnitude.

[1] C99 spec, 6.3.1.5 says "If the value being converted is outside
the range of values that can be represented, the behavior is
undefined."
[2] OpenCL C 2.0 spec 6.2.3.3 says to refer to C99 spec section 6.3.