[Mesa-dev] [PATCH 1/2] nir: Add a fquantize2f16 opcode

Jason Ekstrand jason at jlekstrand.net
Wed Jan 13 13:46:10 PST 2016


On Wed, Jan 13, 2016 at 2:01 AM, Ian Romanick <idr at freedesktop.org> wrote:

> On 01/12/2016 05:41 PM, Matt Turner wrote:
> > On Tue, Jan 12, 2016 at 4:10 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
> >> On Tue, Jan 12, 2016 at 3:52 PM, Matt Turner <mattst88 at gmail.com> wrote:
> >>>
> >>> On Tue, Jan 12, 2016 at 3:35 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
> >>>> This opcode simply takes a 32-bit floating-point value and reduces its
> >>>> effective precision to 16 bits.
> >>>> ---
> >>>
> >>> What's it supposed to do for values not representable in
> >>> half-precision?
> >>
> >>
> >> If they're in-range, round.  If they're out-of-range, the appropriate
> >> infinity.
> >
> > Are you sure that's the behavior hardware has? And by "are you sure" I
> > mean "have you tested it"
> >
> > The conversion table in the f32to16 documentation in the IVB PRM says:
> >
> > single precision -> half precision
> > ------------------------------------
> > -finite -> -finite/-denorm/-0
> > +finite -> +finite/+denorm/+0
> >
> >> https://www.khronos.org/registry/spir-v/specs/1.0/SPIRV.html#OpQuantizeToF16
> >
> >> Quantize a floating-point value to what is expressible by a 16-bit
> >> floating-point value.
> >
> > Erf, anyway,
> >
> > ... and the "convert too-large values to inf" isn't the behavior of
> > other languages like C [1] (and I don't think GLSL either, but I can't
> > find anything on the matter in the spec) or OpenCL C [2].
>
> Some background may either clarify or further muddy things.
>
> Right now applications sprinkle mediump and lowp all over the place in
> GLSL ES shaders.  Many vertex shader implementations, even on mobile
> devices, do everything in single precision.  Many devices will only use
> f16 part of the time because some instructions may not have f16
> versions.  When we finally implement f16 in the i965 driver, we'll be in
> this boat too.
>
> As a result, people think that their mediump-decorated code is fine...
> until it actually runs on a device that really does mediump.  Then they
> report a bug to the vendor of that hardware.  Sound like a familiar
> situation?
>
> From this problem the OpQuantizeToF16 SPIR-V instruction was born.  The
> intention is that people could compile their code in a way that mediump
> gives you mediump precision on every device.  While you probably
> wouldn't want to ship such code, this at least makes it possible to test
> it without having to find a device that will really do native mediump
> calculations all the time.
>
> IIRC, GLSL doesn't require Inf in mediump.  I don't recall what SPIR-V
> says.  I believe that GLSL allows saturating to the maximum magnitude
> representable value.  What we want is for an expression tree like
>
>     OpQuantizeToF16(OpQuantizeToF16(x) + OpQuantizeToF16(y))
>
> to produce the same value that 'x + y' would produce in "real" f16 mediump.
>

Right.  This is exactly why the opcode was created.


>
> The SPIR-V +/-Inf requirement doesn't completely jibe with my
> recollection of the discussions... but there was a lot of
> back-and-forth, and it was quite a few months ago at this point.  I
> think we may have picked just one possible answer instead of allowing
> both choices just for consistency.  I don't have any memory whether
> anyone strongly wanted the +/-Inf behavior or if it was just a coin toss.
>

For OpQuantizeToF16, the spec does currently


>
> > Section 8.3.2 of the OpenCL C 2.0 spec is also relevant, but doesn't
> > touch directly on the issue at hand.
> >
> > I'm worried that what is specified is not implementable via a round
> > trip through half-precision, because it's not the behavior other
> > languages implement.
> >
> > If I had to guess, given the table in the IVB PRM and section 8.3.2,
> > out-of-range single-precision floats are converted to the
> > half-precision value with the largest magnitude.
>
> You are correct, we should test it to be sure what the hardware really
> does. This is not intended to be a performance operation. If we need to
> use a different, more expensive expansion to meet the requirements, we
> shouldn't lose any sleep over it.
>

I haven't looked at it in bit-for-bit detail, but I did run it through a
set of tests which explicitly hits denorms and the out-of-bounds cases in
both directions.  The tests seem to indicate that the hardware does what
the opcode claims.
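
To make the two candidate out-of-range behaviours from this thread concrete,
the following C sketch spells out what a test would expect under each one:
overflow to the appropriate infinity, or saturation to the largest-magnitude
half-precision value.  All names here are hypothetical and this is not the
test suite mentioned above.  The 65520 threshold assumes round-to-nearest-even,
where it is the midpoint between 65504 and 2^16.

```c
#include <math.h>
#include <stdbool.h>

#define F16_MAX_FINITE    65504.0f  /* largest finite half-precision value */
#define F16_ROUNDS_TO_INF 65520.0f  /* midpoint between 65504 and 2^16 */

/* Hypothetical names for the two behaviours debated in the thread. */
enum overflow_policy { OVERFLOW_TO_INF, OVERFLOW_SATURATE };

/* True if a finite x is too large in magnitude for half precision once
 * rounded to nearest even, i.e. where the two policies disagree. */
static bool f16_out_of_range(float x)
{
    return isfinite(x) && fabsf(x) >= F16_ROUNDS_TO_INF;
}

/* Expected result for an out-of-range input under the given policy. */
static float f16_expected_overflow(float x, enum overflow_policy policy)
{
    float magnitude = (policy == OVERFLOW_TO_INF) ? INFINITY : F16_MAX_FINITE;
    return copysignf(magnitude, x);
}
```

A test hitting both directions would feed values such as 1e6f and -1e6f
through the hardware and compare the result against the expectation for
whichever policy is under test.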

--Jason