[Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

Tue Feb 9 00:48:57 UTC 2016

FWIW, D3D requires round-toward-zero for conversion to FP16 colorbuffers.

radeonsi does the same: round-toward-zero for colorbuffers and
round-to-nearest-even for the GLSL packing functions. This looks like
an inconsistency that may cause issues though.

Marek

On Tue, Feb 9, 2016 at 1:12 AM, Roland Scheidegger <sroland at vmware.com> wrote:
> Am 09.02.2016 um 00:53 schrieb Ian Romanick:
>> On 02/08/2016 03:37 PM, Roland Scheidegger wrote:
>>> Am 09.02.2016 um 00:02 schrieb Ian Romanick:
>>>> On 02/08/2016 12:38 PM, Marek Olšák wrote:
>>>>> On Mon, Feb 8, 2016 at 5:08 PM, Tom Stellard <tom at stellard.net> wrote:
>>>>>> On Sat, Feb 06, 2016 at 01:15:42PM +0100, Marek Olšák wrote:
>>>>>>> From: Marek Olšák <marek.olsak at amd.com>
>>>>>>>
>>>>>>> This fixes FP16 conversion instructions for VI, which has 16-bit floats,
>>>>>>> but not SI & CI, which can't disable denorms for those instructions.
>>>>>>
>>>>>> Do you know why this fixes FP16 conversions?  What does the OpenGL
>>>>>> spec say about denormal handling?
>>>>>
>>>>> Yes, I know why. The patch explains everything as far as I can see
>>>>> though. What isn't clear?
>>>>>
>>>>> SI & CI: Don't support FP16. FP16 conversions are hardcoded to emit
>>>>> and accept FP16 denormals.
>>>>> VI: Supports FP16. FP16 denormal support is now configurable and
>>>>> affects FP16 conversions as well (shared setting with FP64).
>>>>>
>>>>> OpenGL doesn't require denormals. Piglit does. I think this is
>>>>> incorrect piglit behavior.
>>>>
>>>> I submitted a public spec bug for this issue:
>>>>
>>>> https://www.khronos.org/bugzilla/show_bug.cgi?id=1460
>>>>
>>>> I'm investigating whether a similar bug is needed for the SPIR-V
>>>> specification.
>>>>
>>>> I think an argument can be made for either the flush-to-zero or
>>>> non-flush-to-zero behavior in the case of unpackHalf2x16 and (possibly)
>>>> packHalf2x16.  The only place in the GLSL 4.50.5 specification that
>>>> mentions subnormal values is section 4.7.1 (Range and Precision).
>>>>
>>>>     "The precision of stored single- and double-precision floating-point
>>>>     variables is defined by the IEEE 754 standard for 32-bit and 64-bit
>>>>     floating-point numbers....Any denormalized value input into a
>>>>     shader or potentially generated by any operation in a shader can be
>>>>     flushed to 0."
>>>>
>>>> Since there is no half-precision type in desktop GLSL, there is no
>>>> mention of 16-bit subnormal values.  As Roland mentioned before, all
>>>> 16-bit subnormal values are 32-bit normal values.
>>>>
>>>> As I mentioned before, from the point of view of an application
>>>> developer, the flush-to-zero behavior for unpackHalf2x16 is both
>>>> surprising and awful. :)
>>>>
>>>> While I think an argument can be made for either behavior, I also think
>>>> the argument for the non-flush-to-zero behavior is slightly stronger.
>>>> The case for flush-to-zero based on the above spec quotation fails for
>>>> two reasons.  First, the "input into [the] shader" is not a subnormal
>>>> number.  It is an integer.  Second, the "[value] potentially generated
>>>> by [the] operation" is not subnormal in single-precision.
>>>
>>> I don't disagree with that; however, you could argue that such a
>>> strong guarantee for packed half floats is inconsistent with what's
>>> required for them elsewhere in GL, in particular for half-float
>>> texture formats. These are still based on ARB_half_float_pixel, which
>>> says denormals, infs and NaNs are all optional, albeit that's no
>>> different from ordinary floats...
>>
>> Thanks for mentioning this. :)  The same issue had occurred to me, and I
>> was trying to find some relevant text in the GL spec.  I hadn't thought
>> to look in the extension spec.
> The GL 4.5 core spec actually says pretty much the same thing in its
> generic numeric section, 2.3.4.2, except that the extension explicitly
> lists that exponent 0 with a non-zero mantissa may be decoded to zero
> (and similarly for infs and NaNs). The core text only says that
> "providing a denormalized number or negative zero to GL must yield
> predictable results", so flush to zero is apparently still allowed.
>
>
>>
>>> (And I still have the problem that d3d10 wants trunc behavior instead of
>>> round... FWIW, the precedent in GL is the r11g11b10 format, which says
>>> round-to-nearest is recommended but trunc is allowed, and that all
>>> too-large finite numbers are converted to max finite (which is
>>> inconsistent with nearest rounding). Neither GLSL nor GL says anything
>>> about how rounding should be done for fp32 to fp16, albeit I agree
>>> round-to-nearest seems the most reasonable.)
>>
>> The GLSL spec isn't silent.  Section 4.7.1 explicitly says, "The
>> rounding mode cannot be set and is undefined."
> Yes, but at least to me it's not really obvious that this applies to all
> operations. At least the basic operations are required to be "correctly
> rounded", but what does that even mean if the rounding mode isn't
> defined in the first place? Would the rounding mode have to be
> consistent across all operations, so that always using "trunc" would be
> legal, albeit "slightly suboptimal", whereas a different rounding mode
> just for conversions would be illegal?
> In any case, I'd be happy with any clarifications in glsl no matter how
> they end up...
>
> Roland
>
>
>
>>
>>> Roland
>>>
>>>> We've already determined that NVIDIA closed-source drivers do not flush
>>>> to zero.  I'm curious to know what AMD's closed-source drivers do for
>>>> 16-bit subnormal values supplied to unpackHalf2x16.  If they do not
>>>> flush to zero, then you had better believe that applications depend on
>>>> that behavior... and that also means that it doesn't matter very much
>>>> what piglit does or the spec does (or does not) say.  This is the sort
>>>> of situation where the spec changes to match application expectations
>>>> and shipping implementations... and Mesa drivers change to follow.  This
>>>> isn't even close to the first time through that loop.
>>
>