[Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

Tue Feb 9 00:12:17 UTC 2016

On 09.02.2016 at 00:53, Ian Romanick wrote:
> On 02/08/2016 03:37 PM, Roland Scheidegger wrote:
>> On 09.02.2016 at 00:02, Ian Romanick wrote:
>>> On 02/08/2016 12:38 PM, Marek Olšák wrote:
>>>> On Mon, Feb 8, 2016 at 5:08 PM, Tom Stellard <tom at stellard.net> wrote:
>>>>> On Sat, Feb 06, 2016 at 01:15:42PM +0100, Marek Olšák wrote:
>>>>>> From: Marek Olšák <marek.olsak at amd.com>
>>>>>>
>>>>>> This fixes FP16 conversion instructions for VI, which has 16-bit floats,
>>>>>> but not SI & CI, which can't disable denorms for those instructions.
>>>>>
>>>>> Do you know why this fixes FP16 conversions?  What does the OpenGL
>>>>> spec say about denormal handling?
>>>>
>>>> Yes, I know why. The patch explains everything as far as I can see
>>>> though. What isn't clear?
>>>>
>>>> SI & CI: Don't support FP16. FP16 conversions are hardcoded to emit
>>>> and accept FP16 denormals.
>>>> VI: Supports FP16. FP16 denormal support is now configurable and
>>>> affects FP16 conversions as well (shared setting with FP64).
>>>>
>>>> OpenGL doesn't require denormals. Piglit does. I think this is
>>>> incorrect piglit behavior.
>>>
>>> I submitted a public spec bug for this issue:
>>>
>>> https://www.khronos.org/bugzilla/show_bug.cgi?id=1460
>>>
>>> I'm investigating whether a similar bug is needed for the SPIR-V
>>> specification.
>>>
>>> I think an argument can be made for either the flush-to-zero or
>>> non-flush-to-zero behavior in the case of unpackHalf2x16 and (possibly)
>>> packHalf2x16.  The only place in the GLSL 4.50.5 specification that
>>> mentions subnormal values is section 4.7.1 (Range and Precision).
>>>
>>>     "The precision of stored single- and double-precision floating-point
>>>     variables is defined by the IEEE 754 standard for 32-bit and 64-bit
>>>     floating-point numbers....Any denormalized value input into a
>>>     shader or potentially generated by any operation in a shader can be
>>>     flushed to 0."
>>>
>>> Since there is no half-precision type in desktop GLSL, there is no
>>> mention of 16-bit subnormal values.  As Roland mentioned before, all
>>> 16-bit subnormal values are 32-bit normal values.
>>>
>>> As I mentioned before, from the point of view of an application
>>> developer, the flush-to-zero behavior for unpackHalf2x16 is both
>>> surprising and awful. :)
>>>
>>> While I think an argument can be made for either behavior, I also think
>>> the argument for the non-flush-to-zero behavior is slightly stronger.
>>> The case for flush-to-zero based on the above spec quotation fails for
>>> two reasons.  First, the "input into [the] shader" is not a subnormal
>>> number.  It is an integer.  Second, the "[value] potentially generated
>>> by [the] operation" is not subnormal in single-precision.
>>
>> I don't disagree with that, however OTOH you could make an argument that
>> such a strong guarantee for packed half floats is inconsistent with
>> what's required for them elsewhere in GL. In particular half float
>> texture formats - these are still based on ARB_half_float_pixel. Which
>> says denormals are optional, infs are optional, NaNs are optional -
>> albeit that's not any different to ordinary floats...
> 
> Thanks for mentioning this. :)  The same issue had occurred to me, and I
> was trying to find some relevant text in the GL spec.  I hadn't thought
> to look in the extension spec.
GL core spec 4.5 actually mentions pretty much the same within the
generic numeric bits, section 2.3.4.2 - except the extension bit has
explicitly listed that exponent 0 and mantissa non-zero may be decoded
to zero (and similar for infs, nans). But the core bits text still
mentions just that "providing a denormalized number or negative zero to
GL must yield predictable results" so flush to zero is apparently still
allowed.
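
To make the two decodings concrete, here is a minimal sketch in Python
(not Mesa code). It assumes the standard IEEE 754 binary16 layout; the
names unpack_half, flush_denorms and is_fp32_normal are made up for
illustration. It decodes a half-float bit pattern either exactly or with
"exponent 0, mantissa non-zero" flushed to zero, and checks the point
made earlier in the thread that every 16-bit subnormal is a normal
32-bit value.

```python
import struct

def unpack_half(bits, flush_denorms=False):
    """Decode a 16-bit pattern as an IEEE 754 binary16 value."""
    exponent = (bits >> 10) & 0x1F
    mantissa = bits & 0x3FF
    if flush_denorms and exponent == 0 and mantissa != 0:
        # Hypothetical flush-to-zero path: keep only the sign.
        return -0.0 if bits & 0x8000 else 0.0
    # struct's 'e' format decodes binary16 exactly, subnormals included.
    return struct.unpack('<e', struct.pack('<H', bits))[0]

def is_fp32_normal(x):
    """True if x, stored as binary32, has a normal (non-zero, non-max) exponent."""
    bits32 = struct.unpack('<I', struct.pack('<f', x))[0]
    return 0 < ((bits32 >> 23) & 0xFF) < 0xFF

smallest = unpack_half(0x0001)  # smallest positive fp16 subnormal, 2**-24
largest = unpack_half(0x03FF)   # largest fp16 subnormal, 1023 * 2**-24
flushed = unpack_half(0x0001, flush_denorms=True)
```

Both decodings are "predictable" in the sense of the core spec text; they
just disagree on every bit pattern from 0x0001 through 0x03FF (and the
negative counterparts).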


> 
>> (And I still have the problem that d3d10 wants trunc behavior instead of
>> round... fwiw the precedent there in GL is also for r11g11b10 format,
>> which says round-to-nearest recommended but trunc allowed, and all too
>> large finite numbers converted to max finite (which is inconsistent with
>> nearest rounding). The spec is completely silent, in both GLSL and GL, on
>> how rounding should be done for fp32 to fp16, albeit I don't disagree
>> round-to-nearest seems the most reasonable.)
> 
> The GLSL spec isn't silent.  Section 4.7.1 explicitly says, "The
> rounding mode cannot be set and is undefined."
Yes, but at least to me it's not really obvious that this applies to all
operations. At least the basic operations say "must be correctly
rounded". What does that even mean if the rounding mode isn't defined in
the first place? Would the rounding mode have to be consistent for all
operations, so that always using "trunc" for all operations would be
legal, albeit looking "slightly suboptimal", whereas a different rounding
mode just for conversion would be illegal?
In any case, I'd be happy with any clarifications in glsl no matter how
they end up...
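
For reference, the difference between the two conversion behaviors can
be sketched as follows. This is an illustration only, not what any driver
does: pack_half and its mode argument are hypothetical names, the
"nearest" path relies on Python's struct 'e' format (which rounds to
nearest even), and the "trunc" path is my reading of the
truncate-and-clamp-to-max-finite behavior described above. Inputs are
assumed to be values already representable as fp32.

```python
import struct

HALF_MAX_BITS = 0x7BFF  # 65504.0, the largest finite binary16 value

def _rne_bits(x):
    """fp16 bits for x using round-to-nearest-even (struct's 'e' format)."""
    try:
        return struct.unpack('<H', struct.pack('<e', x))[0]
    except OverflowError:
        # Magnitude rounds past 65504: round-to-nearest yields infinity.
        return 0xFC00 if x < 0 else 0x7C00

def _half_value(bits):
    return struct.unpack('<e', struct.pack('<H', bits))[0]

def pack_half(x, mode='nearest'):
    """Convert x to fp16 bits; mode is 'nearest' or 'trunc' (hypothetical API)."""
    bits = _rne_bits(x)
    if mode == 'nearest':
        return bits
    sign, mag = bits & 0x8000, bits & 0x7FFF
    if mag == 0x7C00 and x not in (float('inf'), float('-inf')):
        # Too-large finite input: clamp to max finite instead of infinity.
        return sign | HALF_MAX_BITS
    if mag < 0x7C00 and _half_value(mag) > abs(x):
        # Nearest rounding went away from zero; step back one ulp.
        mag -= 1
    return sign | mag

x = 1.0 + 3 * 2.0 ** -12        # three quarters of the way from 1.0 to the next fp16
nearest = pack_half(x)           # rounds up to 0x3C01
trunc = pack_half(x, 'trunc')    # stays at 0x3C00
```

The two modes differ by at most one ulp for in-range values, but they
diverge completely at the top of the range: 100000.0 becomes infinity
with nearest rounding and 65504.0 with the truncating variant.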

Roland


> 
>> Roland
>>
>>> We've already determined that NVIDIA closed-source drivers do not flush
>>> to zero.  I'm curious to know what AMD's closed-source drivers do for
>>> 16-bit subnormal values supplied to unpackHalf2x16.  If they do not
>>> flush to zero, then you had better believe that applications depend on
>>> that behavior... and that also means that it doesn't matter very much
>>> what piglit does or the spec does (or does not) say.  This is the sort
>>> of situation where the spec changes to match application expectations
>>> and shipping implementations... and Mesa drivers change to follow.  This
>>> isn't even close to the first time through that loop.
>