[Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

Ian Romanick idr at freedesktop.org
Mon Feb 8 23:02:41 UTC 2016


On 02/08/2016 12:38 PM, Marek Olšák wrote:
> On Mon, Feb 8, 2016 at 5:08 PM, Tom Stellard <tom at stellard.net> wrote:
>> On Sat, Feb 06, 2016 at 01:15:42PM +0100, Marek Olšák wrote:
>>> From: Marek Olšák <marek.olsak at amd.com>
>>>
>>> This fixes FP16 conversion instructions on VI, which supports 16-bit
>>> floats. It does not affect SI & CI, which can't disable denorms for
>>> those instructions.
>>
>> Do you know why this fixes FP16 conversions?  What does the OpenGL
>> spec say about denormal handing?
> 
> Yes, I know why. The patch explains everything as far as I can see,
> though. What isn't clear?
> 
> SI & CI: Don't support FP16. FP16 conversions are hardcoded to emit
> and accept FP16 denormals.
> VI: Supports FP16. FP16 denormal support is now configurable and
> affects FP16 conversions as well (shared setting with FP64).
> 
> OpenGL doesn't require denormals. Piglit does. I think this is
> incorrect piglit behavior.

I submitted a public spec bug for this issue:

https://www.khronos.org/bugzilla/show_bug.cgi?id=1460

I'm investigating whether a similar bug is needed for the SPIR-V
specification.

I think an argument can be made for either the flush-to-zero or
non-flush-to-zero behavior in the case of unpackHalf2x16 and (possibly)
packHalf2x16.  The only place in the GLSL 4.50.5 specification that
mentions subnormal values is section 4.7.1 (Range and Precision).

    "The precision of stored single- and double-precision floating-point
    variables is defined by the IEEE 754 standard for 32-bit and 64-bit
    floating-point numbers....Any denormalized value input into a
    shader or potentially generated by any operation in a shader can be
    flushed to 0."

Since there is no half-precision type in desktop GLSL, there is no
mention of 16-bit subnormal values.  As Roland mentioned before, all
16-bit subnormal values are 32-bit normal values.

As I mentioned before, from the point of view of an application
developer, the flush-to-zero behavior for unpackHalf2x16 is both
surprising and awful. :)
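
To make that concrete, here is a rough stand-alone sketch (not driver or
piglit code, just an illustration) of what an unpackHalf2x16-style decode
does to an FP16 subnormal input, with and without the flush that a
disabled denormal mode implies:

    /* Illustrative only: decode one FP16 bit pattern, optionally flushing
     * FP16 subnormal inputs to zero the way a disabled denormal mode would. */
    #include <math.h>
    #include <stdint.h>

    static float half_bits_to_float(uint16_t h, int flush_denorms)
    {
        float sign = (h & 0x8000) ? -1.0f : 1.0f;
        int exp = (h >> 10) & 0x1f;
        int mant = h & 0x3ff;

        if (exp == 0) {                      /* zero or FP16 subnormal */
            if (flush_denorms)
                mant = 0;                    /* flushed to +/-0.0 */
            return sign * ldexpf((float)mant, -24);
        }
        if (exp == 31)                       /* Inf / NaN */
            return sign * (mant ? NAN : INFINITY);
        return sign * ldexpf(1.0f + mant / 1024.0f, exp - 15);
    }

    /* half_bits_to_float(0x0001, 0) == 2^-24, a normal FP32 value;
     * half_bits_to_float(0x0001, 1) == 0.0f, the flush-to-zero result. */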

While I think an argument can be made for either behavior, I also think
the argument for the non-flush-to-zero behavior is slightly stronger.
The case for flush-to-zero based on the above spec quotation fails for
two reasons.  First, the "input into [the] shader" is not a subnormal
number.  It is an integer.  Second, the "[value] potentially generated
by [the] operation" is not subnormal in single-precision.
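
That second point is also easy to check mechanically.  Every nonzero FP16
subnormal encodes m * 2^-24 with m in [1, 1023], which is far above the
smallest FP32 normal (FLT_MIN = 2^-126).  A tiny check along these lines
(again just an illustration, not a proposed piglit test) confirms it:

    #include <float.h>
    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* FP16 subnormal bit patterns 0x0001..0x03ff encode m * 2^-24. */
        for (uint16_t m = 0x0001; m <= 0x03ff; m++) {
            float f = ldexpf((float)m, -24);
            if (fpclassify(f) != FP_NORMAL || f < FLT_MIN) {
                printf("0x%04x decodes to a non-normal FP32 value\n", m);
                return 1;
            }
        }
        printf("all 1023 FP16 subnormals are normal FP32 values\n");
        return 0;
    }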

We've already determined that NVIDIA closed-source drivers do not flush
to zero.  I'm curious to know what AMD's closed-source drivers do for
16-bit subnormal values supplied to unpackHalf2x16.  If they do not
flush to zero, then you had better believe that applications depend on
that behavior... and that also means that it doesn't matter very much
what piglit does or the spec does (or does not) say.  This is the sort
of situation where the spec changes to match application expectations
and shipping implementations... and Mesa drivers change to follow.  This
isn't even close to the first time through that loop.

>>> ---
>>>  src/gallium/drivers/radeonsi/si_shader.c        | 14 ++++++++++++++
>>>  src/gallium/drivers/radeonsi/si_state_shaders.c | 18 ++++++++++++------
>>>  src/gallium/drivers/radeonsi/sid.h              |  3 +++
>>>  3 files changed, 29 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
>>> index a4680ce..3f1db70 100644
>>> --- a/src/gallium/drivers/radeonsi/si_shader.c
>>> +++ b/src/gallium/drivers/radeonsi/si_shader.c
>>> @@ -4155,6 +4155,20 @@ int si_compile_llvm(struct si_screen *sscreen,
>>>
>>>       si_shader_binary_read_config(binary, conf, 0);
>>>
>>> +     /* Enable 64-bit and 16-bit denormals, because there is no performance
>>> +      * cost.
>>> +      *
>>> +      * If denormals are enabled, all floating-point output modifiers are
>>> +      * ignored.
>>> +      *
>>> +      * Don't enable denormals for 32-bit floats, because:
>>> +      * - Floating-point output modifiers would be ignored by the hw.
>>> +      * - Some opcodes don't support denormals, such as v_mad_f32. We would
>>> +      *   have to stop using those.
>>> +      * - SI & CI would be very slow.
>>> +      */
>>> +     conf->float_mode |= V_00B028_FP_64_DENORMS;
>>> +
>>
>> Do SI/CI support fp64 denorms?  If so, won't this hurt performance?
> 
> Yes, they do. Fp64 denorms don't hurt performance. Only fp32 denorms
> do on SI & CI.
> 
>>
>> We should tell the compiler we are enabling fp-64 denorms by adding
>> +fp64-denormals to the feature string.  It would also be better to
>> read the float_mode value from the config registers emitted by the
>> compiler.
> 
> Yes, I agree, but LLVM only sets these parameters for compute or even
> HSA-only kernels, not for graphics shaders. We need to set the mode
> for all users _now_, not in 6 months. Last time I looked,
> +fp64-denormals had no effect on graphics shaders.
> 
> Marek
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
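
On Tom's point above about the feature string: with the llvm-c API the
string would be passed when the target machine is created, roughly as in
the sketch below.  The triple, CPU name, and integration point here are
illustrative rather than the actual radeonsi code path, and, as Marek
notes, +fp64-denormals had no visible effect on graphics shaders when he
last checked.

    /* Sketch only: requesting FP64 denormal support from the AMDGPU
     * backend through the llvm-c target machine API. */
    #include <llvm-c/TargetMachine.h>

    static LLVMTargetMachineRef create_vi_target_machine(LLVMTargetRef target)
    {
        return LLVMCreateTargetMachine(target,
                                       "amdgcn--",          /* target triple */
                                       "tonga",             /* a VI part */
                                       "+fp64-denormals",   /* feature string */
                                       LLVMCodeGenLevelDefault,
                                       LLVMRelocDefault,
                                       LLVMCodeModelDefault);
    }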


