[Mesa-dev] [PATCH 1/2] i965/fs: fix precision of f2b

Thu Feb 25 23:20:08 UTC 2016

On Thu, Feb 25, 2016 at 6:16 PM, Francisco Jerez <currojerez at riseup.net> wrote:
> Ian Romanick <idr at freedesktop.org> writes:
>
>> On 02/25/2016 12:13 PM, Francisco Jerez wrote:
>>> Ian Romanick <idr at freedesktop.org> writes:
>>>
>>>> On 02/25/2016 08:46 AM, Roland Scheidegger wrote:
>>>>> Am 25.02.2016 um 11:15 schrieb Iago Toral Quiroga:
>>>>>> From the OpenGL 4.2 spec:
>>>>>>
>>>>>> "When a constructor is used to convert any integer or floating-point type to a
>>>>>>  bool, 0 and 0.0 are converted to false, and non-zero values are converted to
>>>>>>  true."
>>>>>>
>>>>>> Thus, even the smallest non-zero floating value should be translated to true.
>>>>>> This behavior has been verified also with proprietary NVIDIA drivers.
>>>>>>
>>>>>> Currently, we implement this conversion as a cmp.nz operation with floats,
>>>>>> subject to floating-point precision limitations, and as a result, relatively
>>>>>> small non-zero floating point numbers return false instead of true.
>>>>>>
>>>>>> This patch fixes the problem by getting rid of the sign bit (to cover the case
>>>>>> of -0.0) and testing the result against 0u using an integer comparison instead.
>>>>>> ---
>>>>>>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 15 ++++++++++++---
>>>>>>  1 file changed, 12 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>>>>> index db20c71..7d62d7e 100644
>>>>>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>>>>> @@ -913,9 +913,18 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)
>>>>>>        bld.MOV(result, negate(op[0]));
>>>>>>        break;
>>>>>>
>>>>>> -   case nir_op_f2b:
>>>>>> -      bld.CMP(result, op[0], brw_imm_f(0.0f), BRW_CONDITIONAL_NZ);
>>>>>> -      break;
>>>>>> +   case nir_op_f2b: {
>>>>>> +      /* Because comparing to 0.0f is subject to precision limitations, do the
>>>>>> +       * comparison using integers (we need to get rid of the sign bit for that)
>>>>>> +       */
>>>>>> +      if (devinfo->gen >= 8)
>>>>>> +         op[0] = resolve_source_modifiers(op[0]);
>>>>>> +      op[0] = retype(op[0], BRW_REGISTER_TYPE_UD);
>>>>>> +      bld.AND(op[0], op[0], brw_imm_ud(0x7FFFFFFFu));
>>>>>> +      bld.CMP(result, op[0], brw_imm_ud(0u), BRW_CONDITIONAL_NZ);
>>>>>> +       break;
>>>>>> +   }
>>>>>> +
>>>>>>     case nir_op_i2b:
>>>>>>        bld.CMP(result, op[0], brw_imm_d(0), BRW_CONDITIONAL_NZ);
>>>>>>        break;
>>>>>>
>>>>>
>>>>> Does that fix anything? I don't really see a problem with the existing
>>>>> logic. Yes any "non-zero value" should be converted to true. But surely
>>>>> that definition cannot include denorms, which you are always allowed to
>>>>> flush to zero.
>>>>> (Albeit I can't tell what the result would be with NaNs with the float
>>>>> compare, nor what the result actually should be in this case since glsl
>>>>> doesn't require NaNs neither.)
>>>>
>>>> Based on this and Jason's comments, I think we need a bunch of new tests.
>>>>
>>>>  - smallest positive normal number
>>>>  - abs of smallest positive normal number
>>>>  - neg of     "       "        "      "
>>>>  - largest positive subnormal number
>>>>  - abs of largest positive subnormal number
>>>>  - neg of    "        "        "        "
>>>>  - all of the above with negative numbers
>>>>  - NaN
>>>>  - abs of NaN
>>>>  - neg of  "
>>>>
>>>> Perhaps others? +/-Inf just for kicks?
>>>
>>> What's the point?  The result of most of the above (except possibly
>>> bool(NaN)) is undefined by the spec:
>>>
>>> "Any denormalized value input into a shader or potentially generated by
>>>  any operation in a shader can be flushed to 0. [...] NaNs are not
>>>  required to be generated. [...] Operations and built-in functions that
>>>  operate on a NaN are not required to return a NaN as the result."
>>
>> Except that apparently one major OpenGL vendor does something well
>> defined that's different than what we do.
>
> I'm skeptical that nVidia would treat single-precision denorms
> inconsistently between datatype constructors and other floating-point
> arithmetic, but assuming that's the case it would be an argument for
> proposing the spec change to Khronos rather than introducing a dubiously
> compliant change into the back-end.  I think I would argue against
> making such a change in the spec in any case, because even though it's
> implementation-defined whether denorms are flushed or not, the following
> is guaranteed by the spec AFAIUI:
>
> | if (bool(f))
> |    random_arithmetic_on(f /* Guaranteed not to be zero here even if
> |                              denorms are flushed */);
>
> With this change in, bool(f) would evaluate to true even if f is a
> denorm which is flushed to zero for all subsequent arithmetic in the
> block.  For that reason this seems more likely to break stuff than to
> fix stuff, it's not like applications can expect denorms to be
> representable at all regardless of what nVidia does, but they can expect
> the above GLSL source to work.

I'd be curious to see a shader test that triggers the relevant
behaviour -- AFAIK nvidia always flushes denorms. The flush is a
per-instruction setting though [since Fermi], so it's technically
feasible not to flush sometimes and to flush other times.

  -ilia