[Mesa-dev] [PATCH 1/2] i965/fs: fix precision of f2b

Fri Feb 26 16:13:46 UTC 2016

On Fri, Feb 26, 2016 at 10:39 AM, Iago Toral <itoral at igalia.com> wrote:
> On Fri, 2016-02-26 at 16:28 +0100, Roland Scheidegger wrote:
>> Am 26.02.2016 um 11:25 schrieb Iago Toral:
>> >
>> > On Thu, 2016-02-25 at 18:20 -0500, Ilia Mirkin wrote:
>> >> On Thu, Feb 25, 2016 at 6:16 PM, Francisco Jerez <currojerez at riseup.net> wrote:
>> >>> Ian Romanick <idr at freedesktop.org> writes:
>> >>>
>> >>>> On 02/25/2016 12:13 PM, Francisco Jerez wrote:
>> >>>>> Ian Romanick <idr at freedesktop.org> writes:
>> >>>>>
>> >>>>>> On 02/25/2016 08:46 AM, Roland Scheidegger wrote:
>> >>>>>>> Am 25.02.2016 um 11:15 schrieb Iago Toral Quiroga:
>> >>>>>>>> From the OpenGL 4.2 spec:
>> >>>>>>>>
>> >>>>>>>> "When a constructor is used to convert any integer or floating-point type to a
>> >>>>>>>>  bool, 0 and 0.0 are converted to false, and non-zero values are converted to
>> >>>>>>>>  true."
>> >>>>>>>>
>> >>>>>>>> Thus, even the smallest non-zero floating value should be translated to true.
>> >>>>>>>> This behavior has been verified also with proprietary NVIDIA drivers.
>> >>>>>>>>
>> >>>>>>>> Currently, we implement this conversion as a cmp.nz operation with floats,
>> >>>>>>>> subject to floating-point precision limitations, and as a result, relatively
>> >>>>>>>> small non-zero floating point numbers return false instead of true.
>> >>>>>>>>
>> >>>>>>>> This patch fixes the problem by getting rid of the sign bit (to cover the case
>> >>>>>>>> of -0.0) and testing the result against 0u using an integer comparison instead.
>> >>>>>>>> ---
>> >>>>>>>>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 15 ++++++++++++---
>> >>>>>>>>  1 file changed, 12 insertions(+), 3 deletions(-)
>> >>>>>>>>
>> >>>>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> >>>>>>>> index db20c71..7d62d7e 100644
>> >>>>>>>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> >>>>>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> >>>>>>>> @@ -913,9 +913,18 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)
>> >>>>>>>>        bld.MOV(result, negate(op[0]));
>> >>>>>>>>        break;
>> >>>>>>>>
>> >>>>>>>> -   case nir_op_f2b:
>> >>>>>>>> -      bld.CMP(result, op[0], brw_imm_f(0.0f), BRW_CONDITIONAL_NZ);
>> >>>>>>>> -      break;
>> >>>>>>>> +   case nir_op_f2b: {
>> >>>>>>>> +      /* Because comparing to 0.0f is subject to precision limitations, do the
>> >>>>>>>> +       * comparison using integers (we need to get rid of the sign bit for that)
>> >>>>>>>> +       */
>> >>>>>>>> +      if (devinfo->gen >= 8)
>> >>>>>>>> +         op[0] = resolve_source_modifiers(op[0]);
>> >>>>>>>> +      op[0] = retype(op[0], BRW_REGISTER_TYPE_UD);
>> >>>>>>>> +      bld.AND(op[0], op[0], brw_imm_ud(0x7FFFFFFFu));
>> >>>>>>>> +      bld.CMP(result, op[0], brw_imm_ud(0u), BRW_CONDITIONAL_NZ);
>> >>>>>>>> +       break;
>> >>>>>>>> +   }
>> >>>>>>>> +
>> >>>>>>>>     case nir_op_i2b:
>> >>>>>>>>        bld.CMP(result, op[0], brw_imm_d(0), BRW_CONDITIONAL_NZ);
>> >>>>>>>>        break;
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>> Does that fix anything? I don't really see a problem with the existing
>> >>>>>>> logic. Yes any "non-zero value" should be converted to true. But surely
>> >>>>>>> that definition cannot include denorms, which you are always allowed to
>> >>>>>>> flush to zero.
>> >>>>>>> (Albeit I can't tell what the result would be with NaNs with the float
>> >>>>>>> compare, nor what the result actually should be in this case since glsl
>> >>>>>>> doesn't require NaNs neither.)
>> >>>>>>
>> >>>>>> Based on this and Jason's comments, I think we need a bunch of new tests.
>> >>>>>>
>> >>>>>>  - smallest positive normal number
>> >>>>>>  - abs of smallest positive normal number
>> >>>>>>  - neg of     "       "        "      "
>> >>>>>>  - largest positive subnormal number
>> >>>>>>  - abs of largest positive subnormal number
>> >>>>>>  - neg of    "        "        "        "
>> >>>>>>  - all of the above with negative numbers
>> >>>>>>  - NaN
>> >>>>>>  - abs of NaN
>> >>>>>>  - neg of  "
>> >>>>>>
>> >>>>>> Perhaps others? +/-Inf just for kicks?
>> >>>>>
>> >>>>> What's the point?  The result of most of the above (except possibly
>> >>>>> bool(NaN)) is undefined by the spec:
>> >>>>>
>> >>>>> "Any denormalized value input into a shader or potentially generated by
>> >>>>>  any operation in a shader can be flushed to 0. [...] NaNs are not
>> >>>>>  required to be generated. [...] Operations and built-in functions that
>> >>>>>  operate on a NaN are not required to return a NaN as the result."
>> >>>>
>> >>>> Except that apparently one major OpenGL vendor does something well
>> >>>> defined that's different than what we do.
>> >>>
>> >>> I'm skeptical that nVidia would treat single-precision denorms
>> >>> inconsistently between datatype constructors and other floating-point
>> >>> arithmetic, but assuming that's the case it would be an argument for
>> >>> proposing the spec change to Khronos rather than introducing a dubiously
>> >>> compliant change into the back-end.  I think I would argue against
>> >>> making such a change in the spec in any case, because even though it's
>> >>> implementation-defined whether denorms are flushed or not, the following
>> >>> is guaranteed by the spec AFAIUI:
>> >>>
>> >>> | if (bool(f))
>> >>> |    random_arithmetic_on(f /* Guaranteed not to be zero here even if
>> >>> |                              denorms are flushed */);
>> >>>
>> >>> With this change in, bool(f) would evaluate to true even if f is a
>> >>> denorm which is flushed to zero for all subsequent arithmetic in the
>> >>> block.  For that reason this seems more likely to break stuff than to
>> >>> fix stuff, it's not like applications can expect denorms to be
>> >>> representable at all regardless of what nVidia does, but they can expect
>> >>> the above GLSL source to work.
>> >>
>> >> I'd be curious to see a shader test that triggers the relevant
>> >> behaviour -- AFAIK nvidia always flushes denorms. The flush is a
>> >> per-instruction setting though [since Fermi], so it's technically
>> >> feasible not to flush sometimes and to flush other times.
>> >
>> > This one:
>> > https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_Igalia_piglit_blob_arb-5Fgpu-5Fshader-5Ffp64_tests_spec_arb-5Fgpu-5Fshader-5Ffp64_execution_fs-2Dconversion-2Dexplicit-2Ddouble-2Dbool-2Dbad.shader-5Ftest&d=BQICaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=Vjtt0vs_iqoI31UfJxBl7yv9I2FeiaeAYgMTLKRBc_I&m=K5Q85JZWVAJuljMHdoEb9NzJVH29wma5HWxq4K5UNs0&s=knFllQfBVoab6tHDLvgEdZUWb7WW77hBt62DQT6Co3s&e=
>> >
>> > That test is for doubles. The author of the test told me that the
>> > equivalent version for float also passed, but I have just verified that
>> > he made a mistake. In summary: that test passes for double and fails for
>> > float...
>> >
>> > The quote from the spec shared by Curro says:
>> >
>> > "Any denormalized value input into a shader or potentially generated by
>> >  any operation in a shader can be flushed to 0"
>> >
>> > It says _can_ instead of _must_, so I share Curro's point of view that
>> > this is in fact undefined behavior by the spec and f2b of a denorm could
>> > return true or false and both are valid implementations. I wonder if
>> > nvidia's case of flushing floats but not doubles is also acceptable
>> > though, sounds like a bug to me.
>>
>> GL is always "can" for such things, and never "must" :-).
>> (d3d10 would require "must be flushed" for floats.)
>> Flushing denorms for floats but not for doubles looks perfectly fine to
>> me too, there's quite some hw which can't do denorms for floats (at
>> least not without compromises) but can do it for doubles (because
>> doubles had a low rate and were only useful for HPC, whereas floats need
>> to favor performance over everything else, essentially).
>
> Ah, interesting, thanks for sharing this!
>
> In that case I wonder what we should do for doubles in the case of i965.
> I have patches for both implementations, so which one do we want?

For GLSL it doesn't really matter, I'd go with whatever way is simpler
to implement.

>
> Iago
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev