[Mesa-dev] [PATCH 1/2] i965/fs: fix precision of f2b

Iago Toral itoral at igalia.com
Fri Feb 26 15:39:45 UTC 2016


On Fri, 2016-02-26 at 16:28 +0100, Roland Scheidegger wrote:
> > On 26.02.2016 at 11:25, Iago Toral wrote:
> > 
> > On Thu, 2016-02-25 at 18:20 -0500, Ilia Mirkin wrote:
> >> On Thu, Feb 25, 2016 at 6:16 PM, Francisco Jerez <currojerez at riseup.net> wrote:
> >>> Ian Romanick <idr at freedesktop.org> writes:
> >>>
> >>>> On 02/25/2016 12:13 PM, Francisco Jerez wrote:
> >>>>> Ian Romanick <idr at freedesktop.org> writes:
> >>>>>
> >>>>>> On 02/25/2016 08:46 AM, Roland Scheidegger wrote:
> >>>>>>> On 25.02.2016 at 11:15, Iago Toral Quiroga wrote:
> >>>>>>>> From the OpenGL 4.2 spec:
> >>>>>>>>
> >>>>>>>> "When a constructor is used to convert any integer or floating-point type to a
> >>>>>>>>  bool, 0 and 0.0 are converted to false, and non-zero values are converted to
> >>>>>>>>  true."
> >>>>>>>>
> >>>>>>>> Thus, even the smallest non-zero floating value should be translated to true.
> >>>>>>>> This behavior has been verified also with proprietary NVIDIA drivers.
> >>>>>>>>
> >>>>>>>> Currently, we implement this conversion as a cmp.nz operation with floats,
> >>>>>>>> subject to floating-point precision limitations, and as a result, relatively
> >>>>>>>> small non-zero floating point numbers return false instead of true.
> >>>>>>>>
> >>>>>>>> This patch fixes the problem by getting rid of the sign bit (to cover the case
> >>>>>>>> of -0.0) and testing the result against 0u using an integer comparison instead.
> >>>>>>>> ---
> >>>>>>>>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 15 ++++++++++++---
> >>>>>>>>  1 file changed, 12 insertions(+), 3 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> >>>>>>>> index db20c71..7d62d7e 100644
> >>>>>>>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> >>>>>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> >>>>>>>> @@ -913,9 +913,18 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)
> >>>>>>>>        bld.MOV(result, negate(op[0]));
> >>>>>>>>        break;
> >>>>>>>>
> >>>>>>>> -   case nir_op_f2b:
> >>>>>>>> -      bld.CMP(result, op[0], brw_imm_f(0.0f), BRW_CONDITIONAL_NZ);
> >>>>>>>> -      break;
> >>>>>>>> +   case nir_op_f2b: {
> >>>>>>>> +      /* Because comparing to 0.0f is subject to precision limitations, do the
> >>>>>>>> +       * comparison using integers (we need to get rid of the sign bit for that)
> >>>>>>>> +       */
> >>>>>>>> +      if (devinfo->gen >= 8)
> >>>>>>>> +         op[0] = resolve_source_modifiers(op[0]);
> >>>>>>>> +      op[0] = retype(op[0], BRW_REGISTER_TYPE_UD);
> >>>>>>>> +      bld.AND(op[0], op[0], brw_imm_ud(0x7FFFFFFFu));
> >>>>>>>> +      bld.CMP(result, op[0], brw_imm_ud(0u), BRW_CONDITIONAL_NZ);
> >>>>>>>> +      break;
> >>>>>>>> +   }
> >>>>>>>> +
> >>>>>>>>     case nir_op_i2b:
> >>>>>>>>        bld.CMP(result, op[0], brw_imm_d(0), BRW_CONDITIONAL_NZ);
> >>>>>>>>        break;
> >>>>>>>>
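(For reference, a minimal host-side C sketch of what the patched f2b does:
clear the sign bit, so -0.0f still maps to false, and compare the remaining
bits against zero as an integer, so even the smallest denormal maps to true.
The helper name f2b_bits is illustrative and not part of the patch.)

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Clear the sign bit and test the remaining bits with an integer compare,
 * mirroring the AND + CMP sequence the patch emits. */
static int f2b_bits(float f)
{
   uint32_t u;
   memcpy(&u, &f, sizeof u);
   return (u & 0x7FFFFFFFu) != 0u;
}

int main(void)
{
   printf("%d %d %d\n",
          f2b_bits(0.0f),    /* 0 */
          f2b_bits(-0.0f),   /* 0 */
          f2b_bits(1e-45f)); /* 1: rounds to the smallest denormal; a float
                                compare on hardware that flushes denormals
                                to zero would report false here */
   return 0;
}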
> >>>>>>>
> >>>>>>> Does that fix anything? I don't really see a problem with the existing
> >>>>>>> logic. Yes, any "non-zero value" should be converted to true. But surely
> >>>>>>> that definition cannot include denorms, which you are always allowed to
> >>>>>>> flush to zero.
> >>>>>>> (Albeit I can't tell what the result would be for NaNs with the float
> >>>>>>> compare, nor what the result should actually be in that case, since GLSL
> >>>>>>> doesn't require NaNs either.)
> >>>>>>
> >>>>>> Based on this and Jason's comments, I think we need a bunch of new tests.
> >>>>>>
> >>>>>>  - smallest positive normal number
> >>>>>>  - abs of smallest positive normal number
> >>>>>>  - neg of     "       "        "      "
> >>>>>>  - largest positive subnormal number
> >>>>>>  - abs of largest positive subnormal number
> >>>>>>  - neg of    "        "        "        "
> >>>>>>  - all of the above with negative numbers
> >>>>>>  - NaN
> >>>>>>  - abs of NaN
> >>>>>>  - neg of  "
> >>>>>>
> >>>>>> Perhaps others? +/-Inf just for kicks?
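(For concreteness, the edge-case inputs listed above can be built directly
from their IEEE-754 single-precision bit patterns; an illustrative C sketch,
with hypothetical helper and variable names:)

#include <stdint.h>
#include <string.h>

static float from_bits(uint32_t u)
{
   float f;
   memcpy(&f, &u, sizeof f);
   return f;
}

int main(void)
{
   float smallest_normal   = from_bits(0x00800000u); /* ~1.1754944e-38   */
   float largest_subnormal = from_bits(0x007FFFFFu); /* ~1.1754942e-38   */
   float quiet_nan         = from_bits(0x7FC00000u); /* one NaN encoding */
   float pos_inf           = from_bits(0x7F800000u);
   float neg_inf           = from_bits(0xFF800000u); /* sign bit set     */
   /* The negative and abs()/neg() variants just toggle the sign bit. */
   (void)smallest_normal; (void)largest_subnormal;
   (void)quiet_nan; (void)pos_inf; (void)neg_inf;
   return 0;
}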
> >>>>>
> >>>>> What's the point?  The result of most of the above (except possibly
> >>>>> bool(NaN)) is undefined by the spec:
> >>>>>
> >>>>> "Any denormalized value input into a shader or potentially generated by
> >>>>>  any operation in a shader can be flushed to 0. [...] NaNs are not
> >>>>>  required to be generated. [...] Operations and built-in functions that
> >>>>>  operate on a NaN are not required to return a NaN as the result."
> >>>>
> >>>> Except that apparently one major OpenGL vendor does something
> >>>> well-defined that's different from what we do.
> >>>
> >>> I'm skeptical that nVidia would treat single-precision denorms
> >>> inconsistently between datatype constructors and other floating-point
> >>> arithmetic, but assuming that's the case it would be an argument for
> >>> proposing the spec change to Khronos rather than introducing a dubiously
> >>> compliant change into the back-end.  I think I would argue against
> >>> making such a change in the spec in any case, because even though it's
> >>> implementation-defined whether denorms are flushed or not, the following
> >>> is guaranteed by the spec AFAIUI:
> >>>
> >>> | if (bool(f))
> >>> |    random_arithmetic_on(f /* Guaranteed not to be zero here even if
> >>> |                              denorms are flushed */);
> >>>
> >>> With this change in, bool(f) would evaluate to true even if f is a
> >>> denorm which is flushed to zero for all subsequent arithmetic in the
> >>> block.  For that reason this seems more likely to break stuff than to
> >>> fix stuff: it's not like applications can expect denorms to be
> >>> representable at all regardless of what nVidia does, but they can expect
> >>> the above GLSL source to work.
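(To make the concern above concrete, a small host-side C sketch; flush_denorm
and f2b_bits are illustrative stand-ins for hardware flush-to-zero behaviour
and for the patched f2b, not actual i965 code.)

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for hardware that flushes denormals to zero in arithmetic. */
static float flush_denorm(float f)
{
   uint32_t u;
   memcpy(&u, &f, sizeof u);
   if ((u & 0x7F800000u) == 0u)   /* zero exponent: +/-0.0 or denormal */
      u &= 0x80000000u;           /* keep only the sign bit            */
   memcpy(&f, &u, sizeof f);
   return f;
}

/* Stand-in for the patched f2b: sign bit cleared, integer compare. */
static int f2b_bits(float f)
{
   uint32_t u;
   memcpy(&u, &f, sizeof u);
   return (u & 0x7FFFFFFFu) != 0u;
}

int main(void)
{
   float f = 1e-45f;                  /* a denormal input                */
   if (f2b_bits(f))                   /* true with the patched f2b...    */
      printf("1.0 / f = %f\n",        /* ...but the guarded arithmetic   */
             1.0f / flush_denorm(f)); /* sees 0.0 and divides by zero    */
   return 0;
}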
> >>
> >> I'd be curious to see a shader test that triggers the relevant
> >> behaviour -- AFAIK nvidia always flushes denorms. The flush is a
> >> per-instruction setting though [since Fermi], so it's technically
> >> feasible not to flush sometimes and to flush other times.
> > 
> > This one:
> > https://github.com/Igalia/piglit/blob/arb_gpu_shader_fp64/tests/spec/arb_gpu_shader_fp64/execution/fs-conversion-explicit-double-bool-bad.shader_test
> > 
> > That test is for doubles. The author of the test told me that the
> > equivalent version for float also passed, but I have just verified that
> > he made a mistake. In summary: that test passes for double and fails for
> > float... 
> > 
> > The quote from the spec shared by Curro says:
> > 
> > "Any denormalized value input into a shader or potentially generated by
> >  any operation in a shader can be flushed to 0"
> > 
> > It says _can_ instead of _must_, so I share Curro's point of view that
> > this is in fact behavior left undefined by the spec: f2b of a denorm
> > could return either true or false, and both would be valid
> > implementations. I wonder if NVIDIA's approach of flushing floats but
> > not doubles is also acceptable, though; it sounds like a bug to me.
> 
> GL is always "can" for such things, and never "must" :-).
> (d3d10 would require "must be flushed" for floats.)
> Flushing denorms for floats but not for doubles looks perfectly fine to
> me too; there's quite a bit of hardware which can't do denorms for floats
> (at least not without compromises) but can do them for doubles (because
> doubles had a low rate and were only useful for HPC, whereas floats need
> to favor performance over everything else, essentially).

Ah, interesting, thanks for sharing this!

In that case I wonder what we should do for doubles on i965. I have
patches for both implementations, so which one do we want?

Iago


