[Mesa-dev] [PATCH 1/2] i965/fs: fix precision of f2b
Iago Toral
itoral at igalia.com
Fri Feb 26 10:25:18 UTC 2016
On Thu, 2016-02-25 at 18:20 -0500, Ilia Mirkin wrote:
> On Thu, Feb 25, 2016 at 6:16 PM, Francisco Jerez <currojerez at riseup.net> wrote:
> > Ian Romanick <idr at freedesktop.org> writes:
> >
> >> On 02/25/2016 12:13 PM, Francisco Jerez wrote:
> >>> Ian Romanick <idr at freedesktop.org> writes:
> >>>
> >>>> On 02/25/2016 08:46 AM, Roland Scheidegger wrote:
> >>>>> Am 25.02.2016 um 11:15 schrieb Iago Toral Quiroga:
> >>>>>> From the OpenGL 4.2 spec:
> >>>>>>
> >>>>>> "When a constructor is used to convert any integer or floating-point type to a
> >>>>>> bool, 0 and 0.0 are converted to false, and non-zero values are converted to
> >>>>>> true."
> >>>>>>
> >>>>>> Thus, even the smallest non-zero floating value should be translated to true.
> >>>>>> This behavior has been verified also with proprietary NVIDIA drivers.
> >>>>>>
> >>>>>> Currently, we implement this conversion as a cmp.nz operation with floats,
> >>>>>> subject to floating-point precision limitations, and as a result, relatively
> >>>>>> small non-zero floating point numbers return false instead of true.
> >>>>>>
> >>>>>> This patch fixes the problem by getting rid of the sign bit (to cover the case
> >>>>>> of -0.0) and testing the result against 0u using an integer comparison instead.
> >>>>>> ---
> >>>>>> src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 15 ++++++++++++---
> >>>>>> 1 file changed, 12 insertions(+), 3 deletions(-)
> >>>>>>
> >>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> >>>>>> index db20c71..7d62d7e 100644
> >>>>>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> >>>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> >>>>>> @@ -913,9 +913,18 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)
> >>>>>> bld.MOV(result, negate(op[0]));
> >>>>>> break;
> >>>>>>
> >>>>>> - case nir_op_f2b:
> >>>>>> - bld.CMP(result, op[0], brw_imm_f(0.0f), BRW_CONDITIONAL_NZ);
> >>>>>> - break;
> >>>>>> + case nir_op_f2b: {
> >>>>>> + /* Because comparing to 0.0f is subject to precision limitations, do the
> >>>>>> + * comparison using integers (we need to get rid of the sign bit for that)
> >>>>>> + */
> >>>>>> + if (devinfo->gen >= 8)
> >>>>>> + op[0] = resolve_source_modifiers(op[0]);
> >>>>>> + op[0] = retype(op[0], BRW_REGISTER_TYPE_UD);
> >>>>>> + bld.AND(op[0], op[0], brw_imm_ud(0x7FFFFFFFu));
> >>>>>> + bld.CMP(result, op[0], brw_imm_ud(0u), BRW_CONDITIONAL_NZ);
> >>>>>> + break;
> >>>>>> + }
> >>>>>> +
> >>>>>> case nir_op_i2b:
> >>>>>> bld.CMP(result, op[0], brw_imm_d(0), BRW_CONDITIONAL_NZ);
> >>>>>> break;
> >>>>>>
> >>>>>
> >>>>> Does that fix anything? I don't really see a problem with the existing
> >>>>> logic. Yes any "non-zero value" should be converted to true. But surely
> >>>>> that definition cannot include denorms, which you are always allowed to
> >>>>> flush to zero.
> >>>>> (Albeit I can't tell what the result would be with NaNs with the float
> >>>>> compare, nor what the result actually should be in this case since glsl
> >>>>> doesn't require NaNs neither.)
> >>>>
> >>>> Based on this and Jason's comments, I think we need a bunch of new tests.
> >>>>
> >>>> - smallest positive normal number
> >>>> - abs of smallest positive normal number
> >>>> - neg of " " " "
> >>>> - largest positive subnormal number
> >>>> - abs of largest positive subnormal number
> >>>> - neg of " " " "
> >>>> - all of the above with negative numbers
> >>>> - NaN
> >>>> - abs of NaN
> >>>> - neg of "
> >>>>
> >>>> Perhaps others? +/-Inf just for kicks?
> >>>
> >>> What's the point? The result of most of the above (except possibly
> >>> bool(NaN)) is undefined by the spec:
> >>>
> >>> "Any denormalized value input into a shader or potentially generated by
> >>> any operation in a shader can be flushed to 0. [...] NaNs are not
> >>> required to be generated. [...] Operations and built-in functions that
> >>> operate on a NaN are not required to return a NaN as the result."
> >>
> >> Except that apparently one major OpenGL vendor does something well
> >> defined that's different than what we do.
> >
> > I'm skeptical that nVidia would treat single-precision denorms
> > inconsistently between datatype constructors and other floating-point
> > arithmetic, but assuming that's the case it would be an argument for
> > proposing the spec change to Khronos rather than introducing a dubiously
> > compliant change into the back-end. I think I would argue against
> > making such a change in the spec in any case, because even though it's
> > implementation-defined whether denorms are flushed or not, the following
> > is guaranteed by the spec AFAIUI:
> >
> > | if (bool(f))
> > | random_arithmetic_on(f /* Guaranteed not to be zero here even if
> > | denorms are flushed */);
> >
> > With this change in, bool(f) would evaluate to true even if f is a
> > denorm which is flushed to zero for all subsequent arithmetic in the
> > block. For that reason this seems more likely to break stuff than to
> > fix stuff, it's not like applications can expect denorms to be
> > representable at all regardless of what nVidia does, but they can expect
> > the above GLSL source to work.
>
> I'd be curious to see a shader test that triggers the relevant
> behaviour -- AFAIK nvidia always flushes denorms. The flush is a
> per-instruction setting though [since Fermi], so it's technically
> feasible not to flush sometimes and to flush other times.
This one:
https://github.com/Igalia/piglit/blob/arb_gpu_shader_fp64/tests/spec/arb_gpu_shader_fp64/execution/fs-conversion-explicit-double-bool-bad.shader_test
That test is for doubles. The author of the test told me that the
equivalent version for float also passed, but I have just verified that
he made a mistake. In summary: that test passes for double and fails for
float...
The quote from the spec shared by Curro says:
"Any denormalized value input into a shader or potentially generated by
any operation in a shader can be flushed to 0"
It says _can_ instead of _must_, so I share Curro's point of view that
this is in fact undefined behavior by the spec and f2b of a denorm could
return true or false and both are valid implementations. I wonder if
nvidia's case of flushing floats but not doubles is also acceptable
though, sounds like a bug to me.
In any case, based on this I think we can drop the patches.
Iago
More information about the mesa-dev
mailing list