[Mesa-dev] [PATCH v2] nir/spirv/glsl450: rewrite atan2 to deal with infinities
Francisco Jerez
currojerez at riseup.net
Mon Jan 23 18:59:43 UTC 2017
"Juan A. Suarez Romero" <jasuarez at igalia.com> writes:
> On Sun, 2017-01-22 at 00:20 -0800, Francisco Jerez wrote:
>> "Juan A. Suarez Romero" <jasuarez at igalia.com> writes:
>>
>> > Rewrite atan2(y,x) to cover (+/-)INF values.
>> >
>> > This fixes several test cases in Vulkan CTS
>> > (dEQP-VK.glsl.builtin.precision.atan2.*)
>> >
>> > v2: do not flush denorms to 0 (jasuarez)
>> > ---
>> > src/compiler/spirv/vtn_glsl450.c | 48 +++++++++++++++++++++++++++++++++++-----
>> > 1 file changed, 42 insertions(+), 6 deletions(-)
>> >
>> > diff --git a/src/compiler/spirv/vtn_glsl450.c b/src/compiler/spirv/vtn_glsl450.c
>> > index 0d32fddbef..d52a22c0c3 100644
>> > --- a/src/compiler/spirv/vtn_glsl450.c
>> > +++ b/src/compiler/spirv/vtn_glsl450.c
>> > @@ -299,18 +299,47 @@ build_atan(nir_builder *b, nir_ssa_def *y_over_x)
>> > return nir_fmul(b, tmp, nir_fsign(b, y_over_x));
>> > }
>> >
>> > +/*
>> > + * Computes atan2(y,x)
>> > + */
>> > static nir_ssa_def *
>> > build_atan2(nir_builder *b, nir_ssa_def *y, nir_ssa_def *x)
>> > {
>> > nir_ssa_def *zero = nir_imm_float(b, 0.0f);
>> > -
>> > - /* If |x| >= 1.0e-8 * |y|: */
>> > - nir_ssa_def *condition =
>> > - nir_fge(b, nir_fabs(b, x),
>> > - nir_fmul(b, nir_imm_float(b, 1.0e-8f), nir_fabs(b, y)));
>> > + nir_ssa_def *inf = nir_imm_float(b, INFINITY);
>> > + nir_ssa_def *minus_inf = nir_imm_float(b, -INFINITY);
>> > + nir_ssa_def *m_3_pi_4 = nir_fmul(b, nir_imm_float(b, 3.0f),
>> > + nir_imm_float(b, M_PI_4f));
>> > +
>> > + /* if y == +-INF */
>> > + nir_ssa_def *y_is_inf = nir_feq(b, nir_fabs(b, y), inf);
>> > +
>> > + /* if x == +-INF */
>> > + nir_ssa_def *x_is_inf = nir_feq(b, nir_fabs(b, x), inf);
>> > +
>> > + /* Case: y is +-INF */
>> > + nir_ssa_def *y_is_inf_then =
>> > + nir_fmul(b, nir_fsign(b, y),
>> > + nir_bcsel(b, nir_feq(b, x, inf),
>> > + nir_imm_float(b, M_PI_4f),
>> > + nir_bcsel(b, nir_feq(b, x, minus_inf),
>> > + m_3_pi_4,
>> > + nir_imm_float(b, M_PI_2f))));
>> > +
>> > + /* Case: x is +-INF */
>> > + nir_ssa_def *x_is_inf_then =
>> > + nir_fmul(b, nir_fsign(b, y),
>> > + nir_bcsel(b, nir_feq(b, x, inf),
>> > + zero,
>> > + nir_imm_float(b, M_PIf)));
>> > +
>>
>> I don't think we need all these special cases. The majority of the
>> infinity/zero handling rules required by IEEE are fairly natural and
>> would be taken care of without any additional effort by the
>> floating-point division operation and single-argument atan function
>> below if they propagated infinities and zeroes according to IEEE rules.
>>
>> I had a look at the test results myself and noticed that the failures
>> are for the most part due to a precision problem in the current
>> implementation that doesn't only affect infinity -- Relative precision
>> also explodes as x grows above certain point, infinities just make the
>> problem catastrophic and cause it to return NaN instead of the
>> expected finite value. The reason for the precision problem is that
>> fdiv is later on lowered into an fmul+frcp sequence, and the latter may
>> flush the result to zero if the denominator was so huge that its
>> reciprocal would be denormalized. If the numerator happened to be
>> infinite you may end up with ∞/huge = NaN for the same reason.
>>
>
> Right. For this case I'd submitted a patch to the test itself, that
> roughly speaking assumes any result as possible if denominator is big
> enough.
>
> https://gerrit.khronos.org/#/c/524/
>
>
> I understand with your alternative proposal you would also handle this
> case correctly, making the CTS change not required, right?
>
Yes, I think the CTS had found a legitimate bug in our atan2
implementation, patching it only conceals the problem -- Granted that
trigonometric functions have unspecified precision according to the GLSL
spec [so you could argue that the majority of these tests shouldn't even
exist in the first place ;)], but the result was over 8 million ULP off
for a range of inputs which seems a bit over the top. With my atan2
implementation the related CTS tests pass without any changes.
>
>> On top of that there seem to be other issues with the current atan2
>> implementation:
>>
>> - It doesn't handle zeros correctly. This may be related to your
>> observation that denorm arguments cause it to give bogus results, but
>> the problem doesn't seem to be related to denorms in particular, but
>> to the fact that denorms can get flushed to -0 which is in turn
>> handled incorrectly. The reason is that the existing code uses 'y >=
>> 0' to determine on which side of the branch cut we are, but that
>> causes the discontinuity to end up along the y=-epsilon line instead
>> of along the y=0 line as IEEE requires -- IOW, with the current
>> implementation very small negative y values behave as if they were
>> positive which causes the result to have a large absolute error of
>> 2π.
>>
>> - It doesn't give IEEE-compliant results when both arguments are
>> simultaneously infinite. This is not surprising given that IEEE
>> defining atan2(∞, ∞) = π/4 is fairly artificial (as are the other
>> rules for combinations of positive or negative infinity), strictly
>> speaking taking the limit along any direction other than the diagonal
>> would be as right or wrong. To make the matter worse IEEE disagrees
>> with itself on the direction limits are taken when it goes on and
>> defines e.g. atan2(+0, -0) = π taken along the horizontal. Luckily
>> GLSL specifically allows implementations to deviate from IEEE rules
>> in a neighborhood of zero ("Results are undefined if x and y are both
>> 0."), so we don't need to care about that one.
>>
>> Except for the last point, these issues seem serious enough to be worth
>> fixing -- I'll reply to this thread with an alternative implementation
>> of atan2 that addresses the first two issues (and actually uses less
>> instructions than the current implementation), plus another, optional
>> patch that addresses the third issue in order to make the CTS tests
>> happy (we can implement the whole oddball atan2(±∞, ±∞) corner cases as
>> IEEE says with only three additional Gen instructions by being bit
>> smart, but one could argue that Khronos should just make atan2(±∞, ±∞)
>> undefined since they're already deviating from IEEE-compliant behavior
>> at (±0, ±0)).
>>
>> P.S.: Been waiting for hours to get jenkins results and it looks like
>> the CI is busted, will send the patches once I get positive
>> results -- Likely not today ;).
>>
>> > + /* If x > 0 */
>> > + nir_ssa_def *x_not_zero =
>> > + nir_fne(b, x, zero);
>> >
>> > /* Then...call atan(y/x) and fix it up: */
>> > nir_ssa_def *atan1 = build_atan(b, nir_fdiv(b, y, x));
>> > +
>> > nir_ssa_def *r_then =
>> > nir_bcsel(b, nir_flt(b, x, zero),
>> > nir_fadd(b, atan1,
>> > @@ -323,7 +352,14 @@ build_atan2(nir_builder *b, nir_ssa_def *y, nir_ssa_def *x)
>> > nir_ssa_def *r_else =
>> > nir_fmul(b, nir_fsign(b, y), nir_imm_float(b, M_PI_2f));
>> >
>> > - return nir_bcsel(b, condition, r_then, r_else);
>> > + /* Everything together */
>> > + return nir_bcsel(b, y_is_inf,
>> > + y_is_inf_then,
>> > + nir_bcsel(b, x_is_inf,
>> > + x_is_inf_then,
>> > + nir_bcsel(b, x_not_zero,
>> > + r_then,
>> > + r_else)));
>> > }
>> >
>> > static nir_ssa_def *
>> > --
>> > 2.11.0
>> >
>> > _______________________________________________
>> > mesa-dev mailing list
>> > mesa-dev at lists.freedesktop.org
>> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170123/ece0996d/attachment-0001.sig>
More information about the mesa-dev
mailing list