[Mesa-dev] [PATCH v2] nir/spirv/glsl450: rewrite atan2 to deal with infinities

Sun Jan 22 08:20:52 UTC 2017

"Juan A. Suarez Romero" <jasuarez at igalia.com> writes:

> Rewrite atan2(y,x) to cover (+/-)INF values.
>
> This fixes several test cases in Vulkan CTS
> (dEQP-VK.glsl.builtin.precision.atan2.*)
>
> v2: do not flush denorms to 0 (jasuarez)
> ---
>  src/compiler/spirv/vtn_glsl450.c | 48 +++++++++++++++++++++++++++++++++++-----
>  1 file changed, 42 insertions(+), 6 deletions(-)
>
> diff --git a/src/compiler/spirv/vtn_glsl450.c b/src/compiler/spirv/vtn_glsl450.c
> index 0d32fddbef..d52a22c0c3 100644
> --- a/src/compiler/spirv/vtn_glsl450.c
> +++ b/src/compiler/spirv/vtn_glsl450.c
> @@ -299,18 +299,47 @@ build_atan(nir_builder *b, nir_ssa_def *y_over_x)
>     return nir_fmul(b, tmp, nir_fsign(b, y_over_x));
>  }
>  
> +/*
> + * Computes atan2(y,x)
> + */
>  static nir_ssa_def *
>  build_atan2(nir_builder *b, nir_ssa_def *y, nir_ssa_def *x)
>  {
>     nir_ssa_def *zero = nir_imm_float(b, 0.0f);
> -
> -   /* If |x| >= 1.0e-8 * |y|: */
> -   nir_ssa_def *condition =
> -      nir_fge(b, nir_fabs(b, x),
> -              nir_fmul(b, nir_imm_float(b, 1.0e-8f), nir_fabs(b, y)));
> +   nir_ssa_def *inf = nir_imm_float(b, INFINITY);
> +   nir_ssa_def *minus_inf = nir_imm_float(b, -INFINITY);
> +   nir_ssa_def *m_3_pi_4 = nir_fmul(b, nir_imm_float(b, 3.0f),
> +                                       nir_imm_float(b, M_PI_4f));
> +
> +   /* if y == +-INF */
> +   nir_ssa_def *y_is_inf = nir_feq(b, nir_fabs(b, y), inf);
> +
> +   /* if x == +-INF */
> +   nir_ssa_def *x_is_inf = nir_feq(b, nir_fabs(b, x), inf);
> +
> +   /* Case: y is +-INF */
> +   nir_ssa_def *y_is_inf_then =
> +      nir_fmul(b, nir_fsign(b, y),
> +                  nir_bcsel(b, nir_feq(b, x, inf),
> +                               nir_imm_float(b, M_PI_4f),
> +                               nir_bcsel(b, nir_feq(b, x, minus_inf),
> +                                            m_3_pi_4,
> +                                            nir_imm_float(b, M_PI_2f))));
> +
> +   /* Case: x is +-INF */
> +   nir_ssa_def *x_is_inf_then =
> +      nir_fmul(b, nir_fsign(b, y),
> +                  nir_bcsel(b, nir_feq(b, x, inf),
> +                               zero,
> +                               nir_imm_float(b, M_PIf)));
> +

I don't think we need all these special cases.  The majority of the
infinity/zero handling rules required by IEEE are fairly natural and
would be taken care of without any additional effort by the
floating-point division operation and single-argument atan function
below if they propagated infinities and zeroes according to IEEE rules.

I had a look at the test results myself and noticed that the failures
are for the most part due to a precision problem in the current
implementation that doesn't only affect infinity -- Relative precision
also explodes as x grows above certain point, infinities just make the
problem catastrophic and cause it to return NaN instead of the
expected finite value.  The reason for the precision problem is that
fdiv is later on lowered into an fmul+frcp sequence, and the latter may
flush the result to zero if the denominator was so huge that its
reciprocal would be denormalized.  If the numerator happened to be
infinite you may end up with ∞/huge = NaN for the same reason.

On top of that there seem to be other issues with the current atan2
implementation:

 - It doesn't handle zeros correctly.  This may be related to your
   observation that denorm arguments cause it to give bogus results, but
   the problem doesn't seem to be related to denorms in particular, but
   to the fact that denorms can get flushed to -0 which is in turn
   handled incorrectly.  The reason is that the existing code uses 'y >=
   0' to determine on which side of the branch cut we are, but that
   causes the discontinuity to end up along the y=-epsilon line instead
   of along the y=0 line as IEEE requires -- IOW, with the current
   implementation very small negative y values behave as if they were
   positive which causes the result to have a large absolute error of
   2π.

 - It doesn't give IEEE-compliant results when both arguments are
   simultaneously infinite.  This is not surprising given that IEEE
   defining atan2(∞, ∞) = π/4 is fairly artificial (as are the other
   rules for combinations of positive or negative infinity), strictly
   speaking taking the limit along any direction other than the diagonal
   would be as right or wrong.  To make the matter worse IEEE disagrees
   with itself on the direction limits are taken when it goes on and
   defines e.g. atan2(+0, -0) = π taken along the horizontal.  Luckily
   GLSL specifically allows implementations to deviate from IEEE rules
   in a neighborhood of zero ("Results are undefined if x and y are both
   0."), so we don't need to care about that one.

Except for the last point, these issues seem serious enough to be worth
fixing -- I'll reply to this thread with an alternative implementation
of atan2 that addresses the first two issues (and actually uses less
instructions than the current implementation), plus another, optional
patch that addresses the third issue in order to make the CTS tests
happy (we can implement the whole oddball atan2(±∞, ±∞) corner cases as
IEEE says with only three additional Gen instructions by being bit
smart, but one could argue that Khronos should just make atan2(±∞, ±∞)
undefined since they're already deviating from IEEE-compliant behavior
at (±0, ±0)).

P.S.: Been waiting for hours to get jenkins results and it looks like
      the CI is busted, will send the patches once I get positive
      results -- Likely not today ;).

> +   /* If x > 0 */
> +   nir_ssa_def *x_not_zero =
> +      nir_fne(b, x, zero);
>  
>     /* Then...call atan(y/x) and fix it up: */
>     nir_ssa_def *atan1 = build_atan(b, nir_fdiv(b, y, x));
> +
>     nir_ssa_def *r_then =
>        nir_bcsel(b, nir_flt(b, x, zero),
>                     nir_fadd(b, atan1,
> @@ -323,7 +352,14 @@ build_atan2(nir_builder *b, nir_ssa_def *y, nir_ssa_def *x)
>     nir_ssa_def *r_else =
>        nir_fmul(b, nir_fsign(b, y), nir_imm_float(b, M_PI_2f));
>  
> -   return nir_bcsel(b, condition, r_then, r_else);
> +   /* Everything together */
> +   return nir_bcsel(b, y_is_inf,
> +                       y_is_inf_then,
> +                       nir_bcsel(b, x_is_inf,
> +                                    x_is_inf_then,
> +                                    nir_bcsel(b, x_not_zero,
> +                                                 r_then,
> +                                                 r_else)));
>  }
>  
>  static nir_ssa_def *
> -- 
> 2.11.0
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170122/72cc5131/attachment.sig>