[Mesa-dev] [PATCH 09/59] compiler/spirv: use 32-bit polynomial approximation for 16-bit asin()

Fri Dec 7 15:26:14 UTC 2018

On Tue, Dec 4, 2018 at 1:18 AM Iago Toral Quiroga <itoral at igalia.com> wrote:

> The 16-bit polynomial execution doesn't meet Khronos precision
> requirements.
> Also, the half-float denorm range starts at 2^(-14) and with asin taking
> input
> values in the range [0, 1], polynomial approximations can lead to flushing
> relatively easy.
>
> An alternative is to use the atan2 formula to compute asin, which is the
> reference taken by Khronos to determine precision requirements, but that
> ends up generating too many additional instructions when compared to the
> polynomial approximation. Specifically, for the Intel case, doing this
> adds +41 instructions to the program for each asin/acos call, which looks
> like an undesirable trade off.
>
> So for now we take the easy way out and fallback to using the 32-bit
> polynomial approximation, which is better (faster) than the 16-bit atan2
> implementation and gives us better precision that matches Khronos
> requirements.
> ---
>  src/compiler/spirv/vtn_glsl450.c | 21 +++++++++++++++++++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/src/compiler/spirv/vtn_glsl450.c
> b/src/compiler/spirv/vtn_glsl450.c
> index bb340c87416..64a1431ae14 100644
> --- a/src/compiler/spirv/vtn_glsl450.c
> +++ b/src/compiler/spirv/vtn_glsl450.c
> @@ -201,8 +201,20 @@ build_log(nir_builder *b, nir_ssa_def *x)
>   * in each case.
>   */
>  static nir_ssa_def *
> -build_asin(nir_builder *b, nir_ssa_def *x, float _p0, float _p1)
> +build_asin(nir_builder *b, nir_ssa_def *_x, float _p0, float _p1)
>  {
> +   /* The polynomial approximation isn't precise enough to meet half-float
> +    * precision requirements. Alternatively, we could implement this using
> +    * the formula:
>

This isn't surprising.  It's possible we could restructure the
floating-point calculation to be more stable but just doing 32-bit seems
reasonable.

> +    *
> +    * asin(x) = atan2(x, sqrt(1 - x*x))
> +    *
> +    * But that is very expensive, so instead we just do the polynomial
> +    * approximation in 32-bit math and then we convert the result back to
> +    * 16-bit.
> +    */
> +   nir_ssa_def *x = _x->bit_size == 16 ? nir_f2f32(b, _x) : _x;
>

Mind restructuring this as follows?

if (x->bit_size == 16) {
   /* Comment goes here */
   return f2f16(b, build_asin(b, f2f32(b, x), p0, p1));
}

I find a bit of recursion easier to read than having two bits at the
beginning and end.

> +
>     nir_ssa_def *p0 = nir_imm_floatN_t(b, _p0, x->bit_size);
>     nir_ssa_def *p1 = nir_imm_floatN_t(b, _p1, x->bit_size);
>     nir_ssa_def *one = nir_imm_floatN_t(b, 1.0f, x->bit_size);
> @@ -210,7 +222,8 @@ build_asin(nir_builder *b, nir_ssa_def *x, float _p0,
> float _p1)
>     nir_ssa_def *m_pi_4_minus_one =
>        nir_imm_floatN_t(b, M_PI_4f - 1.0f, x->bit_size);
>     nir_ssa_def *abs_x = nir_fabs(b, x);
> -   return nir_fmul(b, nir_fsign(b, x),
> +   nir_ssa_def *result =
> +       nir_fmul(b, nir_fsign(b, x),
>                     nir_fsub(b, m_pi_2,
>                              nir_fmul(b, nir_fsqrt(b, nir_fsub(b, one,
> abs_x)),
>                                       nir_fadd(b, m_pi_2,
> @@ -220,6 +233,10 @@ build_asin(nir_builder *b, nir_ssa_def *x, float _p0,
> float _p1)
>
> nir_fadd(b, p0,
>
>          nir_fmul(b, abs_x,
>
>                   p1)))))))));
> +   if (_x->bit_size == 16)
> +      result = nir_f2f16(b, result);
> +
> +   return result;
>  }
>
>  /**
> --
> 2.17.1
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20181207/8592e88d/attachment.html>