[Mesa-dev] [PATCH 09/59] compiler/spirv: use 32-bit polynomial approximation for 16-bit asin()
Jason Ekstrand
jason at jlekstrand.net
Fri Dec 7 15:26:14 UTC 2018
On Tue, Dec 4, 2018 at 1:18 AM Iago Toral Quiroga <itoral at igalia.com> wrote:
> The 16-bit polynomial execution doesn't meet Khronos precision
> requirements.
> Also, the half-float denorm range starts at 2^(-14) and with asin taking
> input
> values in the range [0, 1], polynomial approximations can lead to flushing
> relatively easy.
>
> An alternative is to use the atan2 formula to compute asin, which is the
> reference taken by Khronos to determine precision requirements, but that
> ends up generating too many additional instructions when compared to the
> polynomial approximation. Specifically, for the Intel case, doing this
> adds +41 instructions to the program for each asin/acos call, which looks
> like an undesirable trade off.
>
> So for now we take the easy way out and fallback to using the 32-bit
> polynomial approximation, which is better (faster) than the 16-bit atan2
> implementation and gives us better precision that matches Khronos
> requirements.
> ---
> src/compiler/spirv/vtn_glsl450.c | 21 +++++++++++++++++++--
> 1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/src/compiler/spirv/vtn_glsl450.c
> b/src/compiler/spirv/vtn_glsl450.c
> index bb340c87416..64a1431ae14 100644
> --- a/src/compiler/spirv/vtn_glsl450.c
> +++ b/src/compiler/spirv/vtn_glsl450.c
> @@ -201,8 +201,20 @@ build_log(nir_builder *b, nir_ssa_def *x)
> * in each case.
> */
> static nir_ssa_def *
> -build_asin(nir_builder *b, nir_ssa_def *x, float _p0, float _p1)
> +build_asin(nir_builder *b, nir_ssa_def *_x, float _p0, float _p1)
> {
> + /* The polynomial approximation isn't precise enough to meet half-float
> + * precision requirements. Alternatively, we could implement this using
> + * the formula:
>
This isn't surprising. It's possible we could restructure the
floating-point calculation to be more stable but just doing 32-bit seems
reasonable.
> + *
> + * asin(x) = atan2(x, sqrt(1 - x*x))
> + *
> + * But that is very expensive, so instead we just do the polynomial
> + * approximation in 32-bit math and then we convert the result back to
> + * 16-bit.
> + */
> + nir_ssa_def *x = _x->bit_size == 16 ? nir_f2f32(b, _x) : _x;
>
Mind restructuring this as follows?
if (x->bit_size == 16) {
/* Comment goes here */
return f2f16(b, build_asin(b, f2f32(b, x), p0, p1));
}
I find a bit of recursion easier to read than having two bits at the
beginning and end.
> +
> nir_ssa_def *p0 = nir_imm_floatN_t(b, _p0, x->bit_size);
> nir_ssa_def *p1 = nir_imm_floatN_t(b, _p1, x->bit_size);
> nir_ssa_def *one = nir_imm_floatN_t(b, 1.0f, x->bit_size);
> @@ -210,7 +222,8 @@ build_asin(nir_builder *b, nir_ssa_def *x, float _p0,
> float _p1)
> nir_ssa_def *m_pi_4_minus_one =
> nir_imm_floatN_t(b, M_PI_4f - 1.0f, x->bit_size);
> nir_ssa_def *abs_x = nir_fabs(b, x);
> - return nir_fmul(b, nir_fsign(b, x),
> + nir_ssa_def *result =
> + nir_fmul(b, nir_fsign(b, x),
> nir_fsub(b, m_pi_2,
> nir_fmul(b, nir_fsqrt(b, nir_fsub(b, one,
> abs_x)),
> nir_fadd(b, m_pi_2,
> @@ -220,6 +233,10 @@ build_asin(nir_builder *b, nir_ssa_def *x, float _p0,
> float _p1)
>
> nir_fadd(b, p0,
>
> nir_fmul(b, abs_x,
>
> p1)))))))));
> + if (_x->bit_size == 16)
> + result = nir_f2f16(b, result);
> +
> + return result;
> }
>
> /**
> --
> 2.17.1
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20181207/8592e88d/attachment.html>
More information about the mesa-dev
mailing list