[Mesa-dev] [PATCH 09/59] compiler/spirv: use 32-bit polynomial approximation for 16-bit asin()
Iago Toral
itoral at igalia.com
Tue Dec 11 10:58:54 UTC 2018
On Fri, 2018-12-07 at 09:26 -0600, Jason Ekstrand wrote:
> On Tue, Dec 4, 2018 at 1:18 AM Iago Toral Quiroga <itoral at igalia.com>
> wrote:
> > The 16-bit polynomial execution doesn't meet Khronos precision
> > requirements.
> >
> > Also, the half-float denorm range starts at 2^(-14) and with asin
> > taking input
> >
> > values in the range [0, 1], polynomial approximations can lead to
> > flushing
> >
> > relatively easy.
> >
> >
> >
> > An alternative is to use the atan2 formula to compute asin, which
> > is the
> >
> > reference taken by Khronos to determine precision requirements, but
> > that
> >
> > ends up generating too many additional instructions when compared
> > to the
> >
> > polynomial approximation. Specifically, for the Intel case, doing
> > this
> >
> > adds +41 instructions to the program for each asin/acos call, which
> > looks
> >
> > like an undesirable trade off.
> >
> >
> >
> > So for now we take the easy way out and fallback to using the 32-
> > bit
> >
> > polynomial approximation, which is better (faster) than the 16-bit
> > atan2
> >
> > implementation and gives us better precision that matches Khronos
> >
> > requirements.
> >
> > ---
> >
> > src/compiler/spirv/vtn_glsl450.c | 21 +++++++++++++++++++--
> >
> > 1 file changed, 19 insertions(+), 2 deletions(-)
> >
> >
> >
> > diff --git a/src/compiler/spirv/vtn_glsl450.c
> > b/src/compiler/spirv/vtn_glsl450.c
> >
> > index bb340c87416..64a1431ae14 100644
> >
> > --- a/src/compiler/spirv/vtn_glsl450.c
> >
> > +++ b/src/compiler/spirv/vtn_glsl450.c
> >
> > @@ -201,8 +201,20 @@ build_log(nir_builder *b, nir_ssa_def *x)
> >
> > * in each case.
> >
> > */
> >
> > static nir_ssa_def *
> >
> > -build_asin(nir_builder *b, nir_ssa_def *x, float _p0, float _p1)
> >
> > +build_asin(nir_builder *b, nir_ssa_def *_x, float _p0, float _p1)
> >
> > {
> >
> > + /* The polynomial approximation isn't precise enough to meet
> > half-float
> >
> > + * precision requirements. Alternatively, we could implement
> > this using
> >
> > + * the formula:
>
> This isn't surprising. It's possible we could restructure the
> floating-point calculation to be more stable but just doing 32-bit
> seems reasonable.
>
> > + *
> >
> > + * asin(x) = atan2(x, sqrt(1 - x*x))
> >
> > + *
> >
> > + * But that is very expensive, so instead we just do the
> > polynomial
> >
> > + * approximation in 32-bit math and then we convert the result
> > back to
> >
> > + * 16-bit.
> >
> > + */
> >
> > + nir_ssa_def *x = _x->bit_size == 16 ? nir_f2f32(b, _x) : _x;
>
> Mind restructuring this as follows?
>
> if (x->bit_size == 16) {
> /* Comment goes here */
> return f2f16(b, build_asin(b, f2f32(b, x), p0, p1));
> }
Yes, this actually looks better to me as well. Fixed locally. Thanks!
> I find a bit of recursion easier to read than having two bits at the
> beginning and end.
>
> > +
> >
> > nir_ssa_def *p0 = nir_imm_floatN_t(b, _p0, x->bit_size);
> >
> > nir_ssa_def *p1 = nir_imm_floatN_t(b, _p1, x->bit_size);
> >
> > nir_ssa_def *one = nir_imm_floatN_t(b, 1.0f, x->bit_size);
> >
> > @@ -210,7 +222,8 @@ build_asin(nir_builder *b, nir_ssa_def *x,
> > float _p0, float _p1)
> >
> > nir_ssa_def *m_pi_4_minus_one =
> >
> > nir_imm_floatN_t(b, M_PI_4f - 1.0f, x->bit_size);
> >
> > nir_ssa_def *abs_x = nir_fabs(b, x);
> >
> > - return nir_fmul(b, nir_fsign(b, x),
> >
> > + nir_ssa_def *result =
> >
> > + nir_fmul(b, nir_fsign(b, x),
> >
> > nir_fsub(b, m_pi_2,
> >
> > nir_fmul(b, nir_fsqrt(b, nir_fsub(b,
> > one, abs_x)),
> >
> > nir_fadd(b, m_pi_2,
> >
> > @@ -220,6 +233,10 @@ build_asin(nir_builder *b, nir_ssa_def *x,
> > float _p0, float _p1)
> >
> >
> > nir_fadd(b, p0,
> >
> >
> > nir_fmul(b, abs_x,
> >
> >
> > p1)))))))));
> >
> > + if (_x->bit_size == 16)
> >
> > + result = nir_f2f16(b, result);
> >
> > +
> >
> > + return result;
> >
> > }
> >
> >
> >
> > /**
> >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20181211/cb90f961/attachment.html>
More information about the mesa-dev
mailing list