[Mesa-dev] [PATCH 09/59] compiler/spirv: use 32-bit polynomial approximation for 16-bit asin()

Tue Dec 11 10:58:54 UTC 2018

On Fri, 2018-12-07 at 09:26 -0600, Jason Ekstrand wrote:
> On Tue, Dec 4, 2018 at 1:18 AM Iago Toral Quiroga <itoral at igalia.com>
> wrote:
> > The 16-bit polynomial execution doesn't meet Khronos precision
> > requirements.
> > 
> > Also, the half-float denorm range starts at 2^(-14) and with asin
> > taking input
> > 
> > values in the range [0, 1], polynomial approximations can lead to
> > flushing
> > 
> > relatively easy.
> > 
> > 
> > 
> > An alternative is to use the atan2 formula to compute asin, which
> > is the
> > 
> > reference taken by Khronos to determine precision requirements, but
> > that
> > 
> > ends up generating too many additional instructions when compared
> > to the
> > 
> > polynomial approximation. Specifically, for the Intel case, doing
> > this
> > 
> > adds +41 instructions to the program for each asin/acos call, which
> > looks
> > 
> > like an undesirable trade off.
> > 
> > 
> > 
> > So for now we take the easy way out and fallback to using the 32-
> > bit
> > 
> > polynomial approximation, which is better (faster) than the 16-bit
> > atan2
> > 
> > implementation and gives us better precision that matches Khronos
> > 
> > requirements.
> > 
> > ---
> > 
> >  src/compiler/spirv/vtn_glsl450.c | 21 +++++++++++++++++++--
> > 
> >  1 file changed, 19 insertions(+), 2 deletions(-)
> > 
> > 
> > 
> > diff --git a/src/compiler/spirv/vtn_glsl450.c
> > b/src/compiler/spirv/vtn_glsl450.c
> > 
> > index bb340c87416..64a1431ae14 100644
> > 
> > --- a/src/compiler/spirv/vtn_glsl450.c
> > 
> > +++ b/src/compiler/spirv/vtn_glsl450.c
> > 
> > @@ -201,8 +201,20 @@ build_log(nir_builder *b, nir_ssa_def *x)
> > 
> >   * in each case.
> > 
> >   */
> > 
> >  static nir_ssa_def *
> > 
> > -build_asin(nir_builder *b, nir_ssa_def *x, float _p0, float _p1)
> > 
> > +build_asin(nir_builder *b, nir_ssa_def *_x, float _p0, float _p1)
> > 
> >  {
> > 
> > +   /* The polynomial approximation isn't precise enough to meet
> > half-float
> > 
> > +    * precision requirements. Alternatively, we could implement
> > this using
> > 
> > +    * the formula:
> 
> This isn't surprising.  It's possible we could restructure the
> floating-point calculation to be more stable but just doing 32-bit
> seems reasonable.
>  
> > +    *
> > 
> > +    * asin(x) = atan2(x, sqrt(1 - x*x))
> > 
> > +    *
> > 
> > +    * But that is very expensive, so instead we just do the
> > polynomial
> > 
> > +    * approximation in 32-bit math and then we convert the result
> > back to
> > 
> > +    * 16-bit.
> > 
> > +    */
> > 
> > +   nir_ssa_def *x = _x->bit_size == 16 ? nir_f2f32(b, _x) : _x;
> 
> Mind restructuring this as follows?
> 
> if (x->bit_size == 16) {
>    /* Comment goes here */
>    return f2f16(b, build_asin(b, f2f32(b, x), p0, p1));
> }

Yes, this actually looks better to me as well.  Fixed locally. Thanks!
> I find a bit of recursion easier to read than having two bits at the
> beginning and end.
>  
> > +
> > 
> >     nir_ssa_def *p0 = nir_imm_floatN_t(b, _p0, x->bit_size);
> > 
> >     nir_ssa_def *p1 = nir_imm_floatN_t(b, _p1, x->bit_size);
> > 
> >     nir_ssa_def *one = nir_imm_floatN_t(b, 1.0f, x->bit_size);
> > 
> > @@ -210,7 +222,8 @@ build_asin(nir_builder *b, nir_ssa_def *x,
> > float _p0, float _p1)
> > 
> >     nir_ssa_def *m_pi_4_minus_one =
> > 
> >        nir_imm_floatN_t(b, M_PI_4f - 1.0f, x->bit_size);
> > 
> >     nir_ssa_def *abs_x = nir_fabs(b, x);
> > 
> > -   return nir_fmul(b, nir_fsign(b, x),
> > 
> > +   nir_ssa_def *result =
> > 
> > +       nir_fmul(b, nir_fsign(b, x),
> > 
> >                     nir_fsub(b, m_pi_2,
> > 
> >                              nir_fmul(b, nir_fsqrt(b, nir_fsub(b,
> > one, abs_x)),
> > 
> >                                       nir_fadd(b, m_pi_2,
> > 
> > @@ -220,6 +233,10 @@ build_asin(nir_builder *b, nir_ssa_def *x,
> > float _p0, float _p1)
> > 
> >                                                                    
> >       nir_fadd(b, p0,
> > 
> >                                                                    
> >                nir_fmul(b, abs_x,
> > 
> >                                                                    
> >                         p1)))))))));
> > 
> > +   if (_x->bit_size == 16)
> > 
> > +      result = nir_f2f16(b, result);
> > 
> > +
> > 
> > +   return result;
> > 
> >  }
> > 
> > 
> > 
> >  /**
> > 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20181211/cb90f961/attachment.html>