<html dir="ltr"><head></head><body style="text-align:left; direction:ltr;"><div>On Fri, 2018-12-07 at 09:26 -0600, Jason Ekstrand wrote:</div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Tue, Dec 4, 2018 at 1:18 AM Iago Toral Quiroga <<a href="mailto:itoral@igalia.com">itoral@igalia.com</a>> wrote:<br></div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex">The 16-bit polynomial execution doesn't meet Khronos precision requirements.<br> Also, the half-float denorm range starts at 2^(-14) and with asin taking input<br> values in the range [0, 1], polynomial approximations can lead to flushing<br> relatively easily.<br> <br> An alternative is to use the atan2 formula to compute asin, which is the<br> reference taken by Khronos to determine precision requirements, but that<br> ends up generating too many additional instructions when compared to the<br> polynomial approximation. 
Specifically, for the Intel case, doing this<br> adds 41 instructions to the program for each asin/acos call, which looks<br> like an undesirable trade-off.<br> <br> So for now we take the easy way out and fall back to using the 32-bit<br> polynomial approximation, which is better (faster) than the 16-bit atan2<br> implementation and gives us better precision that matches Khronos<br> requirements.<br> ---<br> src/compiler/spirv/vtn_glsl450.c | 21 +++++++++++++++++++--<br> 1 file changed, 19 insertions(+), 2 deletions(-)<br> <br> diff --git a/src/compiler/spirv/vtn_glsl450.c b/src/compiler/spirv/vtn_glsl450.c<br> index bb340c87416..64a1431ae14 100644<br> --- a/src/compiler/spirv/vtn_glsl450.c<br> +++ b/src/compiler/spirv/vtn_glsl450.c<br> @@ -201,8 +201,20 @@ build_log(nir_builder *b, nir_ssa_def *x)<br> * in each case.<br> */<br> static nir_ssa_def *<br> -build_asin(nir_builder *b, nir_ssa_def *x, float _p0, float _p1)<br> +build_asin(nir_builder *b, nir_ssa_def *_x, float _p0, float _p1)<br> {<br> + /* The polynomial approximation isn't precise enough to meet half-float<br> + * precision requirements. Alternatively, we could implement this using<br> + * the formula:<br></blockquote><div><br></div><div>This isn't surprising. It's possible we could restructure the floating-point calculation to be more stable, but just doing 32-bit seems reasonable.<br></div><div> </div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"> + *<br> + * asin(x) = atan2(x, sqrt(1 - x*x))<br> + *<br> + * But that is very expensive, so instead we just do the polynomial<br> + * approximation in 32-bit math and then we convert the result back to<br> + * 16-bit.<br> + */<br> + nir_ssa_def *x = _x->bit_size == 16 ? 
nir_f2f32(b, _x) : _x;<br></blockquote><div><br></div><div>Mind restructuring this as follows?</div><div><br></div><div>if (x->bit_size == 16) {</div><div> /* Comment goes here */</div><div> return f2f16(b, build_asin(b, f2f32(b, x), p0, p1));<br></div><div>}</div></div></div></blockquote><div><br></div><div>Yes, this actually looks better to me as well. Fixed locally. Thanks!</div><div><br></div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div>I find a bit of recursion easier to read than having two bits at the beginning and end.<br></div><div> </div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"> +<br> nir_ssa_def *p0 = nir_imm_floatN_t(b, _p0, x->bit_size);<br> nir_ssa_def *p1 = nir_imm_floatN_t(b, _p1, x->bit_size);<br> nir_ssa_def *one = nir_imm_floatN_t(b, 1.0f, x->bit_size);<br> @@ -210,7 +222,8 @@ build_asin(nir_builder *b, nir_ssa_def *x, float _p0, float _p1)<br> nir_ssa_def *m_pi_4_minus_one =<br> nir_imm_floatN_t(b, M_PI_4f - 1.0f, x->bit_size);<br> nir_ssa_def *abs_x = nir_fabs(b, x);<br> - return nir_fmul(b, nir_fsign(b, x),<br> + nir_ssa_def *result =<br> + nir_fmul(b, nir_fsign(b, x),<br> nir_fsub(b, m_pi_2,<br> nir_fmul(b, nir_fsqrt(b, nir_fsub(b, one, abs_x)),<br> nir_fadd(b, m_pi_2,<br> @@ -220,6 +233,10 @@ build_asin(nir_builder *b, nir_ssa_def *x, float _p0, float _p1)<br> nir_fadd(b, p0,<br> nir_fmul(b, abs_x,<br> p1)))))))));<br> + if (_x->bit_size == 16)<br> + result = nir_f2f16(b, result);<br> +<br> + return result;<br> }<br> <br> /**<br> </blockquote></div></div></blockquote></body></html>