<html><head></head><body><div>On Mon, 2017-01-16 at 10:21 -0800, Jason Ekstrand wrote:</div><blockquote type="cite"><div dir="ltr"><div><div>+curro<br><br></div>I'm not sure what I think here. TBH, I haven't actually read it in detail yet, but here are some first impressions:<br></div><div> 1) There are two implementations of atan2 (SPIR-V and GLSL) and they should be kept in sync. The same dEQP tests exist in both cases.<br></div></div></blockquote><div><br></div><div>Yes. I focused in Vulkan, but I agree these changes (if accepted) could be ported to the GLSL version.</div><div><br></div><blockquote type="cite"><div dir="ltr"><div> 2) The atan2 tests are not in any mustpass list and I doubt we'll see them show up soon (I could be wrong on that one) so I'm not sure fixing them is all that high priority.<br></div></div></blockquote><div><br></div><div>Agree. There were removed from mustpass precisely due the corner cases it has.</div><div><br></div><blockquote type="cite"><div dir="ltr"><div> 3) There's a good reason why they aren't on the mustpass: atan2 has lots of hard-to-get-right corners and, unless you have a very good hardware atan2 instruction, any "correct" implementation is liable to emit a *lot* of shader code. A quick grep through shader-db for atan yields a number of shaders that use the atan2 variant so this may have a minor perf impact.<br><br></div></div></blockquote><div><br></div><div>This was the motivation to submit a change here: current atan2 implementation, already emit a lot of code. So adding some instructions more I think won't make a big change here. And atan2 doesn't seem a very used function. In shader-db (including private shaders), I only found 36 shaders using it, out of of 26k.</div><div><br></div><div><br></div><blockquote type="cite"><div dir="ltr"><div>The above does not necessarily sum to "we shouldn't fix it" but it probably does mean it's low-priority at best and we need to be careful.<br><br></div><div>Looking a bit into the math of atan2, I'm not convinced flushing denorms to zero is actually what we want to do. For x, I think it's valid, but as y approaches 0, you get +/- pi/2 depending on whether y positive or negative. As both approach zero, it's undefined. As you approach infinity, it approaches something in the interval [-pi/2, pi/2] but it depends on the direction of approach. If you do "man atan2" you'll see this all layed out in a horrifying amount of detail (somewhere around a dozen different cases).<br></div></div><div class="gmail_extra"><br></div></blockquote><div><br></div><div>Yes, originally I only flushed 'x'. But then also flushed 'y' to keep coherence. But didn't think about that corner case you mention. I think we can remove the flush to 0 in 'y'.</div><div><br></div><div> </div><blockquote type="cite"><div class="gmail_extra"><div class="gmail_quote">On Mon, Jan 16, 2017 at 4:08 AM, Juan A. Suarez Romero <span dir="ltr"><<a href="mailto:jasuarez@igalia.com" target="_blank">jasuarez@igalia.com</a>></span> wrote:<br><blockquote type="cite">Rewrite atan2(y,x) to cover (+/-)INF values.<br> <br> Also, in case either 'y' or 'x' is a denorm value, flush it to 0 at the<br> very beginning.<br> <br> The reason is that in other case, the hardware will do the flush in some<br> of the steps, but not in order. So we end up handling in some steps a<br> denorm value and in others a 0. This causes wrong results.<br> <br> Doing it at the very beginning we ensure always the same value is used<br> (a 0) in all the steps.<br> <br> This fixes several test cases in Vulkan CTS<br> (dEQP-VK.glsl.builtin.precision.atan2.*)<br> ---<br> src/compiler/spirv/vtn_glsl450.c | 68 ++++++++++++++++++++++++++++++++++------<br> 1 file changed, 58 insertions(+), 10 deletions(-)<br> <br> diff --git a/src/compiler/spirv/vtn_glsl450.c b/src/compiler/spirv/vtn_glsl450.c<br> index 0d32fdd..d2743a8 100644<br> --- a/src/compiler/spirv/vtn_glsl450.c<br> +++ b/src/compiler/spirv/vtn_glsl450.c<br> @@ -299,31 +299,79 @@ build_atan(nir_builder *b, nir_ssa_def *y_over_x)<br> return nir_fmul(b, tmp, nir_fsign(b, y_over_x));<br> }<br> <br> +/*<br> + * Computes atan2(y,x)<br> + *<br> + * If any of the parameters is a denorm value, it is flushed to 0 at the very<br> + * beginning to avoid precision errors<br> + */<br> static nir_ssa_def *<br> build_atan2(nir_builder *b, nir_ssa_def *y, nir_ssa_def *x)<br> {<br> nir_ssa_def *zero = nir_imm_float(b, 0.0f);<br> -<br> - /* If |x| >= 1.0e-8 * |y|: */<br> + nir_ssa_def *inf = nir_imm_float(b, INFINITY);<br> + nir_ssa_def *minus_inf = nir_imm_float(b, -INFINITY);<br> + nir_ssa_def *m_3_pi_4 = nir_fmul(b, nir_imm_float(b, 3.0f),<br> + nir_imm_float(b, M_PI_4f));<br> +<br> + nir_ssa_def *denorm_y = nir_bcsel(b, nir_feq(b, nir_fmov(b, nir_fabs(b, y)),<br> + zero),<br> + zero,<br> + y);<br> + nir_ssa_def *denorm_x = nir_bcsel(b, nir_feq(b, nir_fmov(b, nir_fabs(b, x)),<br> + zero),<br> + zero,<br> + x);<br> +<br> + /* if y == +-INF */<br> + nir_ssa_def *y_is_inf = nir_feq(b, nir_fabs(b, y), inf);<br> +<br> + /* if x == +-INF */<br> + nir_ssa_def *x_is_inf = nir_feq(b, nir_fabs(b, x), inf);<br> +<br> + /* Case: y is +-INF */<br> + nir_ssa_def *y_is_inf_then =<br> + nir_fmul(b, nir_fsign(b, y),<br> + nir_bcsel(b, nir_feq(b, x, inf),<br> + nir_imm_float(b, M_PI_4f),<br> + nir_bcsel(b, nir_feq(b, x, minus_inf),<br> + m_3_pi_4,<br> + nir_imm_float(b, M_PI_2f))));<br> +<br> + /* Case: x is +-INF */<br> + nir_ssa_def *x_is_inf_then =<br> + nir_fmul(b, nir_fsign(b, y),<br> + nir_bcsel(b, nir_feq(b, x, inf),<br> + zero,<br> + nir_imm_float(b, M_PIf)));<br> +<br> + /* If x > 0 */<br> nir_ssa_def *condition =<br> - nir_fge(b, nir_fabs(b, x),<br> - nir_fmul(b, nir_imm_float(b, 1.0e-8f), nir_fabs(b, y)));<br> + nir_fne(b, denorm_x, zero);<br> <br> /* Then...call atan(y/x) and fix it up: */<br> - nir_ssa_def *atan1 = build_atan(b, nir_fdiv(b, y, x));<br> + nir_ssa_def *atan1 = build_atan(b, nir_fdiv(b, denorm_y, denorm_x));<br> +<br> nir_ssa_def *r_then =<br> - nir_bcsel(b, nir_flt(b, x, zero),<br> + nir_bcsel(b, nir_flt(b, denorm_x, zero),<br> nir_fadd(b, atan1,<br> - nir_bcsel(b, nir_fge(b, y, zero),<br> + nir_bcsel(b, nir_fge(b, denorm_y, zero),<br> nir_imm_float(b, M_PIf),<br> nir_imm_float(b, -M_PIf))),<br> atan1);<br> <br> /* Else... */<br> nir_ssa_def *r_else =<br> - nir_fmul(b, nir_fsign(b, y), nir_imm_float(b, M_PI_2f));<br> -<br> - return nir_bcsel(b, condition, r_then, r_else);<br> + nir_fmul(b, nir_fsign(b, denorm_y), nir_imm_float(b, M_PI_2f));<br> +<br> + /* Everything together */<br> + return nir_bcsel(b, y_is_inf,<br> + y_is_inf_then,<br> + nir_bcsel(b, x_is_inf,<br> + x_is_inf_then,<br> + nir_bcsel(b, condition,<br> + r_then,<br> + r_else)));<br> }<br> <br> static nir_ssa_def *<br> <span class="HOEnZb"><font color="#888888">--<br> 2.9.3<br> <br> ______________________________<wbr>_________________<br> mesa-dev mailing list<br> <a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a><br> <a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/<wbr>mailman/listinfo/mesa-dev</a><br> </font></span><br></blockquote></div><br></div> </blockquote></body></html>