<html><head></head><body><div>On Mon, 2017-01-16 at 10:21 -0800, Jason Ekstrand wrote:</div><blockquote type="cite"><div dir="ltr"><div><div>+curro<br><br></div>I'm not sure what I think here. TBH, I haven't actually read it in detail yet, but here are some first impressions:<br></div><div> 1) There are two implementations of atan2 (SPIR-V and GLSL) and they should be kept in sync. The same dEQP tests exist in both cases.<br></div></div></blockquote><div><br></div><div>Yes. I focused in Vulkan, but I agree these changes (if accepted) could be ported to the GLSL version.</div><div><br></div><blockquote type="cite"><div dir="ltr"><div> 2) The atan2 tests are not in any mustpass list and I doubt we'll see them show up soon (I could be wrong on that one) so I'm not sure fixing them is all that high priority.<br></div></div></blockquote><div><br></div><div>Agree. There were removed from mustpass precisely due the corner cases it has.</div><div><br></div><blockquote type="cite"><div dir="ltr"><div> 3) There's a good reason why they aren't on the mustpass: atan2 has lots of hard-to-get-right corners and, unless you have a very good hardware atan2 instruction, any "correct" implementation is liable to emit a *lot* of shader code. A quick grep through shader-db for atan yields a number of shaders that use the atan2 variant so this may have a minor perf impact.<br><br></div></div></blockquote><div><br></div><div>This was the motivation to submit a change here: current atan2 implementation, already emit a lot of code. So adding some instructions more I think won't make a big change here. And atan2 doesn't seem a very used function. In shader-db (including private shaders), I only found 36 shaders using it, out of of 26k.</div><div><br></div><div><br></div><blockquote type="cite"><div dir="ltr"><div>The above does not necessarily sum to "we shouldn't fix it" but it probably does mean it's low-priority at best and we need to be careful.<br><br></div><div>Looking a bit into the math of atan2, I'm not convinced flushing denorms to zero is actually what we want to do. For x, I think it's valid, but as y approaches 0, you get +/- pi/2 depending on whether y positive or negative. As both approach zero, it's undefined. As you approach infinity, it approaches something in the interval [-pi/2, pi/2] but it depends on the direction of approach. If you do "man atan2" you'll see this all layed out in a horrifying amount of detail (somewhere around a dozen different cases).<br></div></div><div class="gmail_extra"><br></div></blockquote><div><br></div><div>Yes, originally I only flushed 'x'. But then also flushed 'y' to keep coherence. But didn't think about that corner case you mention. I think we can remove the flush to 0 in 'y'.</div><div><br></div><div> </div><blockquote type="cite"><div class="gmail_extra"><div class="gmail_quote">On Mon, Jan 16, 2017 at 4:08 AM, Juan A. Suarez Romero <span dir="ltr"><<a href="mailto:jasuarez@igalia.com" target="_blank">jasuarez@igalia.com</a>></span> wrote:<br><blockquote type="cite">Rewrite atan2(y,x) to cover (+/-)INF values.<br>
<br>
Also, in case either 'y' or 'x' is a denorm value, flush it to 0 at the<br>
very beginning.<br>
<br>
The reason is that in other case, the hardware will do the flush in some<br>
of the steps, but not in order. So we end up handling in some steps a<br>
denorm value and in others a 0. This causes wrong results.<br>
<br>
Doing it at the very beginning we ensure always the same value is used<br>
(a 0) in all the steps.<br>
<br>
This fixes several test cases in Vulkan CTS<br>
(dEQP-VK.glsl.builtin.precision.atan2.*)<br>
---<br>
src/compiler/spirv/vtn_glsl450.c | 68 ++++++++++++++++++++++++++++++++++------<br>
1 file changed, 58 insertions(+), 10 deletions(-)<br>
<br>
diff --git a/src/compiler/spirv/vtn_glsl450.c b/src/compiler/spirv/vtn_glsl450.c<br>
index 0d32fdd..d2743a8 100644<br>
--- a/src/compiler/spirv/vtn_glsl450.c<br>
+++ b/src/compiler/spirv/vtn_glsl450.c<br>
@@ -299,31 +299,79 @@ build_atan(nir_builder *b, nir_ssa_def *y_over_x)<br>
return nir_fmul(b, tmp, nir_fsign(b, y_over_x));<br>
}<br>
<br>
+/*<br>
+ * Computes atan2(y,x)<br>
+ *<br>
+ * If any of the parameters is a denorm value, it is flushed to 0 at the very<br>
+ * beginning to avoid precision errors<br>
+ */<br>
static nir_ssa_def *<br>
build_atan2(nir_builder *b, nir_ssa_def *y, nir_ssa_def *x)<br>
{<br>
nir_ssa_def *zero = nir_imm_float(b, 0.0f);<br>
-<br>
- /* If |x| >= 1.0e-8 * |y|: */<br>
+ nir_ssa_def *inf = nir_imm_float(b, INFINITY);<br>
+ nir_ssa_def *minus_inf = nir_imm_float(b, -INFINITY);<br>
+ nir_ssa_def *m_3_pi_4 = nir_fmul(b, nir_imm_float(b, 3.0f),<br>
+ nir_imm_float(b, M_PI_4f));<br>
+<br>
+ nir_ssa_def *denorm_y = nir_bcsel(b, nir_feq(b, nir_fmov(b, nir_fabs(b, y)),<br>
+ zero),<br>
+ zero,<br>
+ y);<br>
+ nir_ssa_def *denorm_x = nir_bcsel(b, nir_feq(b, nir_fmov(b, nir_fabs(b, x)),<br>
+ zero),<br>
+ zero,<br>
+ x);<br>
+<br>
+ /* if y == +-INF */<br>
+ nir_ssa_def *y_is_inf = nir_feq(b, nir_fabs(b, y), inf);<br>
+<br>
+ /* if x == +-INF */<br>
+ nir_ssa_def *x_is_inf = nir_feq(b, nir_fabs(b, x), inf);<br>
+<br>
+ /* Case: y is +-INF */<br>
+ nir_ssa_def *y_is_inf_then =<br>
+ nir_fmul(b, nir_fsign(b, y),<br>
+ nir_bcsel(b, nir_feq(b, x, inf),<br>
+ nir_imm_float(b, M_PI_4f),<br>
+ nir_bcsel(b, nir_feq(b, x, minus_inf),<br>
+ m_3_pi_4,<br>
+ nir_imm_float(b, M_PI_2f))));<br>
+<br>
+ /* Case: x is +-INF */<br>
+ nir_ssa_def *x_is_inf_then =<br>
+ nir_fmul(b, nir_fsign(b, y),<br>
+ nir_bcsel(b, nir_feq(b, x, inf),<br>
+ zero,<br>
+ nir_imm_float(b, M_PIf)));<br>
+<br>
+ /* If x > 0 */<br>
nir_ssa_def *condition =<br>
- nir_fge(b, nir_fabs(b, x),<br>
- nir_fmul(b, nir_imm_float(b, 1.0e-8f), nir_fabs(b, y)));<br>
+ nir_fne(b, denorm_x, zero);<br>
<br>
/* Then...call atan(y/x) and fix it up: */<br>
- nir_ssa_def *atan1 = build_atan(b, nir_fdiv(b, y, x));<br>
+ nir_ssa_def *atan1 = build_atan(b, nir_fdiv(b, denorm_y, denorm_x));<br>
+<br>
nir_ssa_def *r_then =<br>
- nir_bcsel(b, nir_flt(b, x, zero),<br>
+ nir_bcsel(b, nir_flt(b, denorm_x, zero),<br>
nir_fadd(b, atan1,<br>
- nir_bcsel(b, nir_fge(b, y, zero),<br>
+ nir_bcsel(b, nir_fge(b, denorm_y, zero),<br>
nir_imm_float(b, M_PIf),<br>
nir_imm_float(b, -M_PIf))),<br>
atan1);<br>
<br>
/* Else... */<br>
nir_ssa_def *r_else =<br>
- nir_fmul(b, nir_fsign(b, y), nir_imm_float(b, M_PI_2f));<br>
-<br>
- return nir_bcsel(b, condition, r_then, r_else);<br>
+ nir_fmul(b, nir_fsign(b, denorm_y), nir_imm_float(b, M_PI_2f));<br>
+<br>
+ /* Everything together */<br>
+ return nir_bcsel(b, y_is_inf,<br>
+ y_is_inf_then,<br>
+ nir_bcsel(b, x_is_inf,<br>
+ x_is_inf_then,<br>
+ nir_bcsel(b, condition,<br>
+ r_then,<br>
+ r_else)));<br>
}<br>
<br>
static nir_ssa_def *<br>
<span class="HOEnZb"><font color="#888888">--<br>
2.9.3<br>
<br>
______________________________<wbr>_________________<br>
mesa-dev mailing list<br>
<a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a><br>
<a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/<wbr>mailman/listinfo/mesa-dev</a><br>
</font></span><br></blockquote></div><br></div>
</blockquote></body></html>