[Mesa-dev] [PATCH 1/2] glsl: Use a simpler formula for tanh

Fri Dec 9 18:20:00 UTC 2016

Unsurprisingly, the formula looks great to me :-).
I was actually wondering about accuracy. I believe the biggest issue
(both with the original formula and this one) is probably values around
zero, because the result gets calculated as (~1 - 1) / 2. So the closest
values to zero you can get (other than zero) are ~2^-25, whereas an
exact calculation could go down to 2^-127. So maybe the simplified
formula might actually be even a bit better there? GLSL seems to be
quite lenient with required exp precision.
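
To make the near-zero concern concrete, here is a minimal sketch that
evaluates both formulas from the patch in single precision on the CPU. It
assumes IEEE 754 floats and a reasonably accurate libm exp; a GPU's exp
may well be less precise, so treat the numbers it prints as an
illustration only. The names tanh_old, tanh_new, check_limits and
compare_near_zero are made up for this example and are not Mesa code.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

// Old formula from the patch, evaluated in float:
// (e^t - e^(-t)) / (e^t + e^(-t)) with t = clamp(x, -10, 10).
float tanh_old(float x)
{
    float t = std::fmin(std::fmax(x, -10.0f), 10.0f);
    float ep = std::exp(t);
    float en = std::exp(-t);
    return (ep - en) / (ep + en);
}

// New formula from the patch, evaluated in float:
// (e^(2t) - 1) / (e^(2t) + 1) with t = min(x, 10).
float tanh_new(float x)
{
    float t = std::fmin(x, 10.0f);
    float e2 = std::exp(2.0f * t);
    return (e2 - 1.0f) / (e2 + 1.0f);
}

// Sanity checks for the saturated ends and for zero (illustrative helper).
void check_limits()
{
    assert(tanh_new(0.0f) == 0.0f);     // exp(0) is exactly 1
    assert(tanh_new(100.0f) == 1.0f);   // clamped to 10, the 1.0 is absorbed
    assert(tanh_new(-100.0f) == -1.0f); // exp underflows to 0, no clamp needed
}

// Print both formulas next to std::tanh for small negative x = -2^k
// (illustrative helper; the exact output depends on the libm in use).
void compare_near_zero()
{
    for (int k = -20; k >= -30; k -= 2) {
        float x = -std::ldexp(1.0f, k);
        std::printf("x=-2^%d old=%g new=%g ref=%g\n",
                    k, tanh_old(x), tanh_new(x), std::tanh(x));
    }
}
```

Calling check_limits() and then compare_near_zero() from a small main()
shows where each formula starts returning zero or a coarse multiple of
the float spacing around 1.0 instead of tracking x.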

In any case,
Reviewed-by: Roland Scheidegger <sroland at vmware.com>

On 09.12.2016 at 18:41, Jason Ekstrand wrote:
> The formula we have used in the past is a trivial reduction from the
> definition, obtained by simply multiplying both the numerator and
> denominator by 2.  However, by also multiplying both by e^x, you can
> reduce it further.  This allows us to get rid of one side of the clamp
> and two of the exponential functions, which should make it faster.  The
> new formula still passes the dEQP precision tests for tanh so it should
> be fine.
> ---
>  src/compiler/glsl/builtin_functions.cpp | 18 ++++++++++--------
>  1 file changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/src/compiler/glsl/builtin_functions.cpp b/src/compiler/glsl/builtin_functions.cpp
> index 3dead1a..94e8279 100644
> --- a/src/compiler/glsl/builtin_functions.cpp
> +++ b/src/compiler/glsl/builtin_functions.cpp
> @@ -3563,17 +3563,19 @@ builtin_builder::_tanh(const glsl_type *type)
>     ir_variable *x = in_var(type, "x");
>     MAKE_SIG(type, v130, 1, x);
>  
> -   /* Clamp x to [-10, +10] to avoid precision problems.
> -    * When x > 10, e^(-x) is so small relative to e^x that it gets flushed to
> -    * zero in the computation e^x + e^(-x). The same happens in the other
> -    * direction when x < -10.
> +   /* tanh(x) := (0.5 * (e^x - e^(-x))) / (0.5 * (e^x + e^(-x)))
> +    *
> +    * With a little algebra this reduces to (e^(2x) - 1) / (e^(2x) + 1)
> +    *
> +    * Clamp x to (-inf, +10] to avoid precision problems.  When x > 10, e^(2x)
> +    * is so much larger than 1.0 that 1.0 gets flushed to zero in the
> +    * computation e^(2x) +- 1 so it can be ignored.
>      */
>     ir_variable *t = body.make_temp(type, "tmp");
> -   body.emit(assign(t, min2(max2(x, imm(-10.0f)), imm(10.0f))));
> +   body.emit(assign(t, min2(x, imm(10.0f))));
>  
> -   /* (e^x - e^(-x)) / (e^x + e^(-x)) */
> -   body.emit(ret(div(sub(exp(t), exp(neg(t))),
> -                     add(exp(t), exp(neg(t))))));
> +   body.emit(ret(div(sub(exp(mul(t, imm(2.0f))), imm(1.0f)),
> +                     add(exp(mul(t, imm(2.0f))), imm(1.0f)))));
>  
>     return sig;
>  }
>
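
For reference, the "little algebra" mentioned in the new comment can be
spelled out as follows. This is just the standard identity written step
by step: drop the common factor 0.5, then multiply numerator and
denominator by e^x.

```latex
\[
\tanh(x)
  = \frac{\tfrac{1}{2}\,(e^{x} - e^{-x})}{\tfrac{1}{2}\,(e^{x} + e^{-x})}
  = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}
  = \frac{e^{x}\,(e^{x} - e^{-x})}{e^{x}\,(e^{x} + e^{-x})}
  = \frac{e^{2x} - 1}{e^{2x} + 1}
\]
```

In this form only the upper clamp is needed: as x goes to -inf, e^(2x)
goes to 0 and the quotient tends to -1 on its own, while for large
positive x an unclamped e^(2x) would eventually overflow to +inf and
give inf / inf.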