[Mesa-dev] [PATCH v3] compiler/glsl: fix precision problem of tanh

Wed Dec 21 01:03:13 UTC 2016

On 20.12.2016 at 22:12, Giuseppe Bilotta wrote:
> On Tue, Dec 20, 2016 at 2:17 AM, Matt Turner <mattst88 at gmail.com> wrote:
>> On Mon, Dec 19, 2016 at 5:12 PM, Giuseppe Bilotta
>> <giuseppe.bilotta at gmail.com> wrote:
>>> Just one question though: not knowing much of the shader language, can
>>> I expect expm1 to be available?
>>
>> No, expm1 doesn't exist in GLSL.
> 
> This is extremely bothersome. Both the (exp(2x)-1)/(exp(2x)+1) and the
> 1-2/(exp(2x)+1) formulas give pretty good results when written
> in terms of expm1.
> 
> On Tue, Dec 20, 2016 at 3:48 AM, Roland Scheidegger <sroland at vmware.com> wrote:
>> Not sure it really matters though one way or another. If you wanted good
>> accuracy around 0, you'd have to use a different formula plus a select
>> (seems like libm implementations actually use 3 cases depending on input
>> value magnitude - not so hot with vectors, but thankfully glsl doesn't
>> require 1 ULP accuracy).
> 
> Brute-forcing over all floating-point values on the CPU, switching between
> the two formulas above at appropriate thresholds, gives a maximum relative
> error of the order of machine epsilon when using expm1, and the switch
> between the two formulas can be implemented with a select on two
> terms. However, this does require expm1.
> 
> Nelson Beebe has a very detailed description of how to achieve very
> accurate results for tanh here
> https://www.math.utah.edu/~beebe/software/ieee/tanh.pdf and the
> results are a bit depressing, in that multiple thresholds are
> necessary. I'm not sure if these are the same used by libm, but in any
> case neither lends itself well to vectorization (in contrast to the
> switch between the two formulas above).
> 
> An alternative approach could be to actually provide a software
> implementation of expm1 and use it to compute tanh. I wouldn't be
> surprised if this would turn out to not be slower than using exp
> itself, in fact.
> 
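
To make the quoted point about expm1 concrete, here is a minimal sketch of the
two formulas rewritten in terms of expm1 and joined by a select. It is not the
code from the patch. It runs in double precision on the CPU rather than in
GLSL floats, and the names tanh_naive, tanh_expm1, SWITCH and CLAMP, as well
as the threshold values, are assumptions chosen for illustration only.

```python
import math

# Hypothetical values, picked for illustration (not taken from the thread).
SWITCH = 0.5   # below this |x|, use the quotient form
CLAMP = 20.0   # beyond this |x|, tanh(x) rounds to +-1 in double precision


def tanh_naive(x):
    """(exp(2x) - 1) / (exp(2x) + 1) with plain exp.

    Near 0, exp(2x) is rounded to a value close to 1, so the subtraction
    loses most of the significant bits. Only meant for small |x| here;
    math.exp overflows for large arguments.
    """
    e = math.exp(2.0 * x)
    return (e - 1.0) / (e + 1.0)


def tanh_expm1(x):
    """Both formulas from the thread, written in terms of expm1(2x)."""
    # Clamp so that math.expm1 cannot overflow; the result is already
    # +-1 to double precision at the clamp bounds.
    x = max(-CLAMP, min(CLAMP, x))
    em1 = math.expm1(2.0 * x)

    # (exp(2x) - 1) / (exp(2x) + 1): no cancellation near 0.
    quotient_form = em1 / (em1 + 2.0)
    # 1 - 2 / (exp(2x) + 1): used away from 0.
    one_minus_form = 1.0 - 2.0 / (em1 + 2.0)

    # The "select on two terms": both terms are computed, one is picked.
    return quotient_form if abs(x) < SWITCH else one_minus_form


def rel_err(approx, exact):
    """Relative error; exact must be nonzero."""
    return abs(approx - exact) / abs(exact)


if __name__ == "__main__":
    for x in (1e-10, 1e-3, 0.3, 2.0):
        ref = math.tanh(x)
        print(x, rel_err(tanh_naive(x), ref), rel_err(tanh_expm1(x), ref))
```

Comparing both variants against math.tanh for small arguments shows the
cancellation problem of the exp-based form; the thresholds that minimise the
error in single precision would have to be found separately.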

I'd venture a guess that you cannot beat the exp of the GPUs (exp2 actually,
but it doesn't matter). Those are built to be fast (and not necessarily
100% exact). OK, maybe on some Intel chips, which use the famous mathbox,
you could be competitive...
Now for something like llvmpipe, you could be right. I have no idea whether
exp or expm1 is more difficult to evaluate. But no one is going to bother
for that case, for an opcode where we don't even have any evidence that it's
actually used anywhere (outside conformance tests). Well, it
probably is used somewhere, but probably rarely enough that it's not exactly
an interesting target for optimization.
So I guess unless more accuracy around 0 is really needed, there's
not much point in investing time in this.

Roland