[Beignet] [PATCH 2/4] Backend: Optimization internal math, lower polynomials

Lupescu, Grigore grigore.lupescu at intel.com
Sun Jul 24 17:20:57 UTC 2016


I acknowledge the problems with lgamma/lgamma_r. They are indeed caused by the polynomial reduction - conformance passes but the utests fail.
I was not able to keep the lgamma/lgamma_r polynomial reduction and still pass the utests. These functions appear to be very sensitive, and precision is at its limit with the utests at default settings.

Since mad optimizations are affected by the polynomial grade, and since most manual changes to the math library are error prone (I made several mistakes) and can be automated, I have written a Python script that parses the math functions and recursively transforms a + b * c expressions into mad(b, c, a). This should limit errors and make it easy to adjust the polynomial reduction.
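To make the idea concrete, here is a minimal sketch of such a rewrite (an illustration only, not the actual script, which operates on the C sources of the math library) using Python's ast module:

```python
# Hypothetical sketch: recursively rewrite a + b * c into mad(b, c, a).
# Works on expressions that are also valid Python syntax.
import ast

class MadRewriter(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)          # rewrite inner expressions first
        if isinstance(node.op, ast.Add):
            l, r = node.left, node.right
            if isinstance(r, ast.BinOp) and isinstance(r.op, ast.Mult):
                # a + b * c  ->  mad(b, c, a)
                return ast.Call(func=ast.Name(id='mad', ctx=ast.Load()),
                                args=[r.left, r.right, l], keywords=[])
            if isinstance(l, ast.BinOp) and isinstance(l.op, ast.Mult):
                # b * c + a  ->  mad(b, c, a)
                return ast.Call(func=ast.Name(id='mad', ctx=ast.Load()),
                                args=[l.left, l.right, r], keywords=[])
        return node

def to_mad(expr):
    tree = MadRewriter().visit(ast.parse(expr, mode='eval'))
    return ast.unparse(ast.fix_missing_locations(tree))

print(to_mad('c0 + x * (c1 + x * c2)'))   # mad(x, mad(x, c2, c1), c0)
```

Because children are visited first, a nested Horner-form polynomial collapses into nested mad calls in a single pass.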

I will post the new, clean patchset plus the script (on the mailing list) as soon as I have tuned the polynomial reduction and mad so that all utests and conformance tests pass. With the automated script, mad should be applied everywhere and performance should improve further.
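For reference, the shape the transformed code should take is a low-grade polynomial evaluated as nested mads. The coefficients below are plain Taylor terms for sin, chosen only for illustration; they are not the minimax coefficients used in the backend:

```python
# Illustrative only: a low-grade polynomial in z = x*x for sin(x),
# evaluated with nested mad calls (on the GPU each mad is one instruction).
import math

def mad(a, b, c):
    return a * b + c

def sin_poly(x):
    # sin(x) ~= x * (1 + z*(-1/6 + z*(1/120 - z/5040))),  z = x*x
    z = x * x
    p = mad(z, -1.0 / 5040.0, 1.0 / 120.0)
    p = mad(z, p, -1.0 / 6.0)
    p = mad(z, p, 1.0)
    return x * p

print(abs(sin_poly(0.5) - math.sin(0.5)))   # small near zero
```

Near zero the truncation error of this grade is tiny; the tuning work is in picking coefficients and grades that keep the error bounded over the whole reduced range.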

-----Original Message-----
From: Song, Ruiling 
Sent: Friday, July 22, 2016 11:31 AM
To: Lupescu, Grigore <grigore.lupescu at intel.com>; beignet at lists.freedesktop.org
Subject: RE: [Beignet] [PATCH 2/4] Backend: Optimization internal math, lower polynomials

Hi Grigore,

After applying the patchset, it looks like some utests failed.
These failures relate to the gamma functions.
The OpenCL spec requires tgamma() to be accurate to < 16 ulp.
Although the spec does not give an ulp bound for lgamma and lgamma_r,
I think keeping them under 16 ulp as well is acceptable.
Please help take a look; it may relate to decreasing the polynomial grade.
The utests themselves may also not be entirely correct, so feel free to fix them.

builtin_lgamma_float()    [FAILED]
    Error: input_data1:3.140000e+00  -> gpu:8.260892e-01  cpu:8.261388e-01 diff:4.959106e-05 expect:2.384186e-07

  at file /home/ruilings/workspace/beignet/utests/generated/builtin_lgamma_float.cpp, function builtin_lgamma_float, line 123
builtin_lgamma_float2()    [FAILED]
    Error: input_data1:3.140000e+00  -> gpu:8.260892e-01  cpu:8.261388e-01 diff:4.959106e-05 expect:2.384186e-07

  at file /home/ruilings/workspace/beignet/utests/generated/builtin_lgamma_float2.cpp, function builtin_lgamma_float2, line 123
builtin_lgamma_float4()    [FAILED]
    Error: input_data1:3.140000e+00  -> gpu:8.260892e-01  cpu:8.261388e-01 diff:4.959106e-05 expect:2.384186e-07

  at file /home/ruilings/workspace/beignet/utests/generated/builtin_lgamma_float4.cpp, function builtin_lgamma_float4, line 123
builtin_lgamma_float8()    [FAILED]
    Error: input_data1:3.140000e+00  -> gpu:8.260892e-01  cpu:8.261388e-01 diff:4.959106e-05 expect:2.384186e-07

  at file /home/ruilings/workspace/beignet/utests/generated/builtin_lgamma_float8.cpp, function builtin_lgamma_float8, line 123
builtin_lgamma_float16()    [FAILED]
    Error: input_data1:3.140000e+00  -> gpu:8.260892e-01  cpu:8.261388e-01 diff:4.959106e-05 expect:2.384186e-07

  at file /home/ruilings/workspace/beignet/utests/generated/builtin_lgamma_float16.cpp, function builtin_lgamma_float16, line 123
builtin_lgamma()0.094000 2.317156 2.316127
    [FAILED]
    Error: 0
  at file /home/ruilings/workspace/beignet/utests/builtin_lgamma.cpp, function builtin_lgamma, line 33
builtin_lgamma_r()0.094000 2.317156 2.316127
    [FAILED]
    Error: 0
  at file /home/ruilings/workspace/beignet/utests/builtin_lgamma_r.cpp, function builtin_lgamma_r, line 38
builtin_tgamma()-3.820000 0.319208 0.319208
    [FAILED]
    Error: 0
  at file /home/ruilings/workspace/beignet/utests/builtin_tgamma.cpp, function builtin_tgamma, line 50
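To put the reported diffs in ulp terms, a quick float32 ulp-distance check can be sketched as follows (helper names are hypothetical; this is not the utest code):

```python
# Illustrative float32 ulp-distance check for the failures reported above.
import struct

def ordered_f32(x):
    """Map a float32 to an integer so that adjacent floats differ by 1."""
    u = struct.unpack('<I', struct.pack('<f', x))[0]
    return u if u < 0x80000000 else 0x80000000 - u   # flip negatives

def ulp_diff(a, b):
    return abs(ordered_f32(a) - ordered_f32(b))

# The lgamma failure: a diff of ~4.96e-05 near 0.826 is hundreds of ulp,
# far beyond the 16-ulp bound discussed above.
print(ulp_diff(8.260892e-01, 8.261388e-01) > 16)   # True
```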

Thanks!
Ruiling

> -----Original Message-----
> From: Beignet [mailto:beignet-bounces at lists.freedesktop.org] On Behalf 
> Of Grigore Lupescu
> Sent: Tuesday, June 28, 2016 3:04 AM
> To: beignet at lists.freedesktop.org
> Subject: [Beignet] [PATCH 2/4] Backend: Optimization internal math, 
> lower polynomials
> 
> From: Grigore Lupescu <grigore.lupescu at intel.com>
> 
> Use lower grade polynomials for approximations, keep conformance passing.
> 
> LOG	Use polynomial grade 4 (was 7)
> LOG2	Use polynomial grade 4 (was 7)
> SIN	Use polynomial grade 4 (was 6)
> COS	Use polynomial grade 3 (was 6)
> TANF	Use polynomial grade 7 (was 12)
> GAMMA	Use polynomial grade 3 (was 12)
> GAMMA_R Use polynomial grade 3 (was 12)
> LOG1P	Use polynomial grade 4 (was 7)
> ASIN	Use polynomial grade 4 (was 5)
> ATAN	Use polynomial grade 6 (was 10)
> EXP	Use polynomial grade 2 (was 5)
> EXPM1	Use polynomial grade 3 (was 5)
> POW	Use polynomial grade 2 (was 6)
> POWN	Use polynomial grade 2 (was 6)
> 
> Signed-off-by: Grigore Lupescu <grigore.lupescu at intel.com>
> ---


