[Beignet] [PATCH 3/3] Backend: Optimization of internal math functions

Lupescu, Grigore grigore.lupescu at intel.com
Wed May 25 19:12:24 UTC 2016


Reposted the patchset: used native to partially solve the divergence on NaN/INF in the logarithm. Fdlib tests for NaN/INF using nested ifs.
It's hard, though, to evaluate performance, since most tests don't intensively benchmark small sets of math functions. On the initial benchmarks I posted, performance improved by about 20% when using native. Maybe we should focus on redefining/grouping the tests that are heavily impacted by math?
Likewise, some tests (like Luxmark) only use native math, and there is still some difference compared to VPG. Maybe look at scheduling (GPU walker)?

-----Original Message-----
From: Song, Ruiling 
Sent: Thursday, May 12, 2016 11:14 AM
To: Lupescu, Grigore <grigore.lupescu at intel.com>; beignet at lists.freedesktop.org
Subject: RE: [Beignet] [PATCH 3/3] Backend: Optimization of internal math functions

> 
>      /* sin(Inf or NaN) is NaN */
> -  if (ix>=0x7f800000) return x-x;
> +  if (ix >= 0x7f800000) return x-x;
> 
> -    /* argument reduction needed */
> +  if(x <= pio4)
> +	  return negative * __kernel_sinf(x);
> +  /* argument reduction needed */
I think it would be better to remove this (x <= pio4) branch.
Let's keep the implementation less divergent. What do you think?

>    else {
>        n = __ieee754_rem_pio2f(x,&y);
>        float s = __kernel_sinf(y);
> @@ -612,10 +605,12 @@ OVERLOADABLE float sin(float x) {
>    }
>  }
> 
> -OVERLOADABLE float cos(float x) {
> +OVERLOADABLE float cos(float x)
> +{
>    if (__ocl_math_fastpath_flag)
>      return __gen_ocl_internal_fastpath_cos(x);
> 
> +  const float pio4  =  7.8539812565e-01; /* 0x3f490fda */
>    float y,z=0.0;
>    int n, ix;
>    x = __gen_ocl_fabs(x);
> @@ -624,9 +619,11 @@ OVERLOADABLE float cos(float x) {
>    ix &= 0x7fffffff;
> 
>      /* cos(Inf or NaN) is NaN */
> -  if (ix>=0x7f800000) return x-x;
> +  if (ix >= 0x7f800000) return x-x;
> 
> -    /* argument reduction needed */
> +  if(x <= pio4)
> +	  return __kernel_cosf(x, 0.f);
> +  /* argument reduction needed */

Same as above.
The other parts of the patch look great to me.

Thanks!
Ruiling


