[Beignet] [PATCH 1/3] Benchmark: Evaluate math performance on intervals

Lupescu, Grigore grigore.lupescu at intel.com
Mon May 2 04:20:57 UTC 2016


The performance is evaluated per interval, not as a whole. An aggregate figure would depend on how wide the ranges are.
And with random input it would depend on how often each range comes up.
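
To make that concrete, here is a minimal sketch in plain C (not part of the patch). The helper name, the per-interval costs and the weights are all made up for illustration; only the arithmetic matters: the same per-interval measurements can give very different aggregate figures depending on how often each interval is hit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the patch: combine per-interval costs
 * into one aggregate figure, weighting each interval by how often inputs
 * are assumed to fall into it. Weights must be non-negative, not all zero. */
static double aggregate_cost(const double *cost, const double *weight, size_t n)
{
  double sum = 0.0, total = 0.0;
  assert(n > 0);
  for (size_t i = 0; i < n; i++) {
    assert(weight[i] >= 0.0);
    sum += cost[i] * weight[i];
    total += weight[i];
  }
  assert(total > 0.0);
  return sum / total; /* weighted mean of the per-interval costs */
}
```

With illustrative costs {1, 2, 8} the aggregate is 1.8 when the cheap interval gets 80% of the inputs and 6.7 when the expensive one does, which is why the benchmark reports each interval separately.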

I also meant to send the polynomial reduction patch that goes with this one, but had some trouble sending it.

-----Original Message-----
From: Song, Ruiling 
Sent: Monday, May 2, 2016 5:10 AM
To: Lupescu, Grigore <grigore.lupescu at intel.com>; beignet at lists.freedesktop.org
Subject: RE: [Beignet] [PATCH 1/3] Benchmark: Evaluate math performance on intervals



> -----Original Message-----
> From: Beignet [mailto:beignet-bounces at lists.freedesktop.org] On Behalf 
> Of Grigore Lupescu
> Sent: Monday, May 2, 2016 3:04 AM
> To: beignet at lists.freedesktop.org
> Subject: [Beignet] [PATCH 1/3] Benchmark: Evaluate math performance on 
> intervals
> 
> From: Grigore Lupescu <grigore.lupescu at intel.com>
> 
> Functions to benchmark math functions on intervals.
> Tests: sin, cos, exp2, exp, exp10, log2, log, log10
> 
> Signed-off-by: Grigore Lupescu <grigore.lupescu at intel.com>
> ---
>  benchmark/CMakeLists.txt     |   3 +-
>  benchmark/benchmark_math.cpp | 126 ++++++++++++++++++++
>  kernels/bench_math.cl        | 272 +++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 400 insertions(+), 1 deletion(-)
>  create mode 100644 benchmark/benchmark_math.cpp
>  create mode 100644 kernels/bench_math.cl
> 
> diff --git a/benchmark/CMakeLists.txt b/benchmark/CMakeLists.txt
> index dd33829..4c3c933 100644
> --- a/benchmark/CMakeLists.txt
> +++ b/benchmark/CMakeLists.txt
> @@ -18,7 +18,8 @@ set (benchmark_sources
>    benchmark_copy_buffer_to_image.cpp
>    benchmark_copy_image_to_buffer.cpp
>    benchmark_copy_buffer.cpp
> -  benchmark_copy_image.cpp)
> +  benchmark_copy_image.cpp
> +  benchmark_math.cpp)
> 
> +/* calls internal fast (native) if (x > -0x1.6p1 && x < 0x1.6p1) */ 
> +kernel void bench_math_exp(
> +  global float *src,
> +  global float *dst,
> +  float pwr,
> +  uint loop)
> +{
> +  float result = src[get_global_id(0)];
> +
> +  for(; loop > 0; loop--)
> +  {
> +#if defined(BENCHMARK_NATIVE)
> +    result = native_exp(-0x1.6p1 - result); /* calls native */
> +#elif defined(BENCHMARK_INTERNAL_FAST)
> +    result = exp(-0x1.6p1 + result); /* calls internal fast */
> +#else
> +    result = exp(-0x1.6p1 - result); /* calls internal slow */
> +#endif

I think we should separate the benchmark test from the real implementation.
That makes it easy to compare against other driver implementations, and the implementation in Beignet may also change in the future.
What do you think?
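
One possible reading of this suggestion, as a rough sketch in plain C and not a proposal for the actual patch: the benchmark host code names its input intervals itself and fills `src` from them, so the kernel can call plain exp() or sin() without encoding Beignet's internal thresholds. The helper name and the interval bounds are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side helper: fill src with n evenly spaced values
 * covering [lo, hi]. The interval is chosen by the benchmark, not derived
 * from any driver's internal fast/slow thresholds. */
static void fill_interval(float *src, size_t n, float lo, float hi)
{
  assert(n > 1 && lo < hi);
  for (size_t i = 0; i < n; i++) {
    float t = (float)i / (float)(n - 1); /* 0.0 at i == 0, 1.0 at i == n - 1 */
    src[i] = lo + t * (hi - lo);
  }
}
```

The kernel loop would still have to keep its argument inside the chosen interval between iterations; this only covers the initial contents of `src`.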

> +  }
> +
> +  dst[get_global_id(0)] = result;
> +}
> +

> +/* benchmark sin performance */
> +kernel void bench_math_sin(
> +  global float *src,
> +  global float *dst,
> +  float pwr,
> +  uint loop)
> +{
> +  float result = src[get_global_id(0)];
> +
> +  for(; loop > 0; loop--)
> +  {
> +#if defined(BENCHMARK_NATIVE)
> +    result = native_sin(result); /* calls native */
> +#else
> +    result = sin(result); /* calls internal, random complexity */

What is the range of 'result'? It seems very small. I think we need to make sure the input argument to sin() covers a large range,
since we need to optimize for the general case.

Thanks!
Ruiling
> +    //result = sin(0.1f + result); /* calls internal, (1) no reduction */
> +    //result = sin(2.f + result); /* calls internal, (2) fast reduction */
> +    //result = sin(4001 + result); /* calls internal, (3) slow reduction */
> +    result *= 0x1p-16;
> +#endif
> +  }
> +
> +  dst[get_global_id(0)] = result;
> +}
> +


