[Beignet] [PATCH 1/3] Benchmark: Evaluate math performance on intervals
Song, Ruiling
ruiling.song at intel.com
Thu May 5 01:38:28 UTC 2016
> -----Original Message-----
> From: Lupescu, Grigore
> Sent: Wednesday, May 4, 2016 3:06 PM
> To: Song, Ruiling <ruiling.song at intel.com>; beignet at lists.freedesktop.org
> Subject: RE: [Beignet] [PATCH 1/3] Benchmark: Evaluate math performance on
> intervals
>
> > I think this may lead to optimizing for a special input range.
>
> I agree - my idea with the benchmark was just to look at how fast the
> function is on an interval.
> I've looked at a function - say sine - and saw that there are 3 paths the
> code may take based on the input value. To properly evaluate the performance
> of each path I would do x = sin(x + a) then x = x * 0x1p-16, where a is the
> min value of an interval and x stays close to 0. So at the end I would know
> that the performance is T on the (a, b) interval, 2T on the (b, c) interval, etc.
> Now this won't tell me how sine actually performs in practice, since I don't
> know how often sine is called with values in (a, b) vs. (b, c) or elsewhere -
> but it would tell me, for instance, that internal_1 on (a, b) is 6 times
> faster than internal_2 on (b, c), and 9 times faster than internal_3 on (c, d), etc.
I get your point. I previously thought you would re-design the math function implementations and implement them one by one.
From your message, it seems your approach is to benchmark them over different input ranges and optimize them per range.
You assume all math functions will be implemented with a range-based algorithm; this is true.
The biggest difference between CPU and GPU is that on the GPU we need to try our best to reduce divergent code.
Yes, the sine implementation has an if-else. That is because the Payne-Hanek reduction is too slow, which is why I use a fast version for small input values.
I would expect "no or very few" input range checks in the other math functions.
If some range check is necessary, I would like to keep the if/else checks in a math function to at most two.
You can continue with your range-based benchmarking and optimization, and see how much improvement we can get.
But I would also suggest checking whether we can re-implement these functions in a new way, such as table lookup or other techniques you can find in papers.
>
> I believe the only way to evaluate whether a change in math code is relevant
> is with real-world tests. We thus must have a diverse set of tests that use
> most math functions. Ideally one should document what each test uses and in
> what proportion. I have started doing this, but it's taking a lot of time due
> to the complexity of some tests (e.g. Luxmark).
Yes, if we can gather that information, it would be useful.
> -------------------------------------------------------------
>
> So I see the following flow of optimization for Beignet - but it may apply to
> any other math implementation for OpenCL:
>
> 1. (done) Measure the performance of each interval for a given function (sin).
> We would know perf1 on (a, b), perf2 on (b, c), perf3 on (c, d).
> 2. (working) Run several math tests relevant to sine. Try to identify in what
> circumstances sin is called. Maybe all tests call it on (a, b) and (b, c).
> Then we should target (a, b) and (b, c), because that is what is being used.
> This assumes the math tests are well chosen and diverse.
> 3. (working) Optimize intervals (a, b) and (b, c). Observe how each one
> improves, since we can test performance per interval. Re-run the real-world
> math tests.
> Any thoughts on this?
My only worry is this: say you try to optimize sine and find that benchmark1 can be improved by adding an input range check;
then you look at benchmark2 and add another input range check to optimize it. This can easily lead to much more divergent code,
which may eventually evolve into a good CPU version but not a good GPU version.
Optimizing for benchmarks is OK with me. But I would encourage you to reduce divergent code instead of introducing more divergent code. That is my point.
>
> I did some optimizations (calls to native functions and polynomial reduction)
> and obtained an increase of at least 5% in about 8-10 of the math tests
> provided by Mengmeng. It's quite difficult to target the general case for all
> math functions, but I think these changes are relevant to some extent.