[Beignet] [PATCH] Workgroup reduce add optimization

Lupescu, Grigore grigore.lupescu at intel.com
Wed Dec 23 07:50:30 PST 2015


Ubuntu 14.04 x64, Linux gfxi 3.19.0-33-generic
Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz, Intel(R) HD Graphics 5500

[Variants workgroupOpInThread]
1. Current single ADD(1) x16 times (simd=16) => 408ms => [Result: 1.608 Msum/S]
2. OP DP4(4) + ADD(1) x4 times (simd=16) => 384ms => [Result: 1.707 Msum/S]
3. OP ADD(4) + ADD(1) x4 times (simd=16) => 378ms => [Result: 1.730 Msum/S]

No call to workgroupOpInThread => 347ms => [Result: 1.886 Msum/S].

Against the 347ms run with no call, the current implementation (408ms) spends about 60ms in workgroupOpInThread. With ADD(4) the total drops to 378ms, so that overhead falls from ~60ms to ~30ms.
Using DP4(4) gives about the same improvement, slightly below ADD(4) (384ms vs 378ms), but it restricts the data type (can only be float).
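
The overhead figures come from subtracting the no-call time from each variant's total. A minimal sketch of that arithmetic, using only the timings reported above (the names BASELINE_MS, VARIANTS_MS, overhead_ms and overhead_table are illustrative and not part of Beignet):

```python
# Measured times in milliseconds, copied from the results above.
BASELINE_MS = 347  # run with no call to workgroupOpInThread
VARIANTS_MS = {
    "1. ADD(1) x16": 408,
    "2. DP4(4) + ADD(1) x4": 384,
    "3. ADD(4) + ADD(1) x4": 378,
}


def overhead_ms(total_ms, baseline_ms=BASELINE_MS):
    """Time attributed to workgroupOpInThread: total minus the no-call run."""
    return total_ms - baseline_ms


def overhead_table():
    """Return (name, overhead in ms, overhead relative to variant 1) per variant."""
    current = overhead_ms(VARIANTS_MS["1. ADD(1) x16"])
    rows = []
    for name, total in VARIANTS_MS.items():
        cost = overhead_ms(total)
        rows.append((name, cost, cost / current))
    return rows


if __name__ == "__main__":
    for name, cost, ratio in overhead_table():
        print(f"{name:24s} overhead {cost:3d} ms ({ratio:.0%} of current)")
```

This treats the no-call run as a pure baseline, which assumes the rest of the kernel costs the same in every variant.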

I conclude that variant 3 is the best of the three.

