[Beignet] [PATCH] GBE: Merge successive load/store together for better performance.

Song, Ruiling ruiling.song at intel.com
Mon Aug 18 19:50:25 PDT 2014


Hi Tony,

In short, it is not easy to handle merged 2-, 3- or 4-byte reads/writes in the backend.
Currently, if you only change the logic in llvm_loadstore_optimization.cpp to merge byte reads/writes,
you may get wrong results when the starting address of the merged memory access is not 4-byte aligned.
The later steps simply treat a 4-byte load as one int load (an int load always needs a 4-byte-aligned address).
And on Gen7, an int load performs much better than a byte load, so you will see a significant performance improvement.

See emitByteGather() in gen_insn_selection.cpp:
if(valueNum > 1) {
  // read 4 bytes as 1 int and unpack it; the starting address must be 4-byte aligned here
} else {
  GBE_ASSERT(insn.getValueNum() == 1);
  // read 1 int and extract the actual byte using a logical shift
  // as you can see, it is not easy to handle 2-, 3- or 4-byte reads here.
}
I am not sure whether I have explained this clearly.
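
To make the two paths above more concrete, here is a minimal host-side sketch in plain C++. It is not the Beignet code: loadAlignedDword(), loadByte() and loadByte4AsInt() are hypothetical names, `mem` is an ordinary byte array standing in for device memory, and little-endian byte order is assumed. It only illustrates why the "read as 1 int and unpack" path gives wrong bytes when the starting address is not 4-byte aligned.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical model of an int load: it always fetches the 4-byte-aligned
// dword containing `addr` (little-endian byte order assumed).
static std::uint32_t loadAlignedDword(const std::uint8_t *mem, std::size_t addr) {
    const std::size_t base = addr & ~static_cast<std::size_t>(3);
    return  static_cast<std::uint32_t>(mem[base])
         | (static_cast<std::uint32_t>(mem[base + 1]) << 8)
         | (static_cast<std::uint32_t>(mem[base + 2]) << 16)
         | (static_cast<std::uint32_t>(mem[base + 3]) << 24);
}

// Single-byte path: read 1 int, then shift the wanted byte down.
static std::uint8_t loadByte(const std::uint8_t *mem, std::size_t addr) {
    const std::uint32_t word = loadAlignedDword(mem, addr);
    const unsigned shift = static_cast<unsigned>(addr & 3) * 8;
    return static_cast<std::uint8_t>((word >> shift) & 0xFFu);
}

// Merged path: read 4 bytes as 1 int and unpack it.
// Only correct when addr is a multiple of 4.
static std::array<std::uint8_t, 4> loadByte4AsInt(const std::uint8_t *mem, std::size_t addr) {
    const std::uint32_t word = loadAlignedDword(mem, addr);
    return { static_cast<std::uint8_t>(word & 0xFFu),
             static_cast<std::uint8_t>((word >> 8) & 0xFFu),
             static_cast<std::uint8_t>((word >> 16) & 0xFFu),
             static_cast<std::uint8_t>((word >> 24) & 0xFFu) };
}
```

In this model, with mem = {10, 11, 12, 13, 14, 15, 16, 17}, a merged read at address 4 returns bytes 14..17 as expected, but a merged read at address 1 returns bytes 10..13 instead of 11..14. A merged 2- or 3-byte read would also need extra logic when the bytes straddle two dwords.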

Could you share more details about your test? Which OpenCV kernels, or which related performance tests in OpenCV, did you use? Then I could do some performance testing.
I am not sure whether you are hitting something like vload4(int offset, uchar * p). The OpenCL spec does not guarantee that the address 'p' is 4-byte aligned.
For a uchar4* read/write, things are different: the address is 4-byte aligned, and the performance is much better than vload4 on a uchar* in beignet.
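
As a small illustration of the alignment point, the sketch below works through the address arithmetic in plain C++. The helper names vload4Address() and isDwordAligned() are hypothetical, and addresses are modelled as plain integers. It assumes that vload4(offset, p) on uchar data reads 4 bytes starting at p + offset * 4, so the result is 4-byte aligned only if 'p' itself is.

```cpp
#include <cstddef>
#include <cstdint>

// Byte address that vload4(offset, p) reads from for uchar data:
// p + offset * 4 (addresses modelled as integers).
static std::uintptr_t vload4Address(std::uintptr_t p, std::size_t offset) {
    return p + offset * 4;
}

// True when the address can be used directly for a 4-byte int load.
static bool isDwordAligned(std::uintptr_t addr) {
    return (addr & 3u) == 0;
}
```

For example, with p = 0x1000 every offset gives an aligned address, while with p = 0x1001 (legal for a uchar*) no offset does. A uchar4* has 4-byte alignment, so indexing it always stays 4-byte aligned.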

Thanks!
Ruiling

-----Original Message-----
From: Beignet [mailto:beignet-bounces at lists.freedesktop.org] On Behalf Of Moore, Anthony W
Sent: Monday, August 18, 2014 11:47 PM
To: beignet at lists.freedesktop.org
Subject: Re: [Beignet] [PATCH] GBE: Merge successive load/store together for better performance.

Hi,

For this patch http://lists.freedesktop.org/archives/beignet/2014-May/002879.html, why are only DWORDs (and floats) enabled for merging? I tried adding 8-bit and 16-bit and saw some significant performance improvement with some of OpenCV's kernels.

+        // we only support DWORD data type merge
+        if(!ty->isFloatTy() && !ty->isIntegerTy(32)) continue;

Thanks!
Tony 