[Intel-gfx] Performance: movnti is better than clflush?
Ma, Ling
ling.ma at intel.com
Fri Jul 24 13:20:46 CEST 2009
Hi all,
I find that movnti + mfence is faster than clflush, as the measurements below show (on a Core 2 platform):
Size (bytes)   movnti (us)   clflush (us)   Speedup
4k             3.01          3.56           1.182
16k            12.01         14.23          1.184
32k            23.93         28.45          1.188
64k            47.92         56.89          1.187
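For anyone who wants to reproduce numbers like these, here is a minimal timing sketch. The post does not say how the measurements were taken, so everything here is an assumption: the names time_flush_us, flush_noop and speedup are hypothetical, the clock is POSIX clock_gettime, and the harness does not re-dirty the buffer between iterations, which a careful benchmark would need to do. The speedup helper follows what the table appears to use, clflush time divided by movnti time.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Signature assumed for any flush routine under test (hypothetical). */
typedef void (*flush_fn)(void *addr, size_t size);

/* Do-nothing flush, useful for measuring the overhead of the harness itself. */
static void flush_noop(void *addr, size_t size)
{
    (void)addr;
    (void)size;
}

/* Average wall-clock time of one flush call, in microseconds. */
static double time_flush_us(flush_fn flush, void *addr, size_t size, int iters)
{
    struct timespec t0, t1;

    assert(iters > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++)
        flush(addr, size);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / 1e3 / iters;
}

/* Speedup as the table appears to define it: clflush time over movnti time. */
static double speedup(double clflush_us, double movnti_us)
{
    return clflush_us / movnti_us;
}
```

Passing each flush routine to time_flush_us with the same buffer and size, then feeding the two results to speedup, gives one row of a table like the one above.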
The code for the two cases (the only thing to take care of is alignment):

movnti + mfence:
    for (i = 0; i < size; i += 64) {
        __asm__ __volatile__("movq (%0), %%rax\n\t"
                             "movntiq %%rax, (%0)"
                             : : "r" (addr + i) : "rax", "memory");
    }
    __asm__ __volatile__("mfence" ::: "memory");

clflush:
    for (i = 0; i < size; i += 64)
        clflush(addr + i);
movnti invalidates the cache line before writing the data into the write-combining buffer. A final mfence then
drains whatever data is left in the write-combining buffers, so the overall behavior looks like clflush.
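The two approaches can also be sketched with compiler intrinsics instead of inline assembly. This is only an illustration under stated assumptions: it targets x86-64 with SSE2, the function names flush_movnti and flush_clflush and the CACHE_LINE constant are hypothetical, and whether the movnti variant is really equivalent to clflush on a given CPU is the open question of this post, not something the code proves.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_stream_si64, _mm_clflush, _mm_mfence */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE 64   /* assumed line size, matching the 64-byte stride */

/* movnti + mfence: read the first 8 bytes of each line and write them back
 * with a non-temporal store, then fence once at the end. */
static void flush_movnti(void *addr, size_t size)
{
    char *p = addr;

    assert(((uintptr_t)p % CACHE_LINE) == 0);  /* buffer must be line aligned */
    for (size_t i = 0; i < size; i += CACHE_LINE) {
        long long v;
        memcpy(&v, p + i, sizeof v);               /* the read */
        _mm_stream_si64((long long *)(p + i), v);  /* the movnti store */
    }
    _mm_mfence();  /* drain the write-combining buffers */
}

/* clflush: one cache-line flush per 64-byte line. */
static void flush_clflush(void *addr, size_t size)
{
    char *p = addr;

    for (size_t i = 0; i < size; i += CACHE_LINE)
        _mm_clflush(p + i);
}
```

Both routines leave the buffer contents unchanged, since the movnti variant only stores back the value it has just read.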
This approach only suits small sizes: once the size exceeds about 128k (on my platform),
movnti + mfence becomes slower than clflush because of the read instruction.
If this theory is right, many of the flush operations in GEM could benefit.
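One way to act on the size limit is to pick the method per buffer. The sketch below is hypothetical: the enum, the function name choose_flush_method and the MOVNTI_MAX_BYTES default are made up for illustration, and the 128k crossover is only what was observed on one Core 2 machine, so it would need to be measured again on other hardware.

```c
#include <assert.h>
#include <stddef.h>

enum flush_method { FLUSH_MOVNTI, FLUSH_CLFLUSH };

/* Crossover seen on one Core 2 machine; an assumed default, not a constant
 * that holds for other platforms. */
#define MOVNTI_MAX_BYTES ((size_t)128 * 1024)

/* Use movnti + mfence up to the crossover size, clflush above it. */
static enum flush_method choose_flush_method(size_t size, size_t crossover)
{
    assert(crossover > 0);
    return size <= crossover ? FLUSH_MOVNTI : FLUSH_CLFLUSH;
}
```

With the default crossover, the 4k to 64k sizes from the table all select movnti, while a 256k buffer would select clflush.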
Thanks
Ma Ling