[Beignet] Combine Loads from __constant space

Song, Ruiling ruiling.song at intel.com
Tue Nov 25 18:14:56 PST 2014


Great Job Tony!
According to the spec, only the messages below are supported on constant_cache (the hardware support for __constant memory reads).
Message Type
0000: OWord Block Read
0001: Unaligned OWord Block Read
0010: OWord Dual Block Read
0011: DWord Scattered Read
All other encodings are reserved.
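
To make the table above concrete, here is a minimal C sketch that maps the 4-bit message type field to the names listed. The enum and function names are hypothetical and are not taken from the Beignet sources; only the four encodings come from the list above.

```c
#include <stddef.h>

/* Message-type encodings for the constant cache, as listed above.
 * The identifiers are illustrative only, not Beignet's own names. */
enum cc_msg_type {
    CC_OWORD_BLOCK_READ           = 0x0, /* 0000 */
    CC_UNALIGNED_OWORD_BLOCK_READ = 0x1, /* 0001 */
    CC_OWORD_DUAL_BLOCK_READ      = 0x2, /* 0010 */
    CC_DWORD_SCATTERED_READ       = 0x3  /* 0011 */
};

/* Return a readable name for an encoding, or NULL if it is reserved. */
const char *cc_msg_name(unsigned encoding)
{
    switch (encoding) {
    case CC_OWORD_BLOCK_READ:           return "OWord Block Read";
    case CC_UNALIGNED_OWORD_BLOCK_READ: return "Unaligned OWord Block Read";
    case CC_OWORD_DUAL_BLOCK_READ:      return "OWord Dual Block Read";
    case CC_DWORD_SCATTERED_READ:       return "DWord Scattered Read";
    default:                            return NULL; /* reserved */
    }
}
```

Any encoding outside 0000 to 0011 yields NULL here, mirroring "all other encodings are reserved".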

For a normal varying load (different work-items access different buffer addresses), we use the DWord scattered read (that is dword_gather in gen_insn_selection.cpp).
Unfortunately, this message does not support 2/3/4 DWORD reads. It only reads one DWORD per work-item in a SIMD8/SIMD16 message. So we have to split constant memory load instructions.
There is still an opportunity, though: uniform loads of constant memory can be merged. I think that is on our TODO list.
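
The splitting described above can be sketched roughly as follows. This is only a model in plain C, not the code in gen_insn_selection.cpp: it assumes SIMD16 dispatch and byte addresses, and the struct and function names are made up for illustration. A varying load of N consecutive DWORDs per work-item becomes N DWord scattered reads, each carrying one address per lane.

```c
#include <stddef.h>
#include <stdint.h>

#define SIMD_WIDTH 16 /* assumption: SIMD16; the same idea applies to SIMD8 */

/* Model of one DWord scattered read: one byte address per lane. */
struct gather_msg {
    uint32_t offset[SIMD_WIDTH];
};

/* Split a varying load of `ndwords` consecutive DWORDs per lane into
 * `ndwords` scattered reads. `out` must have room for `ndwords` entries.
 * Returns the number of messages generated. */
size_t split_varying_load(const uint32_t base[SIMD_WIDTH], size_t ndwords,
                          struct gather_msg *out)
{
    for (size_t k = 0; k < ndwords; ++k) {
        for (size_t lane = 0; lane < SIMD_WIDTH; ++lane) {
            /* message k fetches the k-th DWORD of every lane */
            out[k].offset[lane] = base[lane] + (uint32_t)(4 * k);
        }
    }
    return ndwords;
}
```

In this model a 4-DWORD load costs four messages where a message with 2/3/4 DWORD support would need one, which would be consistent with the higher send count seen with __constant. A uniform load (every lane using the same address) is the case where merging looks possible.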


From: Beignet [mailto:beignet-bounces at lists.freedesktop.org] On Behalf Of Tony Moore
Sent: Wednesday, November 26, 2014 5:11 AM
To: beignet at lists.freedesktop.org
Subject: [Beignet] Combine Loads from __constant space

Hello,
I notice that reads are not being combined when I use __constant on a read-only kernel buffer. Is this something that can be improved?

In my kernel there are many loads from a read-only data structure. When I use the __global specifier for the memory space I see a total of 33 send instructions and a runtime of 81ms. When I use the __constant specifier, I see 43 send instructions and a runtime of 40ms. I'm hoping that combining the loads could improve performance further.

thanks!
tony