[Beignet] [PATCH] Add memory fence before barrier to support global memory barrier.

Zhigang Gong zhigang.gong at linux.intel.com
Tue Jun 18 00:25:06 PDT 2013


On Tue, Jun 18, 2013 at 09:09:51AM +0200, Dag Lem wrote:
> Zhigang Gong <zhigang.gong at linux.intel.com> writes:
> 
> > On Tue, Jun 18, 2013 at 08:38:54AM +0200, Dag Lem wrote:
> 
> [...]
> 
> >> If a thread group encompasses more than one work group, there will be
> >> problems with local memory.
> >> 
> >> On the other hand, if a thread group is equal to one work group, Beignet
> >> will be unable to run more than one work group at once, which will
> >> severely limit the performance of runs with small local sizes.
> > IMO, it will not dispatch only one thread group at a time. The GPGPU walker
> > will dispatch as many thread groups as possible at one time, based on the
> > currently available EUs. The hardware just needs to ensure that one
> > work/thread group does not cross a half-slice boundary.
> 
> In that case, you'll be bitten by the problem I'm trying to get across:
> OpenCL local memory is local *per work group*.
> 
> If more than one thread group (= work group) is dispatched at one time,
> and all thread groups see the same "local" memory - boom!

No, please check the IVB manual, IHD_OS_Vol2_Part2,

section 1.5.1.10.2, Shared Local Memory Allocation:

The first thread of a Thread Group is marked as requiring a new shared local memory – if not the old Shared Local Memory offset is sent with the dispatch.

So when the first thread of each thread group is dispatched, the hardware automatically allocates a new SLM buffer for that group.
Thus, when it dispatches more than one work group to the same half-slice, each work group gets a different
SLM region.
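
To make the rule concrete, here is a minimal C sketch of the behaviour as I read that section. It is only a toy model, not Beignet code and not the real hardware logic: the names slm_dispatcher and dispatch_thread and the 4096-byte allocation size are made up for illustration, and freeing SLM when a group finishes is left out.

```c
/* Toy model of per-thread-group SLM allocation (hypothetical names,
 * not taken from Beignet or from the hardware documentation). */

#define SLM_BYTES_PER_GROUP 4096u /* assumed allocation size per group */

typedef struct {
    unsigned next_free; /* next unallocated SLM offset in the half-slice */
    unsigned current;   /* offset given to the most recent thread group */
} slm_dispatcher;

/* Returns the SLM offset sent with this thread's dispatch.
 * The first thread of a group triggers a new allocation; every other
 * thread gets the old offset, i.e. the one belonging to its own group. */
static unsigned dispatch_thread(slm_dispatcher *d, int first_in_group)
{
    if (first_in_group) {
        d->current = d->next_free;
        d->next_free += SLM_BYTES_PER_GROUP;
    }
    return d->current;
}
```

Starting from a zeroed slm_dispatcher, dispatching two groups back to back gives every thread of the first group one offset and every thread of the second group a different one, so the two work groups never see the same "local" memory.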

> 
> [...]
> 
> -- 
> Dag

