[Beignet] [PATCH] Add memory fence before barrier to support global memory barrier.

Dag Lem dag at nimrod.no
Tue Jun 18 00:09:51 PDT 2013


Zhigang Gong <zhigang.gong at linux.intel.com> writes:

> On Tue, Jun 18, 2013 at 08:38:54AM +0200, Dag Lem wrote:

[...]

>> If a thread group encompasses more than one work group, there will be
>> problems with local memory.
>> 
>> On the other hand, if a thread group is equal to one work group, Beignet
>> will be unable to run more than one work group at once, which will
>> severely limit the performance of runs with small local sizes.
> IMO, it will not dispatch only one thread group at a time. The GPGPU walker
> will dispatch as many thread groups as possible at one time, based on the
> currently available EUs. The hardware just needs to ensure that one
> work/thread group does not exceed one half-slice's boundary.

In that case, you'll be bitten by the problem I'm trying to get across:
OpenCL local memory is local *per work group*.

If more than one thread group (= work group) is dispatched at one time,
and all thread groups see the same "local" memory - boom!
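
To make the failure mode concrete, here is a toy model in Python. It is
only a sketch, not Beignet code: the function `run` and the flag
`shared_local` are made up for illustration, and the "barrier" is simply
the point between the two phases. Two work groups each sum their inputs
through a scratch buffer indexed by local id. It assumes both groups
finish their stores before either one reads back.

```python
# Toy model, not Beignet code: two work groups each sum their inputs
# through a "local" scratch buffer indexed by local id.

def run(groups, shared_local):
    local_size = len(groups[0])
    shared = [0] * local_size
    # One scratch buffer per work group, unless we model the faulty case
    # where every group is handed the same buffer.
    local = [shared if shared_local else [0] * local_size for _ in groups]

    # Phase 1: every work item of every concurrently running group
    # stores its value at local[lid].
    for g, values in enumerate(groups):
        for lid, v in enumerate(values):
            local[g][lid] = v

    # (barrier) Phase 2: each group sums what it finds in "its" buffer.
    return [sum(local[g]) for g in range(len(groups))]

groups = [[1, 2, 3, 4], [10, 20, 30, 40]]
per_group = run(groups, shared_local=False)
aliased = run(groups, shared_local=True)
print(per_group)  # [10, 100]
print(aliased)    # [100, 100]
```

With one buffer per work group, each group gets its own sum. With a
single buffer seen by both, the second group's stores overwrite the
first group's, and the first group reads back the wrong data.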

[...]

-- 
Dag

