[Beignet] [PATCH] Add memory fence before barrier to support global memory barrier.

Zhigang Gong zhigang.gong at linux.intel.com
Tue Jun 18 00:11:22 PDT 2013


On Tue, Jun 18, 2013 at 06:58:24AM +0000, Zou, Nanhai wrote:
> >>IMO, it will not dispatch only one thread group at a time. The GPGPU walker will dispatch as many thread groups as possible at one time, based on the currently available EUs. The hardware just needs to ensure that a work/thread group does not cross a half-slice boundary.
> 
> I think we should limit the thread group size according to SLM size per half slice. HW will not automatically split them.

Each slice has 64KB of shared local memory, and we already check that size limit in the runtime library. One thing we may worry
about is a thread group that is larger than the thread count of a half-slice. For IVB GT2, one half-slice has 8 EUs, and
each EU has 8 threads, so a half-slice can have up to 64 threads. In Yang Rong's case, we only have 4 threads in one thread group.
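
To make the arithmetic concrete, here is a rough sketch in Python. It is purely illustrative: the function names are made up and are not Beignet APIs, and it looks at thread count only, ignoring SLM and any other resource limit.

```python
# Illustrative only: hypothetical helper names, not part of Beignet.
# The two constants are the IVB GT2 figures quoted above.
EUS_PER_HALF_SLICE = 8
THREADS_PER_EU = 8


def threads_per_half_slice(eus=EUS_PER_HALF_SLICE, threads_per_eu=THREADS_PER_EU):
    """Hardware threads available in one half-slice."""
    return eus * threads_per_eu


def group_fits_half_slice(threads_in_group):
    """True if a thread group of this size can sit inside one half-slice."""
    return threads_in_group <= threads_per_half_slice()


def max_groups_per_half_slice(threads_in_group):
    """Upper bound on whole groups per half-slice, by thread count alone."""
    return threads_per_half_slice() // threads_in_group
```

With 4 threads per group, as in Yang Rong's case, the group is far below the 64-thread limit, so if the failure is related to the half-slice boundary it would have to come from how several groups are placed, not from the size of one group.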

My thinking is as follows:
If the hardware doesn't split thread groups at the half-slice boundary, then I think that even if we limit one thread group
to fewer threads than that limit, it may still fail, because the hardware may dispatch more than one thread group at a time, and then some
thread groups may cross the half-slice boundary. Right? Am I misunderstanding anything?


> 
> Thanks
> Zou Nanhai.

