[Beignet] [PATCH] Remove global barrier assert.

Zhigang Gong zhigang.gong at linux.intel.com
Fri Jun 14 00:03:03 PDT 2013


Forgot to mention, please set OCL_SIMD_WIDTH to 8.

> -----Original Message-----
> From: Zhigang Gong [mailto:zhigang.gong at linux.intel.com]
> Sent: Friday, June 14, 2013 3:02 PM
> To: 'Zou, Nanhai'; 'Yang, Rong R'
> Cc: 'beignet at lists.freedesktop.org'
> Subject: RE: [Beignet] [PATCH] Remove global barrier assert.
> 
> Nanhai,
> 
> Thanks for the information. I agree that in the same kernel thread, we
don't
> need memory fence at all. But for a barrier which is to synchronize
difference
> threads, I still think the memory fence is required at that case.
> 
> I modified Dag's test case to make it a little bit complicate. And it
seems can hit
> the memory fence bug. You can find it will hit the Assert at random
position
> which is very likely a memory fence related bug.
> 
> The utest case:
> 
> #include "utest_helper.hpp"
> 
> static void compiler_global_memory_barrier(void)
> {
>   const size_t n = 16*1024;
> 
>   // Setup kernel and buffers
>   OCL_CREATE_KERNEL("compiler_global_memory_barrier");
>   OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint32_t), NULL);
>   OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint32_t), NULL);
>   OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]);
>   OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]);
> 
>   // Run the kernel
>   globals[0] = n/2;
>   locals[0] = 16;
>   OCL_NDRANGE(1);
>   OCL_MAP_BUFFER(0);
>   OCL_MAP_BUFFER(1);
> 
>   // Check results
>   uint32_t *dst = (uint32_t*)buf_data[0];
>   volatile uint32_t *src = (uint32_t*)buf_data[1];
>   printf("src 0 %d \n", src[0]);
>   for (uint32_t i = 0; i < n; i+=16)
>     for (uint32_t j = 0; j < 16; ++j)
>       {
>         printf("i %d j %d dst %d \n", i, j, dst[i+j]);
>         OCL_ASSERT(dst[i+j] == 15-j);
>       }
> }
> 
> MAKE_UTEST_FROM_FUNCTION(compiler_global_memory_barrier);
> 
> The kernel:
> 
> __kernel void compiler_global_memory_barrier(__global int *dst, __global
int
> *src) {
>   src[get_local_size(0) * (2 * get_group_id(0)) + get_local_id(0)] =
> get_local_id(0);
>   src[get_local_size(0) * (2 * get_group_id(0) + 1) + get_local_id(0)] =
> get_local_id(0);
>   barrier(CLK_GLOBAL_MEM_FENCE);
>   dst[get_local_size(0) * (2 * get_group_id(0)) + get_local_id(0)] =
> src[get_local_size(0) * 2 * get_group_id(0) + get_local_size(0) -
(get_local_id(0)
> + 1)];
>   dst[get_local_size(0) * (2 * get_group_id(0) + 1) + get_local_id(0)] =
> src[get_local_size(0) * (2 * get_group_id(0) + 1) + get_local_size(0) -
> (get_local_id(0) + 1)]; }
> 
> > -----Original Message-----
> > From: Zou, Nanhai [mailto:nanhai.zou at intel.com]
> > Sent: Friday, June 14, 2013 2:44 PM
> > To: Zhigang Gong; Yang, Rong R
> > Cc: beignet at lists.freedesktop.org
> > Subject: RE: [Beignet] [PATCH] Remove global barrier assert.
> >
> > Please check
> > Shared Functions > Data Port [Pre-DevSKL] > Read/Write Ordering
> >
> > Reads and writes issued from the same thread are guaranteed to be
> > processed in the same order as issued. Software mechanisms must still
> > ensure any needed ordering of accesses issued from different threads.
> >
> >
> > -----Original Message-----
> > From: Zhigang Gong [mailto:zhigang.gong at linux.intel.com]
> > Sent: Friday, June 14, 2013 2:13 PM
> > To: Yang, Rong R; Zou, Nanhai
> > Cc: beignet at lists.freedesktop.org
> > Subject: RE: [Beignet] [PATCH] Remove global barrier assert.
> >
> > Rong,
> >
> > Please see my comments:
> >
> > > -----Original Message-----
> > > From: Yang, Rong R [mailto:rong.r.yang at intel.com]
> > > Sent: Friday, June 14, 2013 1:38 PM
> > > To: Zou, Nanhai; Zhigang Gong
> > > Cc: beignet at lists.freedesktop.org
> > > Subject: RE: [Beignet] [PATCH] Remove global barrier assert.
> > >
> > > Be sure that mem fence mentioned in bspec is not the same mem fence
> > > mentioned in openCL spec.
> > >
> > > Mem fence in bspec:
> > > A memory fence message issued by a thread causes further messages
> > > issued by the thread to be blocked until all previous messages
> > > issued by the
> > thread to
> > > that data port (data cache or render cache) have been globally
> > > observed
> > from
> > > the point of view of other threads in the system.
> > [Gong, Zhigang] As on IVB, each memory access is done by sending
> > message to data port.
> > So the above statement is exactly the low level mechanism to support
> > the memory fence defined in the OpenCL spec.
> > >
> > > Mem fence in openCL spec:
> > > Orders loads and stores of a work-item executing a kernel.
> > [Gong, Zhigang] Right, the load and store on the same kernel are two
> > send message instruction.
> > Let's assume the first send message is to write to src, and the second
> > message is to read from.
> > With barrier, we can only make sure the second send instruction will
> > be executed after the first one, But by default, the send message is
> > executed asynchronous, and the barrier can't guarantee the second
> > first send message already complete before the second send message
> > start to execute. That's the reason why GEN provide the memory fence.
> >
> > Nanhai and Rong, I really can't find the statement in BPSEC to support
> > your point which means the IVB hardware will guarantee the in order
> > read/write (send message). Could you give me a pointer to It?
> >
> > Thanks.
> >
> > >
> > > Mem fence in openCL is only for a work-item semantic, and is
> > > guaranteed by hardware.
> > >
> > > -----Original Message-----
> > > From: Zou, Nanhai
> > > Sent: Friday, June 14, 2013 1:20 PM
> > > To: Zhigang Gong; Yang, Rong R
> > > Cc: beignet at lists.freedesktop.org
> > > Subject: RE: [Beignet] [PATCH] Remove global barrier assert.
> > >
> > > Rong and I have discuss the mem fence topic result the mem fence and
> > global
> > > barrier patch.
> > >
> > > R/W order is ensured in one thread.
> > > So no need to put this in mem fence built-in funtion,
> > >
> > > Mem Fence semantic with in a work group is included in barrier
> > > according
> > to
> > > our test, so no need send an additional message to slow the
performance.
> > >
> > >
> > > Thanks
> > > Zou Nanhai
> > >
> > >
> > > -----Original Message-----
> > > From: Zhigang Gong [mailto:zhigang.gong at linux.intel.com]
> > > Sent: Friday, June 14, 2013 1:13 PM
> > > To: Zou, Nanhai; Yang, Rong R
> > > Cc: beignet at lists.freedesktop.org
> > > Subject: RE: [Beignet] [PATCH] Remove global barrier assert.
> > >
> > > I found something in bspec as below. You can check the last sentence.
> > > As
> > Rong
> > > is doing the related work items, I would ask rong to follow the
> > > memory
> > fence
> > > for global memory. Rong, is it ok for you?
> > >
> > > Programming Note:
> > > [DevIVB, DevVLV, DevVLV-T, DevHSW] The memory fence operation is not
> > > required to guarantee SLM memory access ordering between multiple
> > > threads in a thread group for the sequence of a write message, a
> > > barrier message,
> > and
> > > then a read message. (This optimization is due to implementation
> > > details
> > of the
> > > organization of threads in a thread group, SLM memory, data port
> > > messages and gateway barrier messages.) Beware that the memory fence
> > > is still
> > required
> > > for non-SLM memory ordering and observability.
> > >
> > > > -----Original Message-----
> > > > From: Zou, Nanhai [mailto:nanhai.zou at intel.com]
> > > > Sent: Friday, June 14, 2013 1:04 PM
> > > > To: Zhigang Gong; Yang, Rong R
> > > > Cc: beignet at lists.freedesktop.org
> > > > Subject: RE: [Beignet] [PATCH] Remove global barrier assert.
> > > >
> > > > Per my understanding,
> > > > barrier has included memory fence semantic, no need to insert
> > > > additional
> > > mem
> > > > fence message.
> > > > I think Rong has done some experiment on that.
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Zhigang Gong [mailto:zhigang.gong at linux.intel.com]
> > > > Sent: Friday, June 14, 2013 1:00 PM
> > > > To: Zou, Nanhai; Yang, Rong R
> > > > Cc: beignet at lists.freedesktop.org
> > > > Subject: RE: [Beignet] [PATCH] Remove global barrier assert.
> > > >
> > > > Nanhai,
> > > >
> > > > I'm not worry about the barrier itself. I'm worry about the memory
> > > > fence
> > > part.
> > > > Please refer the following statement from Open CL spec.
> > > >
> > > > The barrier CLK_LOCAL_MEM_FENCE -
> > > > function will either flush any variables stored in local memory or
> > > > queue a memory fence to ensure correct ordering of memory
> > > > operations to local memory.
> > > >
> > > > CLK_GLOBAL_MEM_FENCE - The barrier function will queue a memory
> > > > fence to ensure correct ordering of memory operations to global
> > > > memory. This can be useful when work-items, for
> > > example,
> > > > write to buffer or image objects and then want to read the updated
data.
> > > >
> > > > And current implementation does nothing for the global memory
> > operations.
> > > >
> > > > And IMHO, if we can't make sure the implementation comply with
> > > > OpenCL spec for the specified memory fence function, we can't just
> > > > push the
> > code
> > > now.
> > > > It may cause weird problem in the future which is extremely hard
> > > > to
> > debug.
> > > > Any comments?
> > > >
> > > > > -----Original Message-----
> > > > > From: Zou, Nanhai [mailto:nanhai.zou at intel.com]
> > > > > Sent: Friday, June 14, 2013 12:13 PM
> > > > > To: Zhigang Gong; Yang, Rong R
> > > > > Cc: beignet at lists.freedesktop.org
> > > > > Subject: RE: [Beignet] [PATCH] Remove global barrier assert.
> > > > >
> > > > > From the spec, it seems that the logic is correct, the barrier
> > > > > is not
> > > > limited to
> > > > > SHM.
> > > > > Let's push the code till we hit bug.
> > > > >
> > > > > Thanks
> > > > > Zou Nanhai
> > > > >
> > > > > -----Original Message-----
> > > > > From: beignet-bounces+nanhai.zou=intel.com at lists.freedesktop.org
> > > > > [mailto:beignet-bounces+nanhai.zou=intel.com at lists.freedesktop.o
> > > > > rg
> > > > > ]
> > > > > On Behalf Of Zhigang Gong
> > > > > Sent: Friday, June 14, 2013 11:50 AM
> > > > > To: Yang, Rong R
> > > > > Cc: beignet at lists.freedesktop.org
> > > > > Subject: Re: [Beignet] [PATCH] Remove global barrier assert.
> > > > >
> > > > > With this patch, you treat a global memory fence the same as a
> > > > > local
> > > > memory
> > > > > fence.
> > > > > I haven't check the bspec details, but I am doubt it. Need some
> > > > > time to investigate the spec.
> > > > >
> > > > > On Fri, Jun 14, 2013 at 11:01:09AM +0800, Yang Rong wrote:
> > > > > > Per openCL spec, Global memory barrier is consistent across
> > > > > > work-items in a single work group, which is match the bspec's
> > > > > > barrier. So remove global barrier assert.
> > > > > >
> > > > > > Signed-off-by: Yang Rong <rong.r.yang at intel.com>
> > > > > > ---
> > > > > >  backend/src/backend/gen_insn_selection.cpp |    4 ----
> > > > > >  1 file changed, 4 deletions(-)
> > > > > >
> > > > > > diff --git a/backend/src/backend/gen_insn_selection.cpp
> > > > > > b/backend/src/backend/gen_insn_selection.cpp
> > > > > > index 88f9e94..3a139ea 100644
> > > > > > --- a/backend/src/backend/gen_insn_selection.cpp
> > > > > > +++ b/backend/src/backend/gen_insn_selection.cpp
> > > > > > @@ -1607,10 +1607,6 @@ namespace gbe
> > > > > >      INLINE bool emitOne(Selection::Opaque &sel, const
> > > > ir::SyncInstruction
> > > > > &insn) const
> > > > > >      {
> > > > > >        using namespace ir;
> > > > > > -      const uint32_t params = insn.getParameters();
> > > > > > -      GBE_ASSERTM(params == syncLocalBarrier,
> > > > > > -                  "Only barrier(CLK_LOCAL_MEM_FENCE) is
> > > > supported
> > > > > right now "
> > > > > > -                  "for the synchronization primitives");
> > > > > >        const ir::Register reg = sel.reg(FAMILY_DWORD);
> > > > > >
> > > > > >        sel.push();
> > > > > > --
> > > > > > 1.7.10.4
> > > > > >
> > > > > > _______________________________________________
> > > > > > Beignet mailing list
> > > > > > Beignet at lists.freedesktop.org
> > > > > > http://lists.freedesktop.org/mailman/listinfo/beignet
> > > > > _______________________________________________
> > > > > Beignet mailing list
> > > > > Beignet at lists.freedesktop.org
> > > > > http://lists.freedesktop.org/mailman/listinfo/beignet



More information about the Beignet mailing list