[Beignet] [PATCH] Remove global barrier assert.

Zhigang Gong zhigang.gong at linux.intel.com
Fri Jun 14 00:16:52 PDT 2013


Use SIMD16 can also trigger this bug with the following version of unit test
case, no need to modify the cl kernel.

#include "utest_helper.hpp"

static void compiler_global_memory_barrier(void)
{
  const size_t n = 16*1024;

  // Setup kernel and buffers
  OCL_CREATE_KERNEL("compiler_global_memory_barrier");
  OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint32_t), NULL);
  OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint32_t), NULL);
  OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]);
  OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]);

  // Run the kernel
  globals[0] = n/2;
  locals[0] = 32;
  OCL_NDRANGE(1);
  OCL_MAP_BUFFER(0);
  OCL_MAP_BUFFER(1);

  // Check results
  uint32_t *dst = (uint32_t*)buf_data[0];
  for (uint32_t i = 0; i < n; i+=locals[0])
    for (uint32_t j = 0; j < locals[0]; ++j)
      {
        printf("i %d j %d dst %d \n", i, j, dst[i+j]);
        OCL_ASSERT(dst[i+j] == locals[0] - 1 -j);
      }
}

MAKE_UTEST_FROM_FUNCTION(compiler_global_memory_barrier);

> -----Original Message-----
> From: Zhigang Gong [mailto:zhigang.gong at linux.intel.com]
> Sent: Friday, June 14, 2013 3:03 PM
> To: 'Zou, Nanhai'; 'Yang, Rong R'
> Cc: 'beignet at lists.freedesktop.org'
> Subject: RE: [Beignet] [PATCH] Remove global barrier assert.
> 
> Forgot to mention, please set OCL_SIMD_WIDTH to 8.
> 
> > -----Original Message-----
> > From: Zhigang Gong [mailto:zhigang.gong at linux.intel.com]
> > Sent: Friday, June 14, 2013 3:02 PM
> > To: 'Zou, Nanhai'; 'Yang, Rong R'
> > Cc: 'beignet at lists.freedesktop.org'
> > Subject: RE: [Beignet] [PATCH] Remove global barrier assert.
> >
> > Nanhai,
> >
> > Thanks for the information. I agree that in the same kernel thread, we
> > don't need memory fence at all. But for a barrier which is to
> > synchronize difference threads, I still think the memory fence is
required at
> that case.
> >
> > I modified Dag's test case to make it a little bit complicate. And it
> > seems can hit the memory fence bug. You can find it will hit the
> > Assert at random position which is very likely a memory fence related
bug.
> >
> > The utest case:
> >
> > #include "utest_helper.hpp"
> >
> > static void compiler_global_memory_barrier(void)
> > {
> >   const size_t n = 16*1024;
> >
> >   // Setup kernel and buffers
> >   OCL_CREATE_KERNEL("compiler_global_memory_barrier");
> >   OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint32_t), NULL);
> >   OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint32_t), NULL);
> >   OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]);
> >   OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]);
> >
> >   // Run the kernel
> >   globals[0] = n/2;
> >   locals[0] = 16;
> >   OCL_NDRANGE(1);
> >   OCL_MAP_BUFFER(0);
> >   OCL_MAP_BUFFER(1);
> >
> >   // Check results
> >   uint32_t *dst = (uint32_t*)buf_data[0];
> >   volatile uint32_t *src = (uint32_t*)buf_data[1];
> >   printf("src 0 %d \n", src[0]);
> >   for (uint32_t i = 0; i < n; i+=16)
> >     for (uint32_t j = 0; j < 16; ++j)
> >       {
> >         printf("i %d j %d dst %d \n", i, j, dst[i+j]);
> >         OCL_ASSERT(dst[i+j] == 15-j);
> >       }
> > }
> >
> > MAKE_UTEST_FROM_FUNCTION(compiler_global_memory_barrier);
> >
> > The kernel:
> >
> > __kernel void compiler_global_memory_barrier(__global int *dst,
> > __global int
> > *src) {
> >   src[get_local_size(0) * (2 * get_group_id(0)) + get_local_id(0)] =
> > get_local_id(0);
> >   src[get_local_size(0) * (2 * get_group_id(0) + 1) + get_local_id(0)]
> > = get_local_id(0);
> >   barrier(CLK_GLOBAL_MEM_FENCE);
> >   dst[get_local_size(0) * (2 * get_group_id(0)) + get_local_id(0)] =
> > src[get_local_size(0) * 2 * get_group_id(0) + get_local_size(0) -
> > (get_local_id(0)
> > + 1)];
> >   dst[get_local_size(0) * (2 * get_group_id(0) + 1) + get_local_id(0)]
> > =
> > src[get_local_size(0) * (2 * get_group_id(0) + 1) + get_local_size(0)
> > -
> > (get_local_id(0) + 1)]; }
> >
> > > -----Original Message-----
> > > From: Zou, Nanhai [mailto:nanhai.zou at intel.com]
> > > Sent: Friday, June 14, 2013 2:44 PM
> > > To: Zhigang Gong; Yang, Rong R
> > > Cc: beignet at lists.freedesktop.org
> > > Subject: RE: [Beignet] [PATCH] Remove global barrier assert.
> > >
> > > Please check
> > > Shared Functions > Data Port [Pre-DevSKL] > Read/Write Ordering
> > >
> > > Reads and writes issued from the same thread are guaranteed to be
> > > processed in the same order as issued. Software mechanisms must
> > > still ensure any needed ordering of accesses issued from different
threads.
> > >
> > >
> > > -----Original Message-----
> > > From: Zhigang Gong [mailto:zhigang.gong at linux.intel.com]
> > > Sent: Friday, June 14, 2013 2:13 PM
> > > To: Yang, Rong R; Zou, Nanhai
> > > Cc: beignet at lists.freedesktop.org
> > > Subject: RE: [Beignet] [PATCH] Remove global barrier assert.
> > >
> > > Rong,
> > >
> > > Please see my comments:
> > >
> > > > -----Original Message-----
> > > > From: Yang, Rong R [mailto:rong.r.yang at intel.com]
> > > > Sent: Friday, June 14, 2013 1:38 PM
> > > > To: Zou, Nanhai; Zhigang Gong
> > > > Cc: beignet at lists.freedesktop.org
> > > > Subject: RE: [Beignet] [PATCH] Remove global barrier assert.
> > > >
> > > > Be sure that mem fence mentioned in bspec is not the same mem
> > > > fence mentioned in openCL spec.
> > > >
> > > > Mem fence in bspec:
> > > > A memory fence message issued by a thread causes further messages
> > > > issued by the thread to be blocked until all previous messages
> > > > issued by the
> > > thread to
> > > > that data port (data cache or render cache) have been globally
> > > > observed
> > > from
> > > > the point of view of other threads in the system.
> > > [Gong, Zhigang] As on IVB, each memory access is done by sending
> > > message to data port.
> > > So the above statement is exactly the low level mechanism to support
> > > the memory fence defined in the OpenCL spec.
> > > >
> > > > Mem fence in openCL spec:
> > > > Orders loads and stores of a work-item executing a kernel.
> > > [Gong, Zhigang] Right, the load and store on the same kernel are two
> > > send message instruction.
> > > Let's assume the first send message is to write to src, and the
> > > second message is to read from.
> > > With barrier, we can only make sure the second send instruction will
> > > be executed after the first one, But by default, the send message is
> > > executed asynchronous, and the barrier can't guarantee the second
> > > first send message already complete before the second send message
> > > start to execute. That's the reason why GEN provide the memory fence.
> > >
> > > Nanhai and Rong, I really can't find the statement in BPSEC to
> > > support your point which means the IVB hardware will guarantee the
> > > in order read/write (send message). Could you give me a pointer to It?
> > >
> > > Thanks.
> > >
> > > >
> > > > Mem fence in openCL is only for a work-item semantic, and is
> > > > guaranteed by hardware.
> > > >
> > > > -----Original Message-----
> > > > From: Zou, Nanhai
> > > > Sent: Friday, June 14, 2013 1:20 PM
> > > > To: Zhigang Gong; Yang, Rong R
> > > > Cc: beignet at lists.freedesktop.org
> > > > Subject: RE: [Beignet] [PATCH] Remove global barrier assert.
> > > >
> > > > Rong and I have discuss the mem fence topic result the mem fence
> > > > and
> > > global
> > > > barrier patch.
> > > >
> > > > R/W order is ensured in one thread.
> > > > So no need to put this in mem fence built-in funtion,
> > > >
> > > > Mem Fence semantic with in a work group is included in barrier
> > > > according
> > > to
> > > > our test, so no need send an additional message to slow the
performance.
> > > >
> > > >
> > > > Thanks
> > > > Zou Nanhai
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Zhigang Gong [mailto:zhigang.gong at linux.intel.com]
> > > > Sent: Friday, June 14, 2013 1:13 PM
> > > > To: Zou, Nanhai; Yang, Rong R
> > > > Cc: beignet at lists.freedesktop.org
> > > > Subject: RE: [Beignet] [PATCH] Remove global barrier assert.
> > > >
> > > > I found something in bspec as below. You can check the last
sentence.
> > > > As
> > > Rong
> > > > is doing the related work items, I would ask rong to follow the
> > > > memory
> > > fence
> > > > for global memory. Rong, is it ok for you?
> > > >
> > > > Programming Note:
> > > > [DevIVB, DevVLV, DevVLV-T, DevHSW] The memory fence operation is
> > > > not required to guarantee SLM memory access ordering between
> > > > multiple threads in a thread group for the sequence of a write
> > > > message, a barrier message,
> > > and
> > > > then a read message. (This optimization is due to implementation
> > > > details
> > > of the
> > > > organization of threads in a thread group, SLM memory, data port
> > > > messages and gateway barrier messages.) Beware that the memory
> > > > fence is still
> > > required
> > > > for non-SLM memory ordering and observability.
> > > >
> > > > > -----Original Message-----
> > > > > From: Zou, Nanhai [mailto:nanhai.zou at intel.com]
> > > > > Sent: Friday, June 14, 2013 1:04 PM
> > > > > To: Zhigang Gong; Yang, Rong R
> > > > > Cc: beignet at lists.freedesktop.org
> > > > > Subject: RE: [Beignet] [PATCH] Remove global barrier assert.
> > > > >
> > > > > Per my understanding,
> > > > > barrier has included memory fence semantic, no need to insert
> > > > > additional
> > > > mem
> > > > > fence message.
> > > > > I think Rong has done some experiment on that.
> > > > >
> > > > >
> > > > > -----Original Message-----
> > > > > From: Zhigang Gong [mailto:zhigang.gong at linux.intel.com]
> > > > > Sent: Friday, June 14, 2013 1:00 PM
> > > > > To: Zou, Nanhai; Yang, Rong R
> > > > > Cc: beignet at lists.freedesktop.org
> > > > > Subject: RE: [Beignet] [PATCH] Remove global barrier assert.
> > > > >
> > > > > Nanhai,
> > > > >
> > > > > I'm not worry about the barrier itself. I'm worry about the
> > > > > memory fence
> > > > part.
> > > > > Please refer the following statement from Open CL spec.
> > > > >
> > > > > The barrier CLK_LOCAL_MEM_FENCE - function will either flush any
> > > > > variables stored in local memory or queue a memory fence to
> > > > > ensure correct ordering of memory operations to local memory.
> > > > >
> > > > > CLK_GLOBAL_MEM_FENCE - The barrier function will queue a memory
> > > > > fence to ensure correct ordering of memory operations to global
> > > > > memory. This can be useful when work-items, for
> > > > example,
> > > > > write to buffer or image objects and then want to read the updated
> data.
> > > > >
> > > > > And current implementation does nothing for the global memory
> > > operations.
> > > > >
> > > > > And IMHO, if we can't make sure the implementation comply with
> > > > > OpenCL spec for the specified memory fence function, we can't
> > > > > just push the
> > > code
> > > > now.
> > > > > It may cause weird problem in the future which is extremely hard
> > > > > to
> > > debug.
> > > > > Any comments?
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Zou, Nanhai [mailto:nanhai.zou at intel.com]
> > > > > > Sent: Friday, June 14, 2013 12:13 PM
> > > > > > To: Zhigang Gong; Yang, Rong R
> > > > > > Cc: beignet at lists.freedesktop.org
> > > > > > Subject: RE: [Beignet] [PATCH] Remove global barrier assert.
> > > > > >
> > > > > > From the spec, it seems that the logic is correct, the barrier
> > > > > > is not
> > > > > limited to
> > > > > > SHM.
> > > > > > Let's push the code till we hit bug.
> > > > > >
> > > > > > Thanks
> > > > > > Zou Nanhai
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From:
> > > > > > beignet-bounces+nanhai.zou=intel.com at lists.freedesktop.org
> > > > > > [mailto:beignet-bounces+nanhai.zou=intel.com at lists.freedesktop
> > > > > > .o
> > > > > > rg
> > > > > > ]
> > > > > > On Behalf Of Zhigang Gong
> > > > > > Sent: Friday, June 14, 2013 11:50 AM
> > > > > > To: Yang, Rong R
> > > > > > Cc: beignet at lists.freedesktop.org
> > > > > > Subject: Re: [Beignet] [PATCH] Remove global barrier assert.
> > > > > >
> > > > > > With this patch, you treat a global memory fence the same as a
> > > > > > local
> > > > > memory
> > > > > > fence.
> > > > > > I haven't check the bspec details, but I am doubt it. Need
> > > > > > some time to investigate the spec.
> > > > > >
> > > > > > On Fri, Jun 14, 2013 at 11:01:09AM +0800, Yang Rong wrote:
> > > > > > > Per openCL spec, Global memory barrier is consistent across
> > > > > > > work-items in a single work group, which is match the
> > > > > > > bspec's barrier. So remove global barrier assert.
> > > > > > >
> > > > > > > Signed-off-by: Yang Rong <rong.r.yang at intel.com>
> > > > > > > ---
> > > > > > >  backend/src/backend/gen_insn_selection.cpp |    4 ----
> > > > > > >  1 file changed, 4 deletions(-)
> > > > > > >
> > > > > > > diff --git a/backend/src/backend/gen_insn_selection.cpp
> > > > > > > b/backend/src/backend/gen_insn_selection.cpp
> > > > > > > index 88f9e94..3a139ea 100644
> > > > > > > --- a/backend/src/backend/gen_insn_selection.cpp
> > > > > > > +++ b/backend/src/backend/gen_insn_selection.cpp
> > > > > > > @@ -1607,10 +1607,6 @@ namespace gbe
> > > > > > >      INLINE bool emitOne(Selection::Opaque &sel, const
> > > > > ir::SyncInstruction
> > > > > > &insn) const
> > > > > > >      {
> > > > > > >        using namespace ir;
> > > > > > > -      const uint32_t params = insn.getParameters();
> > > > > > > -      GBE_ASSERTM(params == syncLocalBarrier,
> > > > > > > -                  "Only barrier(CLK_LOCAL_MEM_FENCE) is
> > > > > supported
> > > > > > right now "
> > > > > > > -                  "for the synchronization primitives");
> > > > > > >        const ir::Register reg = sel.reg(FAMILY_DWORD);
> > > > > > >
> > > > > > >        sel.push();
> > > > > > > --
> > > > > > > 1.7.10.4
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > Beignet mailing list
> > > > > > > Beignet at lists.freedesktop.org
> > > > > > > http://lists.freedesktop.org/mailman/listinfo/beignet
> > > > > > _______________________________________________
> > > > > > Beignet mailing list
> > > > > > Beignet at lists.freedesktop.org
> > > > > > http://lists.freedesktop.org/mailman/listinfo/beignet



More information about the Beignet mailing list