[Beignet] [PATCH] Remove global barrier assert.

Zhigang Gong zhigang.gong at linux.intel.com
Fri Jun 14 00:01:33 PDT 2013


Nanhai,

Thanks for the information. I agree that in the same kernel thread, we don't
need memory fence at all. But for a barrier which is
to synchronize difference threads, I still think the memory fence is
required at that case.

I modified Dag's test case to make it a little bit complicate. And it seems
can hit the memory fence bug. You can find it will hit the
Assert at random position which is very likely a memory fence related bug.

The utest case:

#include "utest_helper.hpp"

static void compiler_global_memory_barrier(void)
{
  const size_t n = 16*1024;

  // Setup kernel and buffers
  OCL_CREATE_KERNEL("compiler_global_memory_barrier");
  OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint32_t), NULL);
  OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint32_t), NULL);
  OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]);
  OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]);

  // Run the kernel
  globals[0] = n/2;
  locals[0] = 16;
  OCL_NDRANGE(1);
  OCL_MAP_BUFFER(0);
  OCL_MAP_BUFFER(1);

  // Check results
  uint32_t *dst = (uint32_t*)buf_data[0];
  volatile uint32_t *src = (uint32_t*)buf_data[1];
  printf("src 0 %d \n", src[0]);
  for (uint32_t i = 0; i < n; i+=16)
    for (uint32_t j = 0; j < 16; ++j)
      {
        printf("i %d j %d dst %d \n", i, j, dst[i+j]);
        OCL_ASSERT(dst[i+j] == 15-j);
      }
}

MAKE_UTEST_FROM_FUNCTION(compiler_global_memory_barrier);

The kernel:

__kernel void compiler_global_memory_barrier(__global int *dst, __global int
*src) {
  src[get_local_size(0) * (2 * get_group_id(0)) + get_local_id(0)] =
get_local_id(0);
  src[get_local_size(0) * (2 * get_group_id(0) + 1) + get_local_id(0)] =
get_local_id(0);
  barrier(CLK_GLOBAL_MEM_FENCE);
  dst[get_local_size(0) * (2 * get_group_id(0)) + get_local_id(0)] =
src[get_local_size(0) * 2 * get_group_id(0) + get_local_size(0) -
(get_local_id(0) + 1)];
  dst[get_local_size(0) * (2 * get_group_id(0) + 1) + get_local_id(0)] =
src[get_local_size(0) * (2 * get_group_id(0) + 1) + get_local_size(0) -
(get_local_id(0) + 1)];
}

> -----Original Message-----
> From: Zou, Nanhai [mailto:nanhai.zou at intel.com]
> Sent: Friday, June 14, 2013 2:44 PM
> To: Zhigang Gong; Yang, Rong R
> Cc: beignet at lists.freedesktop.org
> Subject: RE: [Beignet] [PATCH] Remove global barrier assert.
> 
> Please check
> Shared Functions > Data Port [Pre-DevSKL] > Read/Write Ordering
> 
> Reads and writes issued from the same thread are guaranteed to be
processed
> in the same order as issued. Software mechanisms must still ensure any
> needed ordering of accesses issued from different threads.
> 
> 
> -----Original Message-----
> From: Zhigang Gong [mailto:zhigang.gong at linux.intel.com]
> Sent: Friday, June 14, 2013 2:13 PM
> To: Yang, Rong R; Zou, Nanhai
> Cc: beignet at lists.freedesktop.org
> Subject: RE: [Beignet] [PATCH] Remove global barrier assert.
> 
> Rong,
> 
> Please see my comments:
> 
> > -----Original Message-----
> > From: Yang, Rong R [mailto:rong.r.yang at intel.com]
> > Sent: Friday, June 14, 2013 1:38 PM
> > To: Zou, Nanhai; Zhigang Gong
> > Cc: beignet at lists.freedesktop.org
> > Subject: RE: [Beignet] [PATCH] Remove global barrier assert.
> >
> > Be sure that mem fence mentioned in bspec is not the same mem fence
> > mentioned in openCL spec.
> >
> > Mem fence in bspec:
> > A memory fence message issued by a thread causes further messages
> > issued by the thread to be blocked until all previous messages issued
> > by the
> thread to
> > that data port (data cache or render cache) have been globally
> > observed
> from
> > the point of view of other threads in the system.
> [Gong, Zhigang] As on IVB, each memory access is done by sending message
to
> data port.
> So the above statement is exactly the low level mechanism to support the
> memory fence defined in the OpenCL spec.
> >
> > Mem fence in openCL spec:
> > Orders loads and stores of a work-item executing a kernel.
> [Gong, Zhigang] Right, the load and store on the same kernel are two send
> message instruction.
> Let's assume the first send message is to write to src, and the second
message
> is to read from.
> With barrier, we can only make sure the second send instruction will be
> executed after the first one, But by default, the send message is executed
> asynchronous, and the barrier can't guarantee the second first send
message
> already complete before the second send message start to execute. That's
the
> reason why GEN provide the memory fence.
> 
> Nanhai and Rong, I really can't find the statement in BPSEC to support
your
> point which means the IVB hardware will guarantee the in order read/write
> (send message). Could you give me a pointer to It?
> 
> Thanks.
> 
> >
> > Mem fence in openCL is only for a work-item semantic, and is
> > guaranteed by hardware.
> >
> > -----Original Message-----
> > From: Zou, Nanhai
> > Sent: Friday, June 14, 2013 1:20 PM
> > To: Zhigang Gong; Yang, Rong R
> > Cc: beignet at lists.freedesktop.org
> > Subject: RE: [Beignet] [PATCH] Remove global barrier assert.
> >
> > Rong and I have discuss the mem fence topic result the mem fence and
> global
> > barrier patch.
> >
> > R/W order is ensured in one thread.
> > So no need to put this in mem fence built-in funtion,
> >
> > Mem Fence semantic with in a work group is included in barrier
> > according
> to
> > our test, so no need send an additional message to slow the performance.
> >
> >
> > Thanks
> > Zou Nanhai
> >
> >
> > -----Original Message-----
> > From: Zhigang Gong [mailto:zhigang.gong at linux.intel.com]
> > Sent: Friday, June 14, 2013 1:13 PM
> > To: Zou, Nanhai; Yang, Rong R
> > Cc: beignet at lists.freedesktop.org
> > Subject: RE: [Beignet] [PATCH] Remove global barrier assert.
> >
> > I found something in bspec as below. You can check the last sentence.
> > As
> Rong
> > is doing the related work items, I would ask rong to follow the memory
> fence
> > for global memory. Rong, is it ok for you?
> >
> > Programming Note:
> > [DevIVB, DevVLV, DevVLV-T, DevHSW] The memory fence operation is not
> > required to guarantee SLM memory access ordering between multiple
> > threads in a thread group for the sequence of a write message, a
> > barrier message,
> and
> > then a read message. (This optimization is due to implementation
> > details
> of the
> > organization of threads in a thread group, SLM memory, data port
> > messages and gateway barrier messages.) Beware that the memory fence
> > is still
> required
> > for non-SLM memory ordering and observability.
> >
> > > -----Original Message-----
> > > From: Zou, Nanhai [mailto:nanhai.zou at intel.com]
> > > Sent: Friday, June 14, 2013 1:04 PM
> > > To: Zhigang Gong; Yang, Rong R
> > > Cc: beignet at lists.freedesktop.org
> > > Subject: RE: [Beignet] [PATCH] Remove global barrier assert.
> > >
> > > Per my understanding,
> > > barrier has included memory fence semantic, no need to insert
> > > additional
> > mem
> > > fence message.
> > > I think Rong has done some experiment on that.
> > >
> > >
> > > -----Original Message-----
> > > From: Zhigang Gong [mailto:zhigang.gong at linux.intel.com]
> > > Sent: Friday, June 14, 2013 1:00 PM
> > > To: Zou, Nanhai; Yang, Rong R
> > > Cc: beignet at lists.freedesktop.org
> > > Subject: RE: [Beignet] [PATCH] Remove global barrier assert.
> > >
> > > Nanhai,
> > >
> > > I'm not worry about the barrier itself. I'm worry about the memory
> > > fence
> > part.
> > > Please refer the following statement from Open CL spec.
> > >
> > > The barrier CLK_LOCAL_MEM_FENCE -
> > > function will either flush any variables stored in local memory or
> > > queue a memory fence to ensure correct ordering of memory operations
> > > to local memory.
> > >
> > > CLK_GLOBAL_MEM_FENCE - The barrier
> > > function will queue a memory fence to ensure correct ordering of
> > > memory operations to global memory. This can be useful when
> > > work-items, for
> > example,
> > > write to buffer or image objects and then want to read the updated
data.
> > >
> > > And current implementation does nothing for the global memory
> operations.
> > >
> > > And IMHO, if we can't make sure the implementation comply with
> > > OpenCL spec for the specified memory fence function, we can't just
> > > push the
> code
> > now.
> > > It may cause weird problem in the future which is extremely hard to
> debug.
> > > Any comments?
> > >
> > > > -----Original Message-----
> > > > From: Zou, Nanhai [mailto:nanhai.zou at intel.com]
> > > > Sent: Friday, June 14, 2013 12:13 PM
> > > > To: Zhigang Gong; Yang, Rong R
> > > > Cc: beignet at lists.freedesktop.org
> > > > Subject: RE: [Beignet] [PATCH] Remove global barrier assert.
> > > >
> > > > From the spec, it seems that the logic is correct, the barrier is
> > > > not
> > > limited to
> > > > SHM.
> > > > Let's push the code till we hit bug.
> > > >
> > > > Thanks
> > > > Zou Nanhai
> > > >
> > > > -----Original Message-----
> > > > From: beignet-bounces+nanhai.zou=intel.com at lists.freedesktop.org
> > > > [mailto:beignet-bounces+nanhai.zou=intel.com at lists.freedesktop.org
> > > > ]
> > > > On Behalf Of Zhigang Gong
> > > > Sent: Friday, June 14, 2013 11:50 AM
> > > > To: Yang, Rong R
> > > > Cc: beignet at lists.freedesktop.org
> > > > Subject: Re: [Beignet] [PATCH] Remove global barrier assert.
> > > >
> > > > With this patch, you treat a global memory fence the same as a
> > > > local
> > > memory
> > > > fence.
> > > > I haven't check the bspec details, but I am doubt it. Need some
> > > > time to investigate the spec.
> > > >
> > > > On Fri, Jun 14, 2013 at 11:01:09AM +0800, Yang Rong wrote:
> > > > > Per openCL spec, Global memory barrier is consistent across
> > > > > work-items in a single work group, which is match the bspec's
> > > > > barrier. So remove global barrier assert.
> > > > >
> > > > > Signed-off-by: Yang Rong <rong.r.yang at intel.com>
> > > > > ---
> > > > >  backend/src/backend/gen_insn_selection.cpp |    4 ----
> > > > >  1 file changed, 4 deletions(-)
> > > > >
> > > > > diff --git a/backend/src/backend/gen_insn_selection.cpp
> > > > > b/backend/src/backend/gen_insn_selection.cpp
> > > > > index 88f9e94..3a139ea 100644
> > > > > --- a/backend/src/backend/gen_insn_selection.cpp
> > > > > +++ b/backend/src/backend/gen_insn_selection.cpp
> > > > > @@ -1607,10 +1607,6 @@ namespace gbe
> > > > >      INLINE bool emitOne(Selection::Opaque &sel, const
> > > ir::SyncInstruction
> > > > &insn) const
> > > > >      {
> > > > >        using namespace ir;
> > > > > -      const uint32_t params = insn.getParameters();
> > > > > -      GBE_ASSERTM(params == syncLocalBarrier,
> > > > > -                  "Only barrier(CLK_LOCAL_MEM_FENCE) is
> > > supported
> > > > right now "
> > > > > -                  "for the synchronization primitives");
> > > > >        const ir::Register reg = sel.reg(FAMILY_DWORD);
> > > > >
> > > > >        sel.push();
> > > > > --
> > > > > 1.7.10.4
> > > > >
> > > > > _______________________________________________
> > > > > Beignet mailing list
> > > > > Beignet at lists.freedesktop.org
> > > > > http://lists.freedesktop.org/mailman/listinfo/beignet
> > > > _______________________________________________
> > > > Beignet mailing list
> > > > Beignet at lists.freedesktop.org
> > > > http://lists.freedesktop.org/mailman/listinfo/beignet



More information about the Beignet mailing list