[Beignet] [PATCH] Refine the cl thread implement for queue.

Zhigang Gong zhigang.gong at linux.intel.com
Sun May 25 18:37:08 PDT 2014


I just checked clFinish and clFlush; they only need to access the batch
buffer. So the root cause is that we always allocate a new batch buffer for
every kernel submitted to a queue, even when many kernels are enqueued on the
same queue.

If we can maintain a single batch buffer for each queue, then this issue will
be solved cleanly and gracefully.
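
To make the idea concrete, here is a rough sketch in plain C. The type and
helper names are made up for illustration; this is not the current Beignet
API:

  #include <pthread.h>
  #include <stdlib.h>

  /* Sketch only: the queue owns ONE lazily-allocated batch buffer and
     reuses it for every kernel submission. */
  typedef struct {
    pthread_mutex_t lock;
    void *batch_buf;   /* the single buffer shared by all submissions */
    size_t batch_sz;
  } demo_queue;

  static void *demo_queue_get_batch(demo_queue *q, size_t sz)
  {
    pthread_mutex_lock(&q->lock);
    if (q->batch_buf == NULL) {   /* allocate once per queue */
      q->batch_buf = malloc(sz);
      q->batch_sz = sz;
    }
    pthread_mutex_unlock(&q->lock);
    /* Each enqueue resets and reuses this same buffer, so clFlush and
       clFinish always have one well-known buffer to sync on. */
    return q->batch_buf;
  }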

IMO, this is not an OpenCL spec issue. It is an implementation issue that we
should solve in the future. What's your opinion?

BTW, I'm OK with the current implementation. But I found that you may have
missed some minor comments embedded in my first email. Could you recheck
them, address those comments, and send a new version of the patch?
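
For reference, the realloc()/calloc() pattern I suggested in those inline
comments would look roughly like this (a sketch only; the function names are
made up):

  #include <stdlib.h>
  #include <string.h>

  /* Grow the slot map with realloc(): the old contents are preserved
     automatically, so no copy loop is needed; only the new half must
     be zeroed. */
  static int *grow_slot_map(int *map, int *array_num)
  {
    int old_num = *array_num;
    int *new_map = realloc(map, sizeof(int) * old_num * 2);
    if (new_map == NULL)
      return NULL;              /* the old map is still valid on failure */
    memset(new_map + old_num, 0, sizeof(int) * old_num);
    *array_num = old_num * 2;
    return new_map;
  }

  /* And calloc() is malloc() plus memset-to-zero in one call: */
  static int *alloc_slot_map(int num)
  {
    return calloc(num, sizeof(int));
  }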

> -----Original Message-----
> From: Zhigang Gong [mailto:zhigang.gong at linux.intel.com]
> Sent: Monday, May 26, 2014 9:31 AM
> To: 'He Junyan'
> Cc: 'Junyan He'; 'beignet at lists.freedesktop.org'
> Subject: RE: [Beignet] [PATCH] Refine the cl thread implement for queue.
> 
> Ok. The key issue is that the private gpgpu data structure is still needed
> after each kernel execution.
> And the gpgpu data is different for each kernel execution, right?
> Could you list all of the scenarios where we need to use the gpgpu data
> after the kernel is submitted?
> I can think of the following two:
> 1. clFinish
> 2. clFlush
> 
> Are there any other cases?
> 
> > -----Original Message-----
> > From: Beignet [mailto:beignet-bounces at lists.freedesktop.org] On Behalf
> > Of He Junyan
> > Sent: Friday, May 23, 2014 11:36 PM
> > To: Zhigang Gong
> > Cc: Junyan He; beignet at lists.freedesktop.org
> > Subject: Re: [Beignet] [PATCH] Refine the cl thread implement for queue.
> >
> > I think it is hard to avoid using thread local data.
> > Because when the queue is created, we do not know how many threads will
> > use it later.
> > The GPGPU resources will be held in the queue, but at least every
> > thread needs some local data to store the index used to find its GPGPU
> > data in the queue.
> > And each thread needs to destroy its GPGPU resources when it exits,
> > while the queue's lifetime may be much longer than the thread's.
> >
> > The OpenCL spec fails to define the relationship between the queue and
> > the threads. This causes the dilemma.
> >
> > Please give some advice if you have any.
> >
> > On Fri, 2014-05-23 at 16:33 +0800, Zhigang Gong wrote:
> > > Some minor comments as below.
> > >
> > > One thought about the thread local data we are using here.
> > > The original reason for using thread local data was to avoid locking
> > > as much as possible. But in the end we found that, to satisfy all the
> > > usage scenarios, we cannot avoid locking anyway, so we now introduce a
> > > lock after all. Is there still a good reason to keep using this
> > > thread local data?
> > >
> > > One possible question is as below:
> > >
> > > If one queue is used by another thread to enqueue a task, does it make
> > > sense to create new thread-local gpgpu data in that thread? Or can we
> > > simply lock and wait for the other thread to unlock the queue?
> > >
> > > On Tue, May 20, 2014 at 02:26:47PM +0800, junyan.he at inbox.com wrote:
> > > > From: Junyan He <junyan.he at linux.intel.com>
> > > >
> > > > Because the cl_command_queue can be used in several threads
> > > > simultaneously without adding a reference to it, we now handle it
> > > > like this:
> > > > Keep one threads_slot_array; every time a thread gets the gpgpu or
> > > > batch buffer, if it does not have a slot, assign one.
> > > > The resources are kept in the queue's private data and resized if
> > > > needed.
> > > > When a thread exits, its slot is set invalid.
> > > > When the queue is released, all the resources are released. If the
> > > > user still enqueues, flushes or finishes the queue after it has been
> > > > released, the behavior is undefined.
> > > > TODO: Need to shrink the slot map.
> > > >
> > > > Signed-off-by: Junyan He <junyan.he at linux.intel.com>
> > > > ---
> > > >  src/cl_command_queue.c      |   6 +-
> > > >  src/cl_command_queue_gen7.c |   2 +-
> > > >  src/cl_context.c            |   2 +-
> > > >  src/cl_device_id.c          |   2 +-
> > > >  src/cl_thread.c             | 261 +++++++++++++++++++++++++++++++++-----------
> > > >  src/cl_thread.h             |   6 +-
> > > >  6 files changed, 205 insertions(+), 74 deletions(-)
> > > >
> > > > diff --git a/src/cl_command_queue.c b/src/cl_command_queue.c
> > > > index 6a699c0..802d313 100644
> > > > --- a/src/cl_command_queue.c
> > > > +++ b/src/cl_command_queue.c
> > > > @@ -60,6 +60,7 @@ cl_command_queue_new(cl_context ctx)
> > > >    /* The queue also belongs to its context */
> > > >    cl_context_add_ref(ctx);
> > > >
> > > > +
> > > useless new line.
> > > >  exit:
> > > >    return queue;
> > > >  error:
> > > > @@ -74,6 +75,7 @@ cl_command_queue_delete(cl_command_queue queue)
> > > >    assert(queue);
> > > >    if (atomic_dec(&queue->ref_n) != 1) return;
> > > >
> > > > +  if (queue->ref_n < 0)
> > > weird if statement here? What's the purpose?
> > > >    /* Remove it from the list */
> > > >    assert(queue->ctx);
> > > >    pthread_mutex_lock(&queue->ctx->queue_lock);
> > > > @@ -89,7 +91,7 @@ cl_command_queue_delete(cl_command_queue queue)
> > > >      queue->fulsim_out = NULL;
> > > >    }
> > > >
> > > > -  cl_thread_data_destroy(queue->thread_data);
> > > > +  cl_thread_data_destroy(queue);
> > > >    queue->thread_data = NULL;
> > > >    cl_mem_delete(queue->perf);
> > > >    cl_context_delete(queue->ctx);
> > > > @@ -430,7 +432,7 @@ cl_command_queue_flush(cl_command_queue queue)
> > > >  LOCAL cl_int
> > > >  cl_command_queue_finish(cl_command_queue queue)
> > > >  {
> > > > -  cl_gpgpu_sync(cl_get_thread_batch_buf());
> > > > +  cl_gpgpu_sync(cl_get_thread_batch_buf(queue));
> > > >    return CL_SUCCESS;
> > > >  }
> > > >
> > > > diff --git a/src/cl_command_queue_gen7.c b/src/cl_command_queue_gen7.c
> > > > index 891d6f1..d875021 100644
> > > > --- a/src/cl_command_queue_gen7.c
> > > > +++ b/src/cl_command_queue_gen7.c
> > > > @@ -327,7 +327,7 @@ cl_command_queue_ND_range_gen7(cl_command_queue queue,
> > > >    /* Start a new batch buffer */
> > > >    batch_sz = cl_kernel_compute_batch_sz(ker);
> > > >    cl_gpgpu_batch_reset(gpgpu, batch_sz);
> > > > -  cl_set_thread_batch_buf(cl_gpgpu_ref_batch_buf(gpgpu));
> > > > +  cl_set_thread_batch_buf(queue, cl_gpgpu_ref_batch_buf(gpgpu));
> > > >    cl_gpgpu_batch_start(gpgpu);
> > > >
> > > >    /* Issue the GPGPU_WALKER command */
> > > > diff --git a/src/cl_context.c b/src/cl_context.c
> > > > index 8190e6a..452afb8 100644
> > > > --- a/src/cl_context.c
> > > > +++ b/src/cl_context.c
> > > > @@ -203,7 +203,7 @@ cl_context_delete(cl_context ctx)
> > > >    assert(ctx->buffers == NULL);
> > > >    assert(ctx->drv);
> > > >    cl_free(ctx->prop_user);
> > > > -  cl_set_thread_batch_buf(NULL);
> > > > +//  cl_set_thread_batch_buf(NULL);
> > > why not just remove the useless code?
> > >
> > > >    cl_driver_delete(ctx->drv);
> > > >    ctx->magic = CL_MAGIC_DEAD_HEADER; /* For safety */
> > > >    cl_free(ctx);
> > > > diff --git a/src/cl_device_id.c b/src/cl_device_id.c
> > > > index 268789a..15a776d 100644
> > > > --- a/src/cl_device_id.c
> > > > +++ b/src/cl_device_id.c
> > > > @@ -106,7 +106,7 @@ LOCAL cl_device_id
> > > >  cl_get_gt_device(void)
> > > >  {
> > > >    cl_device_id ret = NULL;
> > > > -  cl_set_thread_batch_buf(NULL);
> > > > + // cl_set_thread_batch_buf(NULL);
> > > ditto?
> > >
> > > >    const int device_id = cl_driver_get_device_id();
> > > >    cl_device_id device = NULL;
> > > >
> > > > diff --git a/src/cl_thread.c b/src/cl_thread.c
> > > > index cadc3cd..1b517ac 100644
> > > > --- a/src/cl_thread.c
> > > > +++ b/src/cl_thread.c
> > > > @@ -15,113 +15,242 @@
> > > >   * License along with this library. If not, see <http://www.gnu.org/licenses/>.
> > > >   *
> > > >   */
> > > > +#include <string.h>
> > > > +#include <stdio.h>
> > > >
> > > >  #include "cl_thread.h"
> > > >  #include "cl_alloc.h"
> > > >  #include "cl_utils.h"
> > > >
> > > > -static __thread void* thread_batch_buf = NULL;
> > > > +/* Because the cl_command_queue can be used in several threads simultaneously but
> > > > +   without add ref to it, we now handle it like this:
> > > > +   Keep one threads_slot_array, every time the thread get gpgpu or batch buffer, if it
> > > > +   does not have a slot, assign it.
> > > > +   The resources are keeped in queue private, and resize it if needed.
> > > > +   When the thread exit, the slot will be set invalid.
> > > > +   When queue released, all the resources will be released. If user still enqueue, flush
> > > > +   or finish the queue after it has been released, the behavior is undefined.
> > > > +   TODO: Need to shrink the slot map.
> > > > +   */
> > > >
> > > > -typedef struct _cl_thread_spec_data {
> > > > +static int thread_array_num = 1;
> > > > +static int *thread_slot_map = NULL;
> > > > +static int thread_magic_num = 1;
> > > > +static pthread_mutex_t thread_queue_map_lock = PTHREAD_MUTEX_INITIALIZER;
> > > > +static pthread_key_t destroy_key;
> > > > +
> > > > +static __thread int thread_id = -1;
> > > > +static __thread int thread_magic = -1;
> > > > +
> > > > +typedef struct _thread_spec_data {
> > > >    cl_gpgpu gpgpu ;
> > > >    int valid;
> > > > -}cl_thread_spec_data;
> > > > +  void* thread_batch_buf;
> > > > +  int thread_magic;
> > > > +} thread_spec_data;
> > > >
> > > > -void cl_set_thread_batch_buf(void* buf) {
> > > > -  if (thread_batch_buf) {
> > > > -    cl_gpgpu_unref_batch_buf(thread_batch_buf);
> > > > -  }
> > > > -  thread_batch_buf = buf;
> > > > -}
> > > > +typedef struct _queue_thread_private {
> > > > +  thread_spec_data**  threads_data;
> > > > +  int threads_data_num;
> > > > +  pthread_mutex_t thread_data_lock;
> > > > +} queue_thread_private;
> > > >
> > > > -void* cl_get_thread_batch_buf(void) {
> > > > -  return thread_batch_buf;
> > > > +static void thread_data_destructor(void *dummy) {
> > > > +  pthread_mutex_lock(&thread_queue_map_lock);
> > > > +  thread_slot_map[thread_id] = 0;
> > > > +  pthread_mutex_unlock(&thread_queue_map_lock);
> > > > +  free(dummy);
> > > >  }
> > > >
> > > > -cl_gpgpu cl_get_thread_gpgpu(cl_command_queue queue)
> > > > +static thread_spec_data *
> > > > +__create_thread_spec_data(cl_command_queue queue, int create)
> > > >  {
> > > > -  pthread_key_t* key = queue->thread_data;
> > > > -  cl_thread_spec_data* thread_spec_data = pthread_getspecific(*key);
> > > > -
> > > > -  if (!thread_spec_data) {
> > > > -    TRY_ALLOC_NO_ERR(thread_spec_data, CALLOC(struct _cl_thread_spec_data));
> > > > -    if (pthread_setspecific(*key, thread_spec_data)) {
> > > > -      cl_free(thread_spec_data);
> > > > -      return NULL;
> > > > +  queue_thread_private *thread_private = ((queue_thread_private *)(queue->thread_data));
> > > > +  thread_spec_data* spec = NULL;
> > > > +  int i = 0;
> > > > +  int *old;
> > > > +
> > > > +  if (thread_id == -1) {
> > > > +    void * dummy = malloc(sizeof(int));
> > > > +
> > > > +    pthread_mutex_lock(&thread_queue_map_lock);
> > > > +    for (i = 0; i < thread_array_num; i++) {
> > > > +      if (thread_slot_map[i] == 0) {
> > > > +        thread_id = i;
> > > > +        break;
> > > > +      }
> > > > +    }
> > > > +
> > > > +    if (i == thread_array_num) {
> > > > +      thread_array_num *= 2;
> > > > +      old = thread_slot_map;
> > > > +      thread_slot_map = malloc(sizeof(int) * thread_array_num);
> > > > +      memset(thread_slot_map, 0, (sizeof(int) * thread_array_num));
> > > > +      for (i = 0; i < thread_array_num/2; i++) {
> > > > +        thread_slot_map[i] = old[i];
> > > > +      }
> > > > +
> > > > +      thread_id = thread_array_num/2;
> > > > +      cl_free(old);
> > > why not just use a realloc() here?
> > > > +    }
> > > > +
> > > > +    thread_slot_map[thread_id] = 1;
> > > > +
> > > > +    thread_magic = thread_magic_num++;
> > > > +    pthread_mutex_unlock(&thread_queue_map_lock);
> > > > +
> > > > +    pthread_setspecific(destroy_key, dummy);
> > > > +  }
> > > > +
> > > > +  pthread_mutex_lock(&thread_private->thread_data_lock);
> > > > +  if (thread_array_num > thread_private->threads_data_num) {// just enlarge
> > > > +    thread_spec_data ** old = thread_private->threads_data;
> > > > +    int old_num = thread_private->threads_data_num;
> > > > +    thread_private->threads_data_num = thread_array_num;
> > > > +    thread_private->threads_data = malloc(thread_private->threads_data_num * sizeof(void *));
> > > > +    memset(thread_private->threads_data, 0, sizeof(void*) * thread_private->threads_data_num);
> > > > +    for (i = 0; i < old_num; i++) {
> > > > +      thread_private->threads_data[i] = old[i];
> > > >      }
> > > > +    free(old);
> > >       Another place which may better to use realloc;
> > > >    }
> > > >
> > > > -  if (!thread_spec_data->valid) {
> > > > -    TRY_ALLOC_NO_ERR(thread_spec_data->gpgpu, cl_gpgpu_new(queue->ctx->drv));
> > > > -    thread_spec_data->valid = 1;
> > > > +  assert(thread_id != -1 && thread_id < thread_array_num);
> > > > +  spec = thread_private->threads_data[thread_id];
> > > > +  if (!spec && create) {
> > > > +	spec = CALLOC(thread_spec_data);
> > > > +	spec->thread_magic = thread_magic;
> > > > +	thread_private->threads_data[thread_id] = spec;
> > > >    }
> > > >
> > > > -error:
> > > > -  return thread_spec_data->gpgpu;
> > > > +  pthread_mutex_unlock(&thread_private->thread_data_lock);
> > > > +
> > > > +  return spec;
> > > >  }
> > > >
> > > > -void cl_invalid_thread_gpgpu(cl_command_queue queue)
> > > > +void* cl_thread_data_create(void)
> > > >  {
> > > > -  pthread_key_t* key = queue->thread_data;
> > > > -  cl_thread_spec_data* thread_spec_data = pthread_getspecific(*key);
> > > > +  queue_thread_private* thread_private = CALLOC(queue_thread_private);
> > > >
> > > > -  if (!thread_spec_data) {
> > > > -    return;
> > > > -  }
> > > > +  if (thread_private == NULL)
> > > > +    return NULL;
> > > >
> > > > -  if (!thread_spec_data->valid) {
> > > > -    return;
> > > > +  if (thread_slot_map == NULL) {
> > > > +    pthread_mutex_lock(&thread_queue_map_lock);
> > > > +    thread_slot_map = malloc(sizeof(int) * thread_array_num);
> > > > +    memset(thread_slot_map, 0, (sizeof(int) * thread_array_num));
> > >        calloc = malloc + memset to zero.
> > > > +    pthread_mutex_unlock(&thread_queue_map_lock);
> > > > +
> > > > +    pthread_key_create(&destroy_key, thread_data_destructor);
> > > >    }
> > > >
> > > > -  assert(thread_spec_data->gpgpu);
> > > > -  cl_gpgpu_delete(thread_spec_data->gpgpu);
> > > > -  thread_spec_data->valid = 0;
> > > > +  pthread_mutex_init(&thread_private->thread_data_lock, NULL);
> > > > +
> > > > +  pthread_mutex_lock(&thread_private->thread_data_lock);
> > > > +  thread_private->threads_data = malloc(thread_array_num * sizeof(void *));
> > > > +  memset(thread_private->threads_data, 0, sizeof(void*) * thread_array_num);
> > > > +  thread_private->threads_data_num = thread_array_num;
> > > > +  pthread_mutex_unlock(&thread_private->thread_data_lock);
> > > > +
> > > > +  return thread_private;
> > > >  }
> > > >
> > > > -static void thread_data_destructor(void *data) {
> > > > -  cl_thread_spec_data* thread_spec_data = (cl_thread_spec_data *)data;
> > > > +cl_gpgpu cl_get_thread_gpgpu(cl_command_queue queue) {
> > > > +  thread_spec_data* spec = __create_thread_spec_data(queue, 1);
> > > >
> > > > -  if (thread_batch_buf) {
> > > > -    cl_gpgpu_unref_batch_buf(thread_batch_buf);
> > > > -    thread_batch_buf = NULL;
> > > > +  if (!spec->thread_magic && spec->thread_magic != thread_magic) {
> > > > +    //We may get the slot from last thread. So free the resource.
> > > > +    spec->valid = 0;
> > > >    }
> > > >
> > > > -  if (thread_spec_data->valid)
> > > > -    cl_gpgpu_delete(thread_spec_data->gpgpu);
> > > > -  cl_free(thread_spec_data);
> > > > +  if (!spec->valid) {
> > > > +    if (spec->thread_batch_buf) {
> > > > +      cl_gpgpu_unref_batch_buf(spec->thread_batch_buf);
> > > > +      spec->thread_batch_buf = NULL;
> > > > +    }
> > > > +    if (spec->gpgpu) {
> > > > +      cl_gpgpu_delete(spec->gpgpu);
> > > > +      spec->gpgpu = NULL;
> > > > +    }
> > > > +    TRY_ALLOC_NO_ERR(spec->gpgpu, cl_gpgpu_new(queue->ctx->drv));
> > > > +    spec->valid = 1;
> > > > +  }
> > > > +
> > > > + error:
> > > > +  return spec->gpgpu;
> > > >  }
> > > >
> > > > -/* Create the thread specific data. */
> > > > -void* cl_thread_data_create(void)
> > > > +void cl_set_thread_batch_buf(cl_command_queue queue, void* buf)
> > > >  {
> > > > -  int rc = 0;
> > > > +  thread_spec_data* spec = __create_thread_spec_data(queue, 1);
> > > >
> > > > -  pthread_key_t *thread_specific_key = CALLOC(pthread_key_t);
> > > > -  if (thread_specific_key == NULL)
> > > > -    return NULL;
> > > > +  assert(spec && spec->thread_magic == thread_magic);
> > > > +
> > > > +  if (spec->thread_batch_buf) {
> > > > +    cl_gpgpu_unref_batch_buf(spec->thread_batch_buf);
> > > > +  }
> > > > +  spec->thread_batch_buf = buf;
> > > > +}
> > > >
> > > > -  rc = pthread_key_create(thread_specific_key, thread_data_destructor);
> > > > +void* cl_get_thread_batch_buf(cl_command_queue queue) {
> > > > +  thread_spec_data* spec = __create_thread_spec_data(queue, 1);
> > > >
> > > > -  if (rc != 0)
> > > > -    return NULL;
> > > > +  assert(spec && spec->thread_magic == thread_magic);
> > > > +
> > > > +  return spec->thread_batch_buf;
> > > > +}
> > > > +
> > > > +void cl_invalid_thread_gpgpu(cl_command_queue queue) {
> > > > +  queue_thread_private *thread_private = ((queue_thread_private *)(queue->thread_data));
> > > > +  thread_spec_data* spec = NULL;
> > > >
> > > > -  return thread_specific_key;
> > > > +  pthread_mutex_lock(&thread_private->thread_data_lock);
> > > > +  spec = thread_private->threads_data[thread_id];
> > > > +  assert(spec);
> > > > +  pthread_mutex_unlock(&thread_private->thread_data_lock);
> > > > +
> > > > +  if (!spec->valid) {
> > > > +    return;
> > > > +  }
> > > > +
> > > > +  assert(spec->gpgpu);
> > > > +  cl_gpgpu_delete(spec->gpgpu);
> > > > +  spec->gpgpu = NULL;
> > > > +  spec->valid = 0;
> > > >  }
> > > >
> > > >  /* The destructor for clean the thread specific data. */
> > > > -void cl_thread_data_destroy(void * data)
> > > > +void cl_thread_data_destroy(cl_command_queue queue)
> > > >  {
> > > > -  pthread_key_t *thread_specific_key = (pthread_key_t *)data;
> > > > -
> > > > -  /* First release self spec data. */
> > > > -  cl_thread_spec_data* thread_spec_data =
> > > > -         pthread_getspecific(*thread_specific_key);
> > > > -  if (thread_spec_data && thread_spec_data->valid) {
> > > > -    cl_gpgpu_delete(thread_spec_data->gpgpu);
> > > > -    if (thread_spec_data)
> > > > -      cl_free(thread_spec_data);
> > > > +  int i = 0;
> > > > +  queue_thread_private *thread_private = ((queue_thread_private *)(queue->thread_data));
> > > > +  int threads_data_num;
> > > > +  thread_spec_data** threads_data;
> > > > +
> > > > +  pthread_mutex_lock(&thread_private->thread_data_lock);
> > > > +  assert(thread_private->threads_data_num == thread_array_num);
> > > > +  threads_data_num = thread_private->threads_data_num;
> > > > +  threads_data = thread_private->threads_data;
> > > > +  thread_private->threads_data_num = 0;
> > > > +  thread_private->threads_data = NULL;
> > > > +  pthread_mutex_unlock(&thread_private->thread_data_lock);
> > > > +  cl_free(thread_private);
> > > > +  queue->thread_data = NULL;
> > > > +
> > > > +  for (i = 0; i < threads_data_num; i++) {
> > > > +    if (threads_data[i] != NULL && threads_data[i]->thread_batch_buf) {
> > > > +      cl_gpgpu_unref_batch_buf(threads_data[i]->thread_batch_buf);
> > > > +      threads_data[i]->thread_batch_buf = NULL;
> > > > +    }
> > > > +
> > > > +    if (threads_data[i] != NULL && threads_data[i]->valid) {
> > > > +      cl_gpgpu_delete(threads_data[i]->gpgpu);
> > > > +      threads_data[i]->gpgpu = NULL;
> > > > +      threads_data[i]->valid = 0;
> > > > +    }
> > > > +    cl_free(threads_data[i]);
> > > >    }
> > > >
> > > > -  pthread_key_delete(*thread_specific_key);
> > > > -  cl_free(thread_specific_key);
> > > > +  cl_free(threads_data);
> > > >  }
> > > > diff --git a/src/cl_thread.h b/src/cl_thread.h
> > > > index c8ab63c..bf855a2 100644
> > > > --- a/src/cl_thread.h
> > > > +++ b/src/cl_thread.h
> > > > @@ -27,7 +27,7 @@
> > > >  void* cl_thread_data_create(void);
> > > >
> > > >  /* The destructor for clean the thread specific data. */
> > > > -void cl_thread_data_destroy(void * data);
> > > > +void cl_thread_data_destroy(cl_command_queue queue);
> > > >
> > > >  /* Used to get the gpgpu struct of each thread. */
> > > >  cl_gpgpu cl_get_thread_gpgpu(cl_command_queue queue);
> > > > @@ -36,9 +36,9 @@ cl_gpgpu cl_get_thread_gpgpu(cl_command_queue queue);
> > > >  void cl_invalid_thread_gpgpu(cl_command_queue queue);
> > > >
> > > >  /* Used to set the batch buffer of each thread. */
> > > > -void cl_set_thread_batch_buf(void* buf);
> > > > +void cl_set_thread_batch_buf(cl_command_queue queue, void* buf);
> > > >
> > > >  /* Used to get the batch buffer of each thread. */
> > > > -void* cl_get_thread_batch_buf(void);
> > > > +void* cl_get_thread_batch_buf(cl_command_queue queue);
> > > >
> > > >  #endif /* __CL_THREAD_H__ */
> > > > --
> > > > 1.8.3.2
> > > >


