<div dir="ltr">I tried the CL_MEM_USE_HOST_PTR option from one of the recent Beignet check-ins with a simple test case, but it failed. Did I misunderstand how CL_MEM_USE_HOST_PTR is meant to be used? The same test case passed with the Intel OpenCL SDK 2012 (Windows) on Ivy Bridge (IVB). The test case patch is attached and also pasted below.<div>
<br></div><div>/Ed<br><div><br><div><br></div><div>1) test_zcopy.cl:</div><div><br></div><div><div>__kernel void</div><div>test_zcopy(__global int *src, __global int *dst, int n, int num_threads)</div>
<div>{</div><div><span class="" style="white-space:pre"> </span>int tid, pos;</div><div><span class="" style="white-space:pre"> </span></div><div><span class="" style="white-space:pre"> </span>tid = get_global_id(0);</div>
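<div><span class="" style="white-space:pre"> </span>// Strided loop: work-item tid copies elements tid, tid+num_threads, tid+2*num_threads, ...</div><div><span class="" style="white-space:pre"> </span>for (pos=tid; pos < n; pos+=num_threads) {</div><div><span class="" style="white-space:pre"> </span>dst[pos] = src[pos];</div><div><span class="" style="white-space:pre"> </span>}</div>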
<div>}</div><div><br></div><div><div><div style>2) compiler_zcopy.cpp:</div></div></div></div><div style><br></div><div style><div>#include "utest_helper.hpp"</div><div>#include <sys/time.h></div><div>#include <malloc.h></div>
<div>#include <string.h></div><div><br></div><div>#define T_GET(t) gettimeofday(&(t), NULL)</div><div>#define T_LAPSE(t1, t2) \</div><div> ((t2.tv_sec+t2.tv_usec*0.000001) - (t1.tv_sec+t1.tv_usec*0.000001))</div>
<div>#define ALIGN<span class="" style="white-space:pre"> </span>4096</div><div><br></div><div>static void compiler_zcopy(void)</div><div>{</div><div> const size_t n = 32*1024*1024;</div><div> struct timeval t1, t2;</div>
<div> float t_fin;</div><div> int *in, *out;</div><div> </div><div> in = (int *)memalign(ALIGN, n*sizeof(int));</div><div> out = (int *)memalign(ALIGN, n*sizeof(int));</div><div> OCL_ASSERT(in);</div><div> OCL_ASSERT(out);</div>
<div> for (int i=0; i<n; ++i) in[i] = i;</div><div> memset(out, 0, n * sizeof(int)); // clear all n ints, not just the first n bytes</div><div> </div><div> // Set up the kernel and wrap the host arrays in buffers</div><div> OCL_CREATE_KERNEL("test_zcopy");</div><div> OCL_CREATE_BUFFER(buf[0], CL_MEM_USE_HOST_PTR, n * sizeof(int), in);</div>
<div> OCL_CREATE_BUFFER(buf[1], CL_MEM_USE_HOST_PTR, n * sizeof(int), out);</div><div><br></div><div> // Run the kernel: 32 work-groups of 64 work-items each</div><div> locals[0] = 64;</div><div> globals[0] = 32 * locals[0];</div><div> // The kernel's n and num_threads are int, while n and globals[] are size_t here</div><div> const int n_arg = (int)n, num_threads = (int)globals[0];</div><div> OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]);</div>
<div> OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]);</div><div> OCL_SET_ARG(2, sizeof(int), &n_arg);</div><div> OCL_SET_ARG(3, sizeof(int), &num_threads);</div><div><br></div><div> T_GET(t1);</div><div> OCL_NDRANGE(1);</div>
<div> OCL_FINISH();</div><div> T_GET(t2);</div><div> t_fin = T_LAPSE(t1, t2);</div><div> printf("test_zcopy: lapse = %f\n",t_fin);</div><div> for (int i=0; i<n; ++i) OCL_ASSERT((out[i]==i));</div><div>}</div>
<div><br></div><div>MAKE_UTEST_FROM_FUNCTION(compiler_zcopy);</div><div><br></div></div></div></div></div>