<html><head><meta http-equiv="content-type" content="text/html; charset=us-ascii"><style>body { line-height: 1.5; }blockquote { margin-top: 0px; margin-bottom: 0px; margin-left: 0.5em; }body { font-size: 10.5pt; font-family: 'Segoe UI'; color: rgb(0, 0, 0); line-height: 1.5; }</style></head><body>
<div><span></span>Very thanks.</div>
<div><br></div><hr style="WIDTH: 210px; HEIGHT: 1px" color="#b5c4df" size="1" align="left">
<div><span><div style="MARGIN: 10px; FONT-FAMILY: verdana; FONT-SIZE: 10pt"><div>yan.wang</div></div></span></div>
<blockquote style="margin-top: 0px; margin-bottom: 0px; margin-left: 0.5em;"><div> </div><div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm"><div style="PADDING-RIGHT: 8px; PADDING-LEFT: 8px; FONT-SIZE: 12px;FONT-FAMILY:tahoma;COLOR:#000000; BACKGROUND: #efefef; PADDING-BOTTOM: 8px; PADDING-TOP: 8px"><div><b>From:</b> <a href="mailto:rong.r.yang@intel.com">Yang, Rong R</a></div><div><b>Date:</b> 2017-06-14 15:36</div><div><b>To:</b> <a href="mailto:yan.wang@linux.intel.com">yan.wang@linux.intel.com</a>; <a href="mailto:beignet@lists.freedesktop.org">beignet@lists.freedesktop.org</a></div><div><b>Subject:</b> Re: [Beignet] [PATCH 2/2] Use aligned16 and aligne4 kernel to copy for large 3D image with TILE_Y.</div></div></div><div><div>LGTM, except some format. I have run git clang-format by manual and pushed, thanks.</div>
<div> </div>
<div>> -----Original Message-----</div>
<div>> From: Beignet [mailto:beignet-bounces@lists.freedesktop.org] On Behalf Of</div>
<div>> yan.wang@linux.intel.com</div>
<div>> Sent: Tuesday, June 13, 2017 16:32</div>
<div>> To: beignet@lists.freedesktop.org</div>
<div>> Cc: Yan Wang <yan.wang@linux.intel.com></div>
<div>> Subject: [Beignet] [PATCH 2/2] Use aligned16 and aligne4 kernel to copy for</div>
<div>> large 3D image with TILE_Y.</div>
<div>> </div>
<div>> From: Yan Wang <yan.wang@linux.intel.com></div>
<div>> </div>
<div>> It is similar with 2D image for avoiding extended image width truncated.</div>
<div>> </div>
<div>> Signed-off-by: Yan Wang <yan.wang@linux.intel.com></div>
<div>> ---</div>
<div>> src/CMakeLists.txt | 2 +</div>
<div>> src/cl_context.h | 4 ++</div>
<div>> src/cl_mem.c | 46 +++++++++++++++++++---</div>
<div>> .../cl_internal_copy_buffer_to_image_3d_align16.cl | 19</div>
<div>> +++++++++ .../cl_internal_copy_buffer_to_image_3d_align4.cl | 19</div>
<div>> +++++++++ .../cl_internal_copy_image_3d_to_buffer_align16.cl | 20</div>
<div>> ++++++++++ .../cl_internal_copy_image_3d_to_buffer_align4.cl | 20</div>
<div>> ++++++++++</div>
<div>> 7 files changed, 125 insertions(+), 5 deletions(-) create mode 100644</div>
<div>> src/kernels/cl_internal_copy_buffer_to_image_3d_align16.cl</div>
<div>> create mode 100644</div>
<div>> src/kernels/cl_internal_copy_buffer_to_image_3d_align4.cl</div>
<div>> create mode 100644</div>
<div>> src/kernels/cl_internal_copy_image_3d_to_buffer_align16.cl</div>
<div>> create mode 100644</div>
<div>> src/kernels/cl_internal_copy_image_3d_to_buffer_align4.cl</div>
<div>> </div>
<div>> diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt index 87ad48b..ecb98b9</div>
<div>> 100644</div>
<div>> --- a/src/CMakeLists.txt</div>
<div>> +++ b/src/CMakeLists.txt</div>
<div>> @@ -54,6 +54,8 @@ cl_internal_copy_image_2d_array_to_3d</div>
<div>> cl_internal_copy_image_3d_to_2d_array</div>
<div>> cl_internal_copy_image_2d_to_buffer</div>
<div>> cl_internal_copy_image_2d_to_buffer_align16</div>
<div>> cl_internal_copy_image_3d_to_buffer</div>
<div>> cl_internal_copy_buffer_to_image_2d</div>
<div>> cl_internal_copy_buffer_to_image_2d_align16</div>
<div>> cl_internal_copy_buffer_to_image_3d</div>
<div>> cl_internal_copy_buffer_to_image_2d_align4</div>
<div>> cl_internal_copy_image_2d_to_buffer_align4</div>
<div>> +cl_internal_copy_buffer_to_image_3d_align4</div>
<div>> +cl_internal_copy_image_3d_to_buffer_align4</div>
<div>> +cl_internal_copy_buffer_to_image_3d_align16</div>
<div>> +cl_internal_copy_image_3d_to_buffer_align16</div>
<div>> cl_internal_fill_buf_align8 cl_internal_fill_buf_align4</div>
<div>> cl_internal_fill_buf_align2 cl_internal_fill_buf_unalign</div>
<div>> cl_internal_fill_buf_align128 cl_internal_fill_image_1d diff --git</div>
<div>> a/src/cl_context.h b/src/cl_context.h index 75bf895..b3a79bc 100644</div>
<div>> --- a/src/cl_context.h</div>
<div>> +++ b/src/cl_context.h</div>
<div>> @@ -64,10 +64,14 @@ enum _cl_internal_ker_type {</div>
<div>> CL_ENQUEUE_COPY_IMAGE_2D_TO_BUFFER_ALIGN16,</div>
<div>> CL_ENQUEUE_COPY_IMAGE_2D_TO_BUFFER_ALIGN4,</div>
<div>> CL_ENQUEUE_COPY_IMAGE_3D_TO_BUFFER, //copy image 3d tobuffer</div>
<div>> + CL_ENQUEUE_COPY_IMAGE_3D_TO_BUFFER_ALIGN16,</div>
<div>> + CL_ENQUEUE_COPY_IMAGE_3D_TO_BUFFER_ALIGN4,</div>
<div>> CL_ENQUEUE_COPY_BUFFER_TO_IMAGE_2D, //copy buffer to image 2d</div>
<div>> CL_ENQUEUE_COPY_BUFFER_TO_IMAGE_2D_ALIGN16,</div>
<div>> CL_ENQUEUE_COPY_BUFFER_TO_IMAGE_2D_ALIGN4,</div>
<div>> CL_ENQUEUE_COPY_BUFFER_TO_IMAGE_3D, //copy buffer to image 3d</div>
<div>> + CL_ENQUEUE_COPY_BUFFER_TO_IMAGE_3D_ALIGN16,</div>
<div>> + CL_ENQUEUE_COPY_BUFFER_TO_IMAGE_3D_ALIGN4,</div>
<div>> CL_ENQUEUE_FILL_BUFFER_UNALIGN, //fill buffer with 1 aligne pattern,</div>
<div>> pattern size=1</div>
<div>> CL_ENQUEUE_FILL_BUFFER_ALIGN2, //fill buffer with 2 aligne pattern,</div>
<div>> pattern size=2</div>
<div>> CL_ENQUEUE_FILL_BUFFER_ALIGN4, //fill buffer with 4 aligne pattern,</div>
<div>> pattern size=4</div>
<div>> diff --git a/src/cl_mem.c b/src/cl_mem.c index b6dce3f..307db50 100644</div>
<div>> --- a/src/cl_mem.c</div>
<div>> +++ b/src/cl_mem.c</div>
<div>> @@ -2162,13 +2162,13 @@ get_align_size_for_copy_kernel(struct</div>
<div>> _cl_mem_image* image, const size_t origin0</div>
<div>> const size_t offset, cl_image_format *fmt) {</div>
<div>> size_t align_size = 0;</div>
<div>> </div>
<div>> - if((image->image_type == CL_MEM_OBJECT_IMAGE2D) && ((image->w *</div>
<div>> image->bpp) % ALIGN16 == 0) &&</div>
<div>> + if(((image->w * image->bpp) % ALIGN16 == 0) &&</div>
<div>> ((origin0 * image->bpp) % ALIGN16 == 0) && (region0 % ALIGN16 == 0)</div>
<div>> && (offset % ALIGN16 == 0)){</div>
<div>> fmt->image_channel_order = CL_RGBA;</div>
<div>> fmt->image_channel_data_type = CL_UNSIGNED_INT32;</div>
<div>> align_size = ALIGN16;</div>
<div>> }</div>
<div>> - else if((image->image_type == CL_MEM_OBJECT_IMAGE2D) && ((image-</div>
<div>> >w * image->bpp) % ALIGN4 == 0) &&</div>
<div>> + else if(((image->w * image->bpp) % ALIGN4 == 0) &&</div>
<div>> ((origin0 * image->bpp) % ALIGN4 == 0) && (region0 % ALIGN4 == 0) &&</div>
<div>> (offset % ALIGN4 == 0)){</div>
<div>> fmt->image_channel_order = CL_R;</div>
<div>> fmt->image_channel_data_type = CL_UNSIGNED_INT32; @@ -2247,11</div>
<div>> +2247,29 @@ cl_mem_copy_image_to_buffer(cl_command_queue queue,</div>
<div>> cl_event event, struct _cl_m</div>
<div>> cl_internal_copy_image_2d_to_buffer_str,</div>
<div>> (size_t)cl_internal_copy_image_2d_to_buffer_str_size, NULL);</div>
<div>> }</div>
<div>> }else if(image->image_type == CL_MEM_OBJECT_IMAGE3D) {</div>
<div>> - extern char cl_internal_copy_image_3d_to_buffer_str[];</div>
<div>> - extern size_t cl_internal_copy_image_3d_to_buffer_str_size;</div>
<div>> + if(align_size == ALIGN16){</div>
<div>> + extern char cl_internal_copy_image_3d_to_buffer_align16_str[];</div>
<div>> + extern size_t</div>
<div>> + cl_internal_copy_image_3d_to_buffer_align16_str_size;</div>
<div>> +</div>
<div>> + ker = cl_context_get_static_kernel_from_bin(queue->ctx,</div>
<div>> CL_ENQUEUE_COPY_IMAGE_3D_TO_BUFFER_ALIGN16,</div>
<div>> + cl_internal_copy_image_3d_to_buffer_align16_str,</div>
<div>> + (size_t)cl_internal_copy_image_3d_to_buffer_align16_str_size,</div>
<div>> NULL);</div>
<div>> + }</div>
<div>> + else if(align_size == ALIGN4){</div>
<div>> + extern char cl_internal_copy_image_3d_to_buffer_align4_str[];</div>
<div>> + extern size_t</div>
<div>> + cl_internal_copy_image_3d_to_buffer_align4_str_size;</div>
<div>> </div>
<div>> - ker = cl_context_get_static_kernel_from_bin(queue->ctx,</div>
<div>> CL_ENQUEUE_COPY_IMAGE_3D_TO_BUFFER,</div>
<div>> + ker = cl_context_get_static_kernel_from_bin(queue->ctx,</div>
<div>> CL_ENQUEUE_COPY_IMAGE_3D_TO_BUFFER_ALIGN4,</div>
<div>> + cl_internal_copy_image_3d_to_buffer_align4_str,</div>
<div>> + (size_t)cl_internal_copy_image_3d_to_buffer_align4_str_size,</div>
<div>> NULL);</div>
<div>> + }</div>
<div>> + else{</div>
<div>> + extern char cl_internal_copy_image_3d_to_buffer_str[];</div>
<div>> + extern size_t cl_internal_copy_image_3d_to_buffer_str_size;</div>
<div>> +</div>
<div>> + ker = cl_context_get_static_kernel_from_bin(queue->ctx,</div>
<div>> + CL_ENQUEUE_COPY_IMAGE_3D_TO_BUFFER,</div>
<div>> cl_internal_copy_image_3d_to_buffer_str,</div>
<div>> (size_t)cl_internal_copy_image_3d_to_buffer_str_size, NULL);</div>
<div>> + }</div>
<div>> }</div>
<div>> </div>
<div>> if (!ker) {</div>
<div>> @@ -2347,11 +2365,29 @@</div>
<div>> cl_mem_copy_buffer_to_image(cl_command_queue queue, cl_event</div>
<div>> event, cl_mem buffe</div>
<div>> cl_internal_copy_buffer_to_image_2d_str,</div>
<div>> (size_t)cl_internal_copy_buffer_to_image_2d_str_size, NULL);</div>
<div>> }</div>
<div>> }else if(image->image_type == CL_MEM_OBJECT_IMAGE3D) {</div>
<div>> + if(align_size == ALIGN16){</div>
<div>> + extern char cl_internal_copy_buffer_to_image_3d_align16_str[];</div>
<div>> + extern size_t</div>
<div>> + cl_internal_copy_buffer_to_image_3d_align16_str_size;</div>
<div>> +</div>
<div>> + ker = cl_context_get_static_kernel_from_bin(queue->ctx,</div>
<div>> CL_ENQUEUE_COPY_BUFFER_TO_IMAGE_3D_ALIGN16,</div>
<div>> + cl_internal_copy_buffer_to_image_3d_align16_str,</div>
<div>> + (size_t)cl_internal_copy_buffer_to_image_3d_align16_str_size,</div>
<div>> NULL);</div>
<div>> + }</div>
<div>> + else if(align_size == ALIGN4){</div>
<div>> + extern char cl_internal_copy_buffer_to_image_3d_align4_str[];</div>
<div>> + extern size_t</div>
<div>> + cl_internal_copy_buffer_to_image_3d_align4_str_size;</div>
<div>> +</div>
<div>> + ker = cl_context_get_static_kernel_from_bin(queue->ctx,</div>
<div>> CL_ENQUEUE_COPY_BUFFER_TO_IMAGE_3D_ALIGN4,</div>
<div>> + cl_internal_copy_buffer_to_image_3d_align4_str,</div>
<div>> + (size_t)cl_internal_copy_buffer_to_image_3d_align4_str_size,</div>
<div>> NULL);</div>
<div>> + }</div>
<div>> + else{</div>
<div>> extern char cl_internal_copy_buffer_to_image_3d_str[];</div>
<div>> extern size_t cl_internal_copy_buffer_to_image_3d_str_size;</div>
<div>> </div>
<div>> ker = cl_context_get_static_kernel_from_bin(queue->ctx,</div>
<div>> CL_ENQUEUE_COPY_BUFFER_TO_IMAGE_3D,</div>
<div>> cl_internal_copy_buffer_to_image_3d_str,</div>
<div>> (size_t)cl_internal_copy_buffer_to_image_3d_str_size, NULL);</div>
<div>> + }</div>
<div>> }</div>
<div>> if (!ker)</div>
<div>> return CL_OUT_OF_RESOURCES;</div>
<div>> diff --git a/src/kernels/cl_internal_copy_buffer_to_image_3d_align16.cl</div>
<div>> b/src/kernels/cl_internal_copy_buffer_to_image_3d_align16.cl</div>
<div>> new file mode 100644</div>
<div>> index 0000000..32f1f63</div>
<div>> --- /dev/null</div>
<div>> +++ b/src/kernels/cl_internal_copy_buffer_to_image_3d_align16.cl</div>
<div>> @@ -0,0 +1,19 @@</div>
<div>> +kernel void __cl_copy_buffer_to_image_3d_align16(__write_only</div>
<div>> image3d_t image, global uint4* buffer,</div>
<div>> + unsigned int region0, unsigned int region1, unsigned int</div>
<div>> region2,</div>
<div>> + unsigned int dst_origin0, unsigned int dst_origin1,</div>
<div>> unsigned int dst_origin2,</div>
<div>> + unsigned int src_offset) {</div>
<div>> + int i = get_global_id(0);</div>
<div>> + int j = get_global_id(1);</div>
<div>> + int k = get_global_id(2);</div>
<div>> + uint4 color = (uint4)(0);</div>
<div>> + int4 dst_coord;</div>
<div>> + if((i >= region0) || (j>= region1) || (k>=region2))</div>
<div>> + return;</div>
<div>> + dst_coord.x = dst_origin0 + i;</div>
<div>> + dst_coord.y = dst_origin1 + j;</div>
<div>> + dst_coord.z = dst_origin2 + k;</div>
<div>> + src_offset += (k * region1 + j) * region0 + i;</div>
<div>> + color = buffer[src_offset];</div>
<div>> + write_imageui(image, dst_coord, color); }</div>
<div>> diff --git a/src/kernels/cl_internal_copy_buffer_to_image_3d_align4.cl</div>
<div>> b/src/kernels/cl_internal_copy_buffer_to_image_3d_align4.cl</div>
<div>> new file mode 100644</div>
<div>> index 0000000..2ccbcf1</div>
<div>> --- /dev/null</div>
<div>> +++ b/src/kernels/cl_internal_copy_buffer_to_image_3d_align4.cl</div>
<div>> @@ -0,0 +1,19 @@</div>
<div>> +kernel void __cl_copy_buffer_to_image_3d_align4(__write_only</div>
<div>> image3d_t image, global uint* buffer,</div>
<div>> + unsigned int region0, unsigned int region1, unsigned int</div>
<div>> region2,</div>
<div>> + unsigned int dst_origin0, unsigned int dst_origin1,</div>
<div>> unsigned int dst_origin2,</div>
<div>> + unsigned int src_offset) {</div>
<div>> + int i = get_global_id(0);</div>
<div>> + int j = get_global_id(1);</div>
<div>> + int k = get_global_id(2);</div>
<div>> + uint4 color = (uint4)(0);</div>
<div>> + int4 dst_coord;</div>
<div>> + if((i >= region0) || (j>= region1) || (k>=region2))</div>
<div>> + return;</div>
<div>> + dst_coord.x = dst_origin0 + i;</div>
<div>> + dst_coord.y = dst_origin1 + j;</div>
<div>> + dst_coord.z = dst_origin2 + k;</div>
<div>> + src_offset += (k * region1 + j) * region0 + i;</div>
<div>> + color.x = buffer[src_offset];</div>
<div>> + write_imageui(image, dst_coord, color); }</div>
<div>> diff --git a/src/kernels/cl_internal_copy_image_3d_to_buffer_align16.cl</div>
<div>> b/src/kernels/cl_internal_copy_image_3d_to_buffer_align16.cl</div>
<div>> new file mode 100644</div>
<div>> index 0000000..e116d47</div>
<div>> --- /dev/null</div>
<div>> +++ b/src/kernels/cl_internal_copy_image_3d_to_buffer_align16.cl</div>
<div>> @@ -0,0 +1,20 @@</div>
<div>> +kernel void __cl_copy_image_3d_to_buffer_align16 ( __read_only</div>
<div>> image3d_t image, global uint4* buffer,</div>
<div>> + unsigned int region0, unsigned int region1, unsigned int</div>
<div>> region2,</div>
<div>> + unsigned int src_origin0, unsigned int src_origin1,</div>
<div>> unsigned int src_origin2,</div>
<div>> + unsigned int dst_offset) {</div>
<div>> + int i = get_global_id(0);</div>
<div>> + int j = get_global_id(1);</div>
<div>> + int k = get_global_id(2);</div>
<div>> + uint4 color;</div>
<div>> + const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE |</div>
<div>> +CLK_ADDRESS_NONE | CLK_FILTER_NEAREST;</div>
<div>> + int4 src_coord;</div>
<div>> + if((i >= region0) || (j>= region1) || (k>=region2))</div>
<div>> + return;</div>
<div>> + src_coord.x = src_origin0 + i;</div>
<div>> + src_coord.y = src_origin1 + j;</div>
<div>> + src_coord.z = src_origin2 + k;</div>
<div>> + color = read_imageui(image, sampler, src_coord);</div>
<div>> + dst_offset += (k * region1 + j) * region0 + i;</div>
<div>> + *(buffer + dst_offset) = color;</div>
<div>> +}</div>
<div>> diff --git a/src/kernels/cl_internal_copy_image_3d_to_buffer_align4.cl</div>
<div>> b/src/kernels/cl_internal_copy_image_3d_to_buffer_align4.cl</div>
<div>> new file mode 100644</div>
<div>> index 0000000..d5374c4</div>
<div>> --- /dev/null</div>
<div>> +++ b/src/kernels/cl_internal_copy_image_3d_to_buffer_align4.cl</div>
<div>> @@ -0,0 +1,20 @@</div>
<div>> +kernel void __cl_copy_image_3d_to_buffer_align4 ( __read_only</div>
<div>> image3d_t image, global uint* buffer,</div>
<div>> + unsigned int region0, unsigned int region1, unsigned int</div>
<div>> region2,</div>
<div>> + unsigned int src_origin0, unsigned int src_origin1,</div>
<div>> unsigned int src_origin2,</div>
<div>> + unsigned int dst_offset) {</div>
<div>> + int i = get_global_id(0);</div>
<div>> + int j = get_global_id(1);</div>
<div>> + int k = get_global_id(2);</div>
<div>> + uint4 color;</div>
<div>> + const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE |</div>
<div>> +CLK_ADDRESS_NONE | CLK_FILTER_NEAREST;</div>
<div>> + int4 src_coord;</div>
<div>> + if((i >= region0) || (j>= region1) || (k>=region2))</div>
<div>> + return;</div>
<div>> + src_coord.x = src_origin0 + i;</div>
<div>> + src_coord.y = src_origin1 + j;</div>
<div>> + src_coord.z = src_origin2 + k;</div>
<div>> + color = read_imageui(image, sampler, src_coord);</div>
<div>> + dst_offset += (k * region1 + j) * region0 + i;</div>
<div>> + buffer[dst_offset] = color.x;</div>
<div>> +}</div>
<div>> --</div>
<div>> 2.7.4</div>
<div>> </div>
<div>> _______________________________________________</div>
<div>> Beignet mailing list</div>
<div>> Beignet@lists.freedesktop.org</div>
<div>> https://lists.freedesktop.org/mailman/listinfo/beignet</div>
<div>_______________________________________________</div>
<div>Beignet mailing list</div>
<div>Beignet@lists.freedesktop.org</div>
<div>https://lists.freedesktop.org/mailman/listinfo/beignet</div>
</div></blockquote>
</body></html>