[Beignet] [PATCH] Optimization of clEnqueueCopyImageToBuffer for 16 aligned case.
Zhigang Gong
zhigang.gong at linux.intel.com
Thu Feb 12 18:18:05 PST 2015
On Fri, Feb 13, 2015 at 12:37:03AM +0800, Chuanbo Weng wrote:
> We can change the image_channel_order to CL_RGBA and
> image_channel_data_type to CL_UNSIGNED_INT32 for some special
> case, thus 16 bytes can be read by one work item. Bandwidth is
> fully used.
>
> Signed-off-by: Chuanbo Weng <chuanbo.weng at intel.com>
> ---
> src/CMakeLists.txt | 2 +-
> src/cl_context.h | 1 +
> src/cl_mem.c | 44 ++++++++++++++++++----
> .../cl_internal_copy_image_2d_to_buffer_align16.cl | 19 ++++++++++
> 4 files changed, 57 insertions(+), 9 deletions(-)
> create mode 100644 src/kernels/cl_internal_copy_image_2d_to_buffer_align16.cl
>
> diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
> index 939f58d..d4181d8 100644
> --- a/src/CMakeLists.txt
> +++ b/src/CMakeLists.txt
> @@ -49,7 +49,7 @@ cl_internal_copy_image_3d_to_2d cl_internal_copy_image_2d_to_3d cl_internal_copy
> cl_internal_copy_image_2d_to_2d_array cl_internal_copy_image_1d_array_to_1d_array
> cl_internal_copy_image_2d_array_to_2d_array cl_internal_copy_image_2d_array_to_2d
> cl_internal_copy_image_2d_array_to_3d cl_internal_copy_image_3d_to_2d_array
> -cl_internal_copy_image_2d_to_buffer cl_internal_copy_image_3d_to_buffer
> +cl_internal_copy_image_2d_to_buffer cl_internal_copy_image_2d_to_buffer_align16 cl_internal_copy_image_3d_to_buffer
> cl_internal_copy_buffer_to_image_2d cl_internal_copy_buffer_to_image_3d
> cl_internal_fill_buf_align8 cl_internal_fill_buf_align4
> cl_internal_fill_buf_align2 cl_internal_fill_buf_unalign
> diff --git a/src/cl_context.h b/src/cl_context.h
> index 2ea0a73..fdbfd2a 100644
> --- a/src/cl_context.h
> +++ b/src/cl_context.h
> @@ -60,6 +60,7 @@ enum _cl_internal_ker_type {
> CL_ENQUEUE_COPY_IMAGE_2D_ARRAY_TO_3D, //copy image 2d array to image 3d
> CL_ENQUEUE_COPY_IMAGE_3D_TO_2D_ARRAY, //copy image 3d to image 2d array
> CL_ENQUEUE_COPY_IMAGE_2D_TO_BUFFER, //copy image 2d to buffer
> + CL_ENQUEUE_COPY_IMAGE_2D_TO_BUFFER_ALIGN16,
> CL_ENQUEUE_COPY_IMAGE_3D_TO_BUFFER, //copy image 3d tobuffer
> CL_ENQUEUE_COPY_BUFFER_TO_IMAGE_2D, //copy buffer to image 2d
> CL_ENQUEUE_COPY_BUFFER_TO_IMAGE_3D, //copy buffer to image 3d
> diff --git a/src/cl_mem.c b/src/cl_mem.c
> index e58a183..e9cf539 100644
> --- a/src/cl_mem.c
> +++ b/src/cl_mem.c
> @@ -1714,6 +1714,10 @@ cl_mem_copy_image_to_buffer(cl_command_queue queue, struct _cl_mem_image* image,
> uint32_t intel_fmt, bpp;
> cl_image_format fmt;
> size_t origin0, region0;
> + size_t kn_dst_offset;
> + int align16 = 0;
> + size_t align_size = 1;
> + size_t w_saved;
>
> if(region[1] == 1) local_sz[1] = 1;
> if(region[2] == 1) local_sz[2] = 1;
> @@ -1724,24 +1728,48 @@ cl_mem_copy_image_to_buffer(cl_command_queue queue, struct _cl_mem_image* image,
> /* We use one kernel to copy the data. The kernel is lazily created. */
> assert(image->base.ctx == buffer->ctx);
>
> - fmt.image_channel_order = CL_R;
> - fmt.image_channel_data_type = CL_UNSIGNED_INT8;
> intel_fmt = image->intel_fmt;
> bpp = image->bpp;
> - image->intel_fmt = cl_image_get_intel_format(&fmt);
> - image->w = image->w * image->bpp;
> - image->bpp = 1;
> + w_saved = image->w;
> region0 = region[0] * bpp;
> - origin0 = src_origin[0] * bpp;
> + kn_dst_offset = dst_offset;
> + if( ((image->w * image->bpp) % 16 == 0) && ((src_origin[0] * bpp) % 16 == 0) &&
> + (region0 % 16 == 0) && (dst_offset % 16 == 0) ){
> + fmt.image_channel_order = CL_RGBA;
> + fmt.image_channel_data_type = CL_UNSIGNED_INT32;
> + align16 = 1;
> + align_size = 16;
The above logic will break the IMAGE3D code path, as you haven't made the corresponding
change in the normal kernel cl_internal_copy_image_3d_to_buffer.cl.
More information about the Beignet
mailing list