[Beignet] [ANNOUNCE] Release notes for beignet version 0.2

Zhigang Gong zhigang.gong at linux.intel.com
Fri Jul 5 03:55:19 PDT 2013


Release 0.2 (2013-07-05)
=========================

What's new?
===========

This release provides two branches: the master branch targets OpenCL 1.1 and
the opencl-1.2 branch targets OpenCL 1.2. It adds many new features,
enhancements and bug fixes. Here is a short list:

1. Constant buffer support.
2. Implemented the following extensions:
   ICD extension (cl_khr_icd) in opencl-1.2 branch.
   cl_khr_global_int32_base_atomics, cl_khr_global_int32_extended_atomics,
   cl_khr_local_int32_base_atomics, cl_khr_local_int32_extended_atomics.
   Double-precision float support: cl_khr_fp64.
3. Added many builtin functions:
   math(abs_diff/step/hadd/…)/fence/as_xxx/convert_xxx/...
4. Enhanced vector support:
   Use a separate pass to scalarize almost all of the instructions.
   Support all vector types (2,3,4,8,16) for char/short/int/float.
5. Implemented more OpenCL APIs and refined the error handling code to comply with the spec.
   clEnqueueMapImage/WriteImage/clGetCommandQueueInfo/clGetProgramInfo/...
6. Refined the sampler implementation to comply with the OpenCL 1.1 spec.
7. Fixed branching bugs in write_image and DW multiplication.
8. Fixed bugs when building in SIMD8 mode.
9. Fixed one bug in vector comparison.
10. Fixed the clFinish semantics.
11. And many other fixes.
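
For readers unfamiliar with the integer built-ins named in item 3 and in the
shortlog (hadd, rhadd, mul_hi, abs_diff, rotate, upsample), the following is a
rough plain C sketch of what they compute for scalar 32-bit operands, as I read
the OpenCL specification. It is not Beignet code: the ref_* names are
hypothetical and exist only for this illustration, and the real built-ins are
also overloaded for other integer and vector types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference helpers, NOT part of Beignet or of any OpenCL API. */

/* hadd: (x + y) >> 1, computed without overflowing the intermediate sum. */
uint32_t ref_hadd(uint32_t x, uint32_t y) {
    return (x >> 1) + (y >> 1) + (x & y & 1u);
}

/* rhadd: (x + y + 1) >> 1, again without intermediate overflow. */
uint32_t ref_rhadd(uint32_t x, uint32_t y) {
    return (x >> 1) + (y >> 1) + ((x | y) & 1u);
}

/* mul_hi: the high 32 bits of the full 64-bit product x * y. */
uint32_t ref_mul_hi(uint32_t x, uint32_t y) {
    return (uint32_t)(((uint64_t)x * (uint64_t)y) >> 32);
}

/* abs_diff: |x - y| returned as an unsigned value, so it cannot overflow. */
uint32_t ref_abs_diff(int32_t x, int32_t y) {
    return x > y ? (uint32_t)x - (uint32_t)y
                 : (uint32_t)y - (uint32_t)x;
}

/* rotate: rotate x left by n bits, with n taken modulo 32. */
uint32_t ref_rotate(uint32_t x, uint32_t n) {
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* upsample (ushort, ushort -> uint): ((uint)hi << 16) | lo. */
uint32_t ref_upsample(uint16_t hi, uint16_t lo) {
    return ((uint32_t)hi << 16) | (uint32_t)lo;
}

/* A few spot checks; call this from a main() of your own. */
void ref_self_check(void) {
    assert(ref_hadd(0xFFFFFFFFu, 0xFFFFFFFFu) == 0xFFFFFFFFu);
    assert(ref_rhadd(3u, 4u) == 4u);
    assert(ref_mul_hi(0xFFFFFFFFu, 2u) == 1u);
    assert(ref_abs_diff(-3, 4) == 7u);
    assert(ref_rotate(0x80000001u, 1u) == 3u);
    assert(ref_upsample(0x1234u, 0x5678u) == 0x12345678u);
}
```

The overflow-free forms of hadd and rhadd are one common way to write these
functions on the host. How Beignet lowers them to GPU instructions is not
described in these notes.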

Where can I get it?
===================

git address: git://anongit.freedesktop.org/beignet
git tag: Release_v0.2
tarball: http://cgit.freedesktop.org/beignet/snapshot/Release_v0.2.tar.gz
mailing list: http://lists.freedesktop.org/mailman/listinfo/beignet

What is it?
===========
Beignet is an open source implementation of the OpenCL specification, a generic
compute-oriented API. This code base contains the code to run OpenCL programs on
Intel GPUs: it defines and implements the OpenCL host functions required to
initialize the device, create the command queues, kernels and programs, and run
them on the GPU. The code base also contains the compiler part of the stack,
which is included in `backend/`. For more specific information about the
compiler, please refer to `backend/README.md`.

Visit the wiki at http://wiki.freedesktop.org/www/Software/Beignet/ for more
information about the project.

What changed in detail since version 0.1?
=========================================
Boqun Feng (1):
      Add CL/*.hpp to installing files

Chuanbo Weng (1):
      Fix compile error when use llvm-3.1 and InstVisitor.h path for llvm-3.3

Dag Lem (16):
      Add samplerless read image functions for 2D and 3D images.
      Updated cl.hpp from http://www.khronos.org/registry/cl/api/1.2/cl.hpp
      Stubs for C++ Bindings
      Save depth and slice_pitch in cl_mem images.
      Implement clEnqueueMapImage.
      Implement clEnqueueReadImage and clEnqueueWriteImage.
      Corrected return of error code in clCreateImage.
      Correct clCreateImage(context, CL_MEM_COPY_HOST_PTR, ...)
      Fix several CL error code return bugs
      utests: Correct box blur
      Correct clEnqueueReadBuffer, clEnqueueWriteBuffer and clEnqueueMapBuffer
      Correct sampler address clamping for read image functions.
      utests: Add test case for box blur on image buffer
      Adaptions for LLVM 3.3 / SPIR
      Check for exhaustion of local memory
      utests: Add test case for global memory barrier

Edward Ching (2):
      utests: Added one test case for clFinish().
      Test case for vector type comparison results.

Homer Hsing (40):
      make raw_fd_ostream not close stdout
      Add clIntelMapBufferGTT, clIntelUnmapBufferGTT, cl_mem_map_gtt and cl_mem_unmap_gtt
      Fix crash when output IR
      Make ceil() work
      New test case of ceil()
      add helper functions in ir::Constant and ir::ConstantSet
      add helper functions in gbe::Program
      disable buggy old code doing global constant
      add special register constoffst expressing curbe offset
      ir::unit can return its constantSet
      Support global constant arrays
      test cases for global constant arrays
      add image3d read/write in stdlib.h
      add backend symbol alias for image3d read write
      add third coord in backend
      enable image3d_t
      test cases for image3d_t
      Fix instruction scheduler ScheduleDAGNode
      Enable 39 math built-in functions
      Test new math built-in functions
      Fix out-of-date math macros
      test case for DW multiplication
      improve disassembling GPU binary code
      update to OpenCL 1.1 header
      Support 64-bit float
      test cases for 64-bit float
      add a lost special register name
      support zero bit counting
      test case for function "clz"
      support build-in function "rotate"
      test case for function "rotate"
      put 64-bit float test cases at tail
      support built-in functions "hadd", "rhadd"
      test cases for "hadd", "rhadd"
      support built-in functions "mul_hi", "mad_hi"
      test cases for built-in functions "mul_hi", "mad_hi"
      support global scalar constants
      test scalar global constants
      support built-in function "upsample"
      test built-in function "upsample"

Junyan He (14):
      Add the INCLUDE_DIRECTORIES directive to all the FindXXX modules.
      add the support of clGetProgramBuildInfo and clGetProgramInfo
      Add one test case for clGetProgramInfo
      Add the support of the API: clGetCommandQueueInfo
      Add the test case for clGetCommandQueueInfo API
      Add the test case for clGetProgramInfo API
      Add the builtin function abs() support
      Add the test case for builtin abs() function
      Modify all the builtin function vect return to (vect_name)(e1, e2, e3)
      Add the vector3 support for builtin abs function
      Add the abs_diff builtin function support
      Add the test case for builtin abs_diff() function
      Add the step builtin function support
      Add the test case for builtin step() function

Lu Guanqun (6):
      remove -ffast-math comiler flag
      remove dollar sign in identifier
      --no-rtti should be changed to -fno-rtti in clang++
      fix error in clang: variable array length is not support for non-POD element
      GenRegInterval should be a struct instead of a class
      GenRegister src[] is not allowed in clang, change this style.

Mario Kicherer (4):
      enable clGetContextInfo with CL_CONTEXT_DEVICES
      CMakeLists.txt enable custom C/CXXFLAGS
      clEnqueueNDRangeKernel: fix for segfault caused by empty local_work_size
      Gracefully handle unsupported systems

Ruiling Song (19):
      Support non-16 multiple group size.
      Add test case for group size
      Enable built-in sub_sat/add_sat vector type.
      Refine error check in clGetPlatformIDs() and clGetPlatformInfo()
      Refine error check in clGetDeviceIDs()
      Refine error check in clCreateContext()
      Refine error check in clCreateCommandQueue()
      Refine error check in clCreateProgramFromSource() and clCreateKernel()
      Refine error check in SetKernelArg() and support NULL buffer argument
      add test case for null kernel arg of global/constant buffer
      Support multi-source CL program
      Fix several typos in unit test.
      GBE: Fix a bug in encoding MATH instruction
      GBE: Add more support of char and short arithmetic
      utests: Add basic arithmetic test case
      Implement clGetContextInfo
      Implement API clGetKernelInfo
      add test case for clGetContextInfo/clGetKernelInfo
      Disable error message output in release version.

Simon Richter (31):
      Fix typo in cl_get_platform_info function name
      Avoid extension names as preprocessor tokens
      "Implement" clGetExtensionFunctionAddress()
      Implement KHR ICD extension
      Accept glibc's implementation of memalign()
      Prefer versioned llvm-config
      Make EGL optional
      Use "clang" command from PATH
      Fix pitch parameter in clCreateImage2D
      Implement clGetMemObjectInfo
      Rename Intel specific extension functions for ICD
      Add cl*Intel to extension function lookup
      Add typedefs for extension functions
      Fix typo
      Add missing select() overloads
      Add missing fmin() and fmax() overloads
      Handle size queries in clGetDeviceInfo better
      Implement clGetDeviceInfo(..., CL_DRIVER_VERSION, ...)
      Define clamp(value, lower, upper)
      Add clGetDeviceInfo(..., CL_BUILT_IN_KERNELS, ...)
      Correct type of device properties
      Update gitignore files
      Make libgbm optional without EGL support
      Add missing include of <limits>
      Add ICD dispatch table in cloned kernels
      Generate all supported as_* functions
      Define all convert_* functions.
      Add long and ulong types to generated functions.
      Enable cl_khr_fp64 extension for OpenCL stdlib header
      Define double vector types
      Enable generation of convert_ and as_ functions for double

Yang Rong (27):
      Add register allocate from tail support for constant buffer.
      Add constant pointer as argument support in kernel.
      Add constant pointer as argument support in runtime.
      Add constant ptr argument test case.
      Rename CBMove to IndirectMove
      Fix a negative number alignment bug in RegisterFilePartitioner allocate.
      Add a scalarize llvm pass.
      Remove useless vector check in GenWriter after scalarize pass.
      Fix a scalarize pass bug.
      Change clang system call to libclang api call.
      Change CMake to support LLVM 3.1.
      Fix int div/rem assert in SIMD8 mode.
      Pass user options to clang.
      Fix a random assert caused by scalarize pass.
      Add mem_fence built-in.
      Fix two tests fail when OCL_SIMD_WIDTH=8.
      Add memory fence before barrier to support global memory barrier.
      utests: Add a new local memory barrier case
      Fix some piglit constant buffer tests fail.
      Add atomic help functions.
      Add all atomic built-in functions.
      Add atomic test case.
      Fix some math function error in simd16.
      Enable int32 atomic and fp64 extensions.
      Fix options parse infinite loop bug.
      Clear atomic dst buffer to fix atomic random fail.
      Fix atomic test failed in GT1.

Yi Sun (2):
      utest: Add test case for build-in function get_work_dim
      Utest: Add a test case for validating built-in function get_global_size()

Zhigang Gong (50):
      write_image: Fixed a bug when use scalar data as color source.
      utests: should set pitch to zero if host_ptr is NULL.
      GBE: fixed the hard coded implementation for sampler/typedwrite.
      GBE: refine the sampler implementation to comply with spec.
      CL: Support kernel side defined samplers.
      utests: Add one test cases for sampler support.
      GBE: remove sampler address space.
      GBE: add scalar register support in loadImmInstruction.
      GBE: concentrate all samplers' allocation at compile time.
      GBE/Runtime: Optimize Sample/TypedWrite instruction.
      GBE: fixed a prediction bug in typed write instruction.
      utests: Fix a bug in movforphi test case.
      CL: Tell the kernel an image bo's tiling mode.
      utests: Refine the fill image0 test case to use map gtt.
      GBE: preare for get_image_xxx functions support.
      GBE: Add support for get_image_width/height.
      CL: complete get_image_width/height support at runtime side.
      utests: add one test case to test get_image_width/height.
      CL: Fixed a get image info bug on 64 bit system.
      Add more get image info functions.
      utests: extent get_image_size cases to other informations..
      GBE: Fixed a bug in byte gather/scatter.
      GBE: fixed a bug on simd8 mode for typed_write instruction.
      GBE: use the simd width environment variable if set.
      Update documents.
      GBE: support load/store of char/short vector.
      utests: test vector load and store.
      GBE: Fixed a hang issue on 64bit system.
      GBE: Fixed a 3 elements vector load/store bug.
      GBE: fixed a predication bug for DW multiplication.
      GBE: Add two builtin functions get_work_dim / get_global_offset.
      utests: change all kernels to unix style text.
      GBE: Fixed a bug in register expieration.
      utests: enable test case for global memory barrier.
      GBE: Fixed GBE: Fix some builtin functions' return value.
      GBE: Fixed one bug in scalarize pass
      utests: Fix a typo.
      GBE: work around the local memory barrier fence issue.
      utests: Enable the local memory barrier test case.
      Docs: Rearrange documents according to wiki page on fd.o.
      Refine the get_local_id/... builtins.
      CL: remove the deprecated function clSetCommandQueueProperty.
      GBE: fixed the bug when sext a i1 to i8/i16/i32.
      GBE: fixed a barrier related bug.
      utests: increase local size in the two barrier test cases.
      CL: destroy the EGL image which is created for gl sharing when delete the mem object.
      GBE: Clear the value map when start a new scalarize pass.
      CLGL: Refine the hack of gbm extension initialization.
      Update beignet docs for release v0.2.
      Bump the version number.

Zou Nan hai (1):
      Pad instruction stream with 8 nops;

Zou, Nanhai (1):
      CL: Fix the bug in clfinish.

-Zhigang
Thanks.

