[Beignet] [ANNOUNCE] Release notes for beignet version 0.2
Zhigang Gong
zhigang.gong at linux.intel.com
Fri Jul 5 03:55:19 PDT 2013
Release 0.2 (2013-07-05)
=========================
What's new?
===========
This release has both a master branch and a 1.2 branch, targeting OpenCL 1.1 and
OpenCL 1.2 respectively. It brings a number of new features, enhancements and
bug fixes. Here is a short list:
1. Constant buffer support.
2. Implemented the following extensions:
   ICD extension (cl_khr_icd) in opencl-1.2 branch.
   cl_khr_global_int32_base_atomics, cl_khr_global_int32_extended_atomics
   cl_khr_local_int32_base_atomics, cl_khr_local_int32_extended_atomics
   Double-precision float support: cl_khr_fp64
3. Added many builtin functions:
   math(abs_diff/step/hadd/…)/fence/as_xxx/convert_xxx/...
4. Enhanced vector support:
   Use a separate pass to scalarize most of the instructions.
   Support all vector types (2, 3, 4, 8, 16) for char/short/int/float.
5. Implemented more OpenCL APIs and refined the error handling code to comply with the spec.
   clEnqueueMapImage/clEnqueueWriteImage/clGetCommandQueueInfo/clGetProgramInfo/...
6. Refined the sampler's implementation to comply with the OpenCL 1.1 spec.
7. Fixed branching bugs in write_image and DW multiplication.
8. Fixed bugs when building with SIMD8 mode.
9. Fixed one bug in vector comparison.
10. Fixed the clFinish semantics.
11. And many other fixes.
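For readers unfamiliar with some of the builtins named in item 3, here is a minimal plain-C sketch of what abs_diff, hadd, rhadd, step and as_uint compute, following their definitions in the OpenCL specification. It is not Beignet's implementation. The ref_ names are hypothetical helpers invented for this illustration, and the as_uint sketch assumes a 32-bit IEEE 754 float.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reference helpers. The ref_ prefix marks them as
 * illustrations only; they are not Beignet or OpenCL symbols. */

/* abs_diff(x, y): |x - y| returned as an unsigned value, so the result
 * is representable even for INT32_MAX and INT32_MIN. */
uint32_t ref_abs_diff(int32_t x, int32_t y)
{
    /* Subtract in unsigned arithmetic, which wraps instead of overflowing. */
    return x > y ? (uint32_t)x - (uint32_t)y
                 : (uint32_t)y - (uint32_t)x;
}

/* hadd(x, y): (x + y) >> 1 without overflowing the intermediate sum. */
uint32_t ref_hadd(uint32_t x, uint32_t y)
{
    return (uint32_t)(((uint64_t)x + y) >> 1);
}

/* rhadd(x, y): (x + y + 1) >> 1, the rounding variant of hadd. */
uint32_t ref_rhadd(uint32_t x, uint32_t y)
{
    return (uint32_t)(((uint64_t)x + y + 1) >> 1);
}

/* step(edge, x): 0.0 if x < edge, otherwise 1.0. */
float ref_step(float edge, float x)
{
    return x < edge ? 0.0f : 1.0f;
}

/* as_uint(f): reinterpret the bits of a float as an unsigned integer.
 * Assumes float is a 32-bit IEEE 754 value. */
uint32_t ref_as_uint(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof(bits));
    return bits;
}

/* Small self-check that exercises each helper once. */
void ref_builtins_self_check(void)
{
    assert(ref_abs_diff(INT32_MAX, INT32_MIN) == UINT32_MAX);
    assert(ref_hadd(UINT32_MAX, UINT32_MAX) == UINT32_MAX);
    assert(ref_hadd(3u, 4u) == 3u);
    assert(ref_rhadd(3u, 4u) == 4u);
    assert(ref_step(1.0f, 0.5f) == 0.0f);
    assert(ref_as_uint(1.0f) == 0x3F800000u);
}
```

The as_xxx functions reinterpret bits, as in ref_as_uint above. The convert_xxx functions convert values, closer in spirit to a C cast.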
Where can I get it?
===================
git address: git://anongit.freedesktop.org/beignet
git tag: Release_v0.2
tar ball: http://cgit.freedesktop.org/beignet/snapshot/Release_v0.2.tar.gz
mail list: http://lists.freedesktop.org/mailman/listinfo/beignet
What is it?
===========
Beignet is an open source implementation of the OpenCL specification, a generic
compute-oriented API. This code base contains the code to run OpenCL programs on
Intel GPUs: it defines and implements the OpenCL host functions required to
initialize the device, create the command queues, the kernels and the programs,
and run them on the GPU. The code base also contains the compiler part of the
stack, which is included in `backend/`. For more specific information about the
compiler, please refer to `backend/README.md`.
Visit the wiki at http://wiki.freedesktop.org/www/Software/Beignet/ for more
information about the project.
What changed in detail since version 0.1?
=========================================
Boqun Feng (1):
Add CL/*.hpp to installing files
Chuanbo Weng (1):
Fix compile error when use llvm-3.1 and InstVisitor.h path for llvm-3.3
Dag Lem (16):
Add samplerless read image functions for 2D and 3D images.
Updated cl.hpp from http://www.khronos.org/registry/cl/api/1.2/cl.hpp
Stubs for C++ Bindings
Save depth and slice_pitch in cl_mem images.
Implement clEnqueueMapImage.
Implement clEnqueueReadImage and clEnqueueWriteImage.
Corrected return of error code in clCreateImage.
Correct clCreateImage(context, CL_MEM_COPY_HOST_PTR, ...)
Fix several CL error code return bugs
utests: Correct box blur
Correct clEnqueueReadBuffer, clEnqueueWriteBuffer and clEnqueueMapBuffer
Correct sampler address clamping for read image functions.
utests: Add test case for box blur on image buffer
Adaptions for LLVM 3.3 / SPIR
Check for exhaustion of local memory
utests: Add test case for global memory barrier
Edward Ching (2):
utests: Added one test case for clFinish().
Test case for vector type comparison results.
Homer Hsing (40):
make raw_fd_ostream not close stdout
Add clIntelMapBufferGTT, clIntelUnmapBufferGTT, cl_mem_map_gtt and cl_mem_unmap_gtt
Fix crash when output IR
Make ceil() work
New test case of ceil()
add helper functions in ir::Constant and ir::ConstantSet
add helper functions in gbe::Program
disable buggy old code doing global constant
add special register constoffst expressing curbe offset
ir::unit can return its constantSet
Support global constant arrays
test cases for global constant arrays
add image3d read/write in stdlib.h
add backend symbol alias for image3d read write
add third coord in backend
enable image3d_t
test cases for image3d_t
Fix instruction scheduler ScheduleDAGNode
Enable 39 math built-in functions
Test new math built-in functions
Fix out-of-date math macros
test case for DW multiplication
improve disassembling GPU binary code
update to OpenCL 1.1 header
Support 64-bit float
test cases for 64-bit float
add a lost special register name
support zero bit counting
test case for function "clz"
support build-in function "rotate"
test case for function "rotate"
put 64-bit float test cases at tail
support built-in functions "hadd", "rhadd"
test cases for "hadd", "rhadd"
support built-in functions "mul_hi", "mad_hi"
test cases for built-in functions "mul_hi", "mad_hi"
support global scalar constants
test scalar global constants
support built-in function "upsample"
test built-in function "upsample"
Junyan He (14):
Add the INCLUDE_DIRECTORIES directive to all the FindXXX modules.
add the support of clGetProgramBuildInfo and clGetProgramInfo
Add one test case for clGetProgramInfo
Add the support of the API: clGetCommandQueueInfo
Add the test case for clGetCommandQueueInfo API
Add the test case for clGetProgramInfo API
Add the builtin function abs() support
Add the test case for builtin abs() function
Modify all the builtin function vect return to (vect_name)(e1, e2, e3)
Add the vector3 support for builtin abs function
Add the abs_diff builtin function support
Add the test case for builtin abs_diff() function
Add the step builtin function support
Add the test case for builtin step() function
Lu Guanqun (6):
remove -ffast-math comiler flag
remove dollar sign in identifier
--no-rtti should be changed to -fno-rtti in clang++
fix error in clang: variable array length is not support for non-POD element
GenRegInterval should be a struct instead of a class
GenRegister src[] is not allowed in clang, change this style.
Mario Kicherer (4):
enable clGetContextInfo with CL_CONTEXT_DEVICES
CMakeLists.txt enable custom C/CXXFLAGS
clEnqueueNDRangeKernel: fix for segfault caused by empty local_work_size
Gracefully handle unsupported systems
Ruiling Song (19):
Support non-16 multiple group size.
Add test case for group size
Enable built-in sub_sat/add_sat vector type.
Refine error check in clGetPlatformIDs() and clGetPlatformInfo()
Refine error check in clGetDeviceIDs()
Refine error check in clCreateContext()
Refine error check in clCreateCommandQueue()
Refine error check in clCreateProgramFromSource() and clCreateKernel()
Refine error check in SetKernelArg() and support NULL buffer argument
add test case for null kernel arg of global/constant buffer
Support multi-source CL program
Fix several typos in unit test.
GBE: Fix a bug in encoding MATH instruction
GBE: Add more support of char and short arithmetic
utests: Add basic arithmetic test case
Implement clGetContextInfo
Implement API clGetKernelInfo
add test case for clGetContextInfo/clGetKernelInfo
Disable error message output in release version.
Simon Richter (31):
Fix typo in cl_get_platform_info function name
Avoid extension names as preprocessor tokens
"Implement" clGetExtensionFunctionAddress()
Implement KHR ICD extension
Accept glibc's implementation of memalign()
Prefer versioned llvm-config
Make EGL optional
Use "clang" command from PATH
Fix pitch parameter in clCreateImage2D
Implement clGetMemObjectInfo
Rename Intel specific extension functions for ICD
Add cl*Intel to extension function lookup
Add typedefs for extension functions
Fix typo
Add missing select() overloads
Add missing fmin() and fmax() overloads
Handle size queries in clGetDeviceInfo better
Implement clGetDeviceInfo(..., CL_DRIVER_VERSION, ...)
Define clamp(value, lower, upper)
Add clGetDeviceInfo(..., CL_BUILT_IN_KERNELS, ...)
Correct type of device properties
Update gitignore files
Make libgbm optional without EGL support
Add missing include of <limits>
Add ICD dispatch table in cloned kernels
Generate all supported as_* functions
Define all convert_* functions.
Add long and ulong types to generated functions.
Enable cl_khr_fp64 extension for OpenCL stdlib header
Define double vector types
Enable generation of convert_ and as_ functions for double
Yang Rong (27):
Add register allocate from tail support for constant buffer.
Add constant pointer as argument support in kernel.
Add constant pointer as argument support in runtime.
Add constant ptr argument test case.
Rename CBMove to IndirectMove
Fix a negative number alignment bug in RegisterFilePartitioner allocate.
Add a scalarize llvm pass.
Remove useless vector check in GenWriter after scalarize pass.
Fix a scalarize pass bug.
Change clang system call to libclang api call.
Change CMake to support LLVM 3.1.
Fix int div/rem assert in SIMD8 mode.
Pass user options to clang.
Fix a random assert caused by scalarize pass.
Add mem_fence built-in.
Fix two tests fail when OCL_SIMD_WIDTH=8.
Add memory fence before barrier to support global memory barrier.
utests: Add a new local memory barrier case
Fix some piglit constant buffer tests fail.
Add atomic help functions.
Add all atomic built-in functions.
Add atomic test case.
Fix some math function error in simd16.
Enable int32 atomic and fp64 extensions.
Fix options parse infinite loop bug.
Clear atomic dst buffer to fix atomic random fail.
Fix atomic test failed in GT1.
Yi Sun (2):
utest: Add test case for build-in function get_work_dim
Utest: Add a test case for validating built-in function get_global_size()
Zhigang Gong (50):
write_image: Fixed a bug when use scalar data as color source.
utests: should set pitch to zero if host_ptr is NULL.
GBE: fixed the hard coded implementation for sampler/typedwrite.
GBE: refine the sampler implementation to comply with spec.
CL: Support kernel side defined samplers.
utests: Add one test cases for sampler support.
GBE: remove sampler address space.
GBE: add scalar register support in loadImmInstruction.
GBE: concentrate all samplers' allocation at compile time.
GBE/Runtime: Optimize Sample/TypedWrite instruction.
GBE: fixed a prediction bug in typed write instruction.
utests: Fix a bug in movforphi test case.
CL: Tell the kernel an image bo's tiling mode.
utests: Refine the fill image0 test case to use map gtt.
GBE: preare for get_image_xxx functions support.
GBE: Add support for get_image_width/height.
CL: complete get_image_width/height support at runtime side.
utests: add one test case to test get_image_width/height.
CL: Fixed a get image info bug on 64 bit system.
Add more get image info functions.
utests: extent get_image_size cases to other informations..
GBE: Fixed a bug in byte gather/scatter.
GBE: fixed a bug on simd8 mode for typed_write instruction.
GBE: use the simd width environment variable if set.
Update documents.
GBE: support load/store of char/short vector.
utests: test vector load and store.
GBE: Fixed a hang issue on 64bit system.
GBE: Fixed a 3 elements vector load/store bug.
GBE: fixed a predication bug for DW multiplication.
GBE: Add two builtin functions get_work_dim / get_global_offset.
utests: change all kernels to unix style text.
GBE: Fixed a bug in register expieration.
utests: enable test case for global memory barrier.
GBE: Fixed GBE: Fix some builtin functions' return value.
GBE: Fixed one bug in scalarize pass
utests: Fix a typo.
GBE: work around the local memory barrier fence issue.
utests: Enable the local memory barrier test case.
Docs: Rearrange documents according to wiki page on fd.o.
Refine the get_local_id/... builtins.
CL: remove the deprecated function clSetCommandQueueProperty.
GBE: fixed the bug when sext a i1 to i8/i16/i32.
GBE: fixed a barrier related bug.
utests: increase local size in the two barrier test cases.
CL: destroy the EGL image which is created for gl sharing when delete the mem object.
GBE: Clear the value map when start a new scalarize pass.
CLGL: Refine the hack of gbm extension initialization.
Update beignet docs for release v0.2.
Bump the version number.
Zou Nan hai (1):
Pad instruction stream with 8 nops;
Zou, Nanhai (1):
CL: Fix the bug in clfinish.
-Zhigang
Thanks.