[Beignet] [ANNOUNCE] Beignet 0.8.0

Wed Feb 12 09:32:59 CET 2014

Beignet 0.8.0 (2014-02-12)
=============================

Beignet version 0.8.0 has been released. Because this version completes
many key features and includes many bug fixes, we decided to jump the
version number from 0.3.0 to 0.8.0. The highlighted improvements are:

 * Implemented all mandatory features required by the OpenCL 1.1 spec,
   including long data type support, half data type support,
   profiling support, etc.
 * Improved the precision of most of the builtin math functions.
 * Implemented register spilling along with the linear scan register
   allocation algorithm.
 * Added support for the LLVM builtin functions llvm.memset and llvm.memcpy.
 * Fixed the broken data liveness analysis algorithm.
 * Fixed the barrier hang issue.
 * Worked around some sampler/image hardware restrictions so that the image
   functions fully comply with the spec.
 * Beignet 0.8.0 achieves a nearly 100% pass rate on the piglit ocl test
   suite, and about a 99% pass rate on OpenCV's ocl test suite, on the
   Intel HD4000 platform.

Changes since version 0.3:

Guo Yejun (1):
      GBE: use native exp instruction when enough precision

Homer Hsing (31):
      add scalar type builtin function "dot"
      initialize GenRegister::subphysical
      support converting with rounding mode
      support saturated-rounding converting
      not use "mad" in vector type "dot"
      delete vec-8 or vec-16 typed geometric built-in
      fix built-in function "length"
      fix built-in function "fast_normalize"
      fix built-in function "normalize"
      add same type converting
      fix ill-coded utest_run::main
      fix pointer bugs in linked list
      fix operators for 64 bit integer
      release previous kernel in cl_kernel_init
      release previous program in cl_kernel_init
      fix builtin function 'frexp'
      fix builtin function "copysign"
      fix builtin function "fract"
      improve multithread calling of llvm
      release context in runtime_createcontextfromtype
      fix builtin function "fmax"
      ignore a clang unsupported building option
      fix builtin function "fdim"
      fix builtin function "nextafter"
      fix builtin function "ldexp"
      fix builtin function "ilogb"
      fix ASR operator for 64bit integer
      put a mutex around gbe_program_new_from_llvm
      fix builtin function "isnormal"
      improve builtin function "rint"
      fix builtin function "round"

Igor Gnatenko (2):
      cmake: use libdir macros
      typo: bsically to basically

Jon Nordby (1):
      Make build compatible with Python 2.6

Junyan He (16):
      Add the bo's internal offset support when do drm_intel_bo_emit_reloc
      Implement the clCreateSubBuffer API
      Add the test case for sub buffer check
      Add the clGetMemObjectInfo options for sub-buffer and update the utest case
      Move the gpgpu struct from cl_command_queue to thread specific context
      Fixup the problem of CL_PROGRAM_BINARIES in clGetProgramInfo API
      Add the drm include and lib path for find when drm is not the system one.
      Complete the feature of clGetEventProfilingInfo API
      Disable the PCH valid check to save a lot of compiling time.
      Modify the multi-thread support for queue.
      Fix the multi-thread crash problem of batch buffer release.
      Add -cl-fast-relaxed-math into incompatible opts and fix the PreprocessorOptions bug
      Fix the bug of multi deleting of load instruction in lowering
      Fix the bug in removeLOADIs function.
      Add the device id for haswell GT.
      Fix the problem by kernel file open in utest

Lu Guanqun (1):
      utests: add test case for structure argument

Lv Meng (14):
      GBE: improve precision of exp
      GBE: improve precision of fmod
      GBE: improve precision of expm1
      GBE: improve precision of acosh
      GBE: improve precision of asinh
      GBE: improve precision of sinh
      GBE: improve precision of tanh
      GBE: improve precision of cosh
      GBE: improve precision of remainder
      GBE: improve precision of ldexp
      GBE: improve precision of atanh
      GBE: improve precision of exp10
      GBE: improve precision of hypot
      GBE: improve precision of remquo

Mario Kicherer (2):
      report errors if opening the DRI device fails
      provide meaningful device names through clGetDeviceInfo

Ruiling Song (24):
      GBE: Give a zero-initialized register for Undef value.
      runtime: Fix a dangling pointer issue
      utests: use mad which will get better precision.
      GBE: use ISA mad for mad() builtin function.
      GBE: disable MulAdd pattern in instruction selection temporarily.
      GBE: fix a 64bit scalar register issue.
      GBE: Remove max_limit for struct alignment
      GBE: Fix alignment according to OCL spec
      GBE: Fix alignment for private variables
      GBE: handle half type size
      GBE: Fix null register to integer type
      GBE: Do not change vertical stride when it is 0
      GBE: register width should not exceed execution width
      GBE: improve asin/acos precision
      GBE: improve precision of log/log1p
      GBE: Improve precision of log10
      GBE: Improve precision of log2
      GBE: Fix logb implementation.
      GBE: Remove some noduplicate to let inline works
      GBE: Improve precision of sin/cos/sincos
      GBE: improve precision of tan
      GBE: Improve atan precision
      GBE: Improve precision of atan2
      GBE: Improve precision of cbrt

Simon Richter (1):
      Start looking for LLVM from version 3.3 then higher version.

Yang Rong (40):
      Add preprocessor #define that match the extension name string.
      Remove CL_FP_DENORM in clGetDeviceInfo.
      Re-build the program when build option changed.
      Per openCL spec, set p->is_built to 1 when build fail.
      Fix a event segment fault.
      Refine the build option checking.
      fix the error that structure would be pushed twice
      Add other unsigned interger types mask type of shuffle and shuffle2.
      Add bitcast support between vetor and scalar type.
      Remove boolean values cannot cross their definition basic block restrict.
      Add FCmpInst ord support.
      Fix a compare immediate optimize error.
      Add convert between fp16 and fp32.
      Add vload_half and vstore_half build in.
      Fix some get image info errors.
      Enlarge the global mem size.
      Remove test cl_create_kernel.
      Use -O1 when -cl-opt-disable, for inline function.
      Fix B/UB compare fail.
      When local_work_size is null, try to choose a local_work_size.
      Add FCMP UNO support.
      Refine isnan builtin.
      Fix signed to unsinged type sat convert.
      Fix float to ulong/long fail.
      Fix rtz, rtp, rtn when convert int/uint/long/ulong to float.
      Fix convert long/ulong to float.
      Revert choose local size change when local size is null in clEnqueueNDRang.
      Fix a build pushMap bug.
      Fix some long ops bug.
      Fix a convert typo.
      Fix utest compiler_function_argument3 error after move -O2 to backend.
      Move the llvm optimize pass from clang to backend.
      Move the memory allocate size check to the callee.
      Use OCL_USE_PCH to control the using pch or not.
      Add llvm instrinsic function llvm.memset and llvm.memcpy support.
      Change compiler_function_argument3 to cover llvm.memcpy.
      Add some native functions vector proto.
      Multiple register's hstride in suboffset.
      When local_work_size is null, try to choose a local_work_size.
      Fix build errors in llvm3.5 only system.

Yi Sun (6):
      utest: Add test case for built-in function pow.
      utest: add test case for builtin function exp/exp2/exp10/expm1.
      Add test cases generator.
      Refine calculation for ULP.
      utests/CMakeList.txt: Remove kernel files which generated by utest_generator.py.
      Remove builtin function fma from utest_math_gen.py.

Yongjia Zhang (1):
      Add utest compiler_private_data_overflow

Zhigang Gong (80):
      GBE: fix a bug for the cast(FPToUI) instruction.
      GBE: fix 3-component vector's astype macros.
      Runtime: fixed an incorrect error checking for CL_INVALID_GLOBAL_OFFSET.
      GBE: enable bitselect vector builtin functions.
      GBE: fixed one bug for vector relational builtin functions.
      Add a necessary include path for building with mesa.
      Runtime: fix the incorrect device info string size.
      Runtime: fix some max values.
      Runtime: implement clGetSamplerInfo.
      Runtime: fix the length of properties.
      GBE: Don't modify argument 0 of the get image information instruction.
      Runtime: fix one bug in clGetProgramInfo.
      Runtime: fix some max/alignment values.
      gbe_bin_generator: should not use append option when create new binary.
      Runtime: complete the api clGetKernelWorkGroupInfo.
      GBE: Add support for kernel attribute reqd_work_group_size.
      GBE: remove all vstore macros for constant memory space.
      GBE: fix the constant data allocation.
      Fix a build problem when the llvm version has the fix version digit.
      Runtime: fixed parameter error checking in cl create buffer.
      Runtime: fixed one missing case for clGetKernelWorkGroupInfo.
      Runtime: fix some piglit failures.
      CL/Runtime: workaround the unused sampler_t kernel argument.
      Runtime: implement the get build log function and fix one build error check issue.
      GBE: filter the unsupported cl compile arguments out.
      Runtime: fixed the region check for three rect region related APIs.
      Accelerate utest.
      GBE: fix clang's "incorrect" optimization for barrier call.
      GBE: fix a corner case when allocate registers for local buffer.
      GBE: we should allocate register for ExtractElement insn.
      Defer the scalarize to the last pass before the Gen pass.
      GBE: adjust instruction order for load/function call for vector.
      GBE: rewrite the liveness analysis routine.
      GBE: refine the register expiring handling.
      GBE: fix the potential issue when there are inactive lanes.
      GBE: use soft mask to handle the barrier call.
      GBE: validate active bool value in the branching instruction.
      GBE: optimize the CMP instruction.
      GBE: optimize JMP instruction.
      GBE: clang's FE doesn't support static, we just ignore it.
      GBE: Fix a bug at constant GEP processing.
      GBE: handle the first index of GEP correctly.
      CL: back port ICD support to 1.1 branch.
      CL: prepare to support ICD if the system has ocl-icd..
      GBE: enable relocatable pch files.
      Refine the method to find pch and pcm files.
      GBE: fixed a long related bug.
      Revert faulty pushed patchset
      GBE: fixed a long related bug.
      CL: back port ICD support to 1.1 branch.
      CL: prepare to support ICD if the system has ocl-icd..
      GBE: enable relocatable pch files.
      Refine the method to find pch and pcm files.
      GBE: fixed a register liveness bug for getsamplerinfo instrution.
      GBE/Sampler: Simplfy the sampler handling.
      GBE: move the image allocation to the GEN IR stage.
      GBE: move the image info register allocation to GEN IR stage.
      GBE: fixed the stack allocation.
      GBE: fix the hack code of sampler offset handling.
      GBE: fixed the hacky code of 3D image read/write.
      utests: Put all the generated kernel files to .gitignore at runtime.
      build: work around an old version cmake bug.
      GBE: don't allocate grf for those bools which map to flag.
      GBE: fix some incorrect gen ir output messages.
      GBE: fixed a bug in sample instruction.
      GBE: increase the disassembly output's readability.
      GBE: Implement an extra liveness analysis for the Gen backend.
      GBE: allow the bool registers to be expired.
      GBE: refine register allocation output.
      GBE: prepare to optimize the register spilling policy.
      GBE: Implement complete register spill policy.
      GBE: fixed the out-of-range JMPI.
      Update documents.
      Add clang/LLVM 3.5svn support.
      Silent compilation warning in sampler functions.
      GBE: fixed the unsafe tmpnam_r.
      Update document for LLVM/Clang 3.5.
      Fix the cmake problem in FindLLVM.
      Docs: fix some markdown errors and add some new info.
      Bump to version 0.8.0.

git tag: Release_v0.8

https://01.org/sites/default/files/beignet-0.8.0-source.tar.gz
MD5:    2160c7836e81496781d4843932cc5b21  Beignet-0.8.0-Source.tar.gz
SHA1:   3f58783f757276b0728572db75dc721123341749  Beignet-0.8.0-Source.tar.gz
SHA256: 67212dac1a8a06398421affa0afe19aba4e31b3192a371180eb2c9c5a3977486  Beignet-0.8.0-Source.tar.gz
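
The checksums above can be checked after download. The following is a
minimal sketch, not part of the Beignet tree: the helper names
sha256_of_file and verify are hypothetical, and the local file name passed
to verify is an assumption (use whatever name the tarball was saved under).

```python
import hashlib

# SHA256 digest as published in this announcement.
EXPECTED_SHA256 = "67212dac1a8a06398421affa0afe19aba4e31b3192a371180eb2c9c5a3977486"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large tarballs are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA-256 digest matches `expected` (case-insensitive)."""
    return sha256_of_file(path) == expected.lower()
```

For example, verify("beignet-0.8.0-source.tar.gz") should return True for
an intact download of the tarball named in the URL above.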