[Beignet] [ANNOUNCE] Beignet 1.1.0 (2015-07-31)

Thu Jul 30 23:58:03 PDT 2015

Beignet 1.1.0 (2015-07-31)

========================

The Beignet development team is pleased to announce the release of Beignet version 1.1.0, the next milestone after version 1.0.0. With this release, Beignet obtained Khronos OpenCL 1.2 conformance certification on BDW. Beignet also supports more platforms, including 3rd to 6th Generation Intel Core Processors and some Intel Atom Processors. This release adds new features, such as the cl_intel_subgroups and cl_khr_spir extensions and half data type support. It also continues to improve quality and performance; some OpenCV 3.0 benchmarks show a 3x to 4x performance gain.

The highlights of this release are:

1. Added support for 6th generation Intel Core Processors (SKL).
2. Added support for 5th generation Intel Atom Processors (BSW).
3. Re-implemented the structurized control flow algorithm.
4. Added half data type support on BDW and later platforms.
5. Implemented a new BTI solution to support dynamic BTI.
6. Added support for the cl_intel_subgroups extension.
7. Replaced some built-ins with LLVM intrinsics.
8. Added indirect structure argument read support.
9. Enabled the cl_khr_spir extension to build and run from SPIR binaries.
10. Improved the performance of some OpenCV test suites on all platforms.
11. Several other bug fixes since the last release.
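
Whether a given device exposes the new extensions can be checked from its CL_DEVICE_EXTENSIONS string, which OpenCL returns as a space-separated list of names. The following is a minimal Python sketch of that check, not part of Beignet. It assumes half support is advertised under the standard cl_khr_fp16 name; the helper missing_extensions and the sample string are hypothetical and only illustrate the parsing.

```python
# Extension names taken from the announcement; cl_khr_fp16 is assumed to be
# the name under which half data type support is advertised.
NEW_IN_1_1_0 = ("cl_intel_subgroups", "cl_khr_spir", "cl_khr_fp16")


def missing_extensions(extensions: str, wanted=NEW_IN_1_1_0) -> list:
    """Return the names in `wanted` that are absent from a
    space-separated CL_DEVICE_EXTENSIONS string."""
    available = set(extensions.split())
    return [name for name in wanted if name not in available]


# Hypothetical extension string, for illustration only.
sample = "cl_khr_global_int32_base_atomics cl_khr_spir cl_intel_subgroups"
print(missing_extensions(sample))  # ['cl_khr_fp16']
```

On a real system the string would come from clGetDeviceInfo (or the output of a tool such as clinfo) rather than a literal.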

Git tag: Release_v1.1.0
Gitweb URL: http://cgit.freedesktop.org/beignet
Source tarball: https://01.org/sites/default/files/beignet-1.1.0-source.tar.gz

md5sum: a7a9276a3e635266274240e8fc8e4490  beignet-1.1.0-source.tar.gz
sha1sum: 67f58f3f1df9cd970998342b0df619e41c80b20b  beignet-1.1.0-source.tar.gz
sha256sum: 4afe09ea13cd7f8475b9f6534e97cb4dcd307c602095c6968b0ed22290753386  beignet-1.1.0-source.tar.gz
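
The downloaded tarball can be checked against the three checksums above before building. The following is a minimal Python sketch using only the standard hashlib module; the helper names digests and verify are illustrative, and the file is assumed to be in the current directory under the name given in the announcement.

```python
import hashlib

# Checksums copied from the announcement.
EXPECTED = {
    "md5": "a7a9276a3e635266274240e8fc8e4490",
    "sha1": "67f58f3f1df9cd970998342b0df619e41c80b20b",
    "sha256": "4afe09ea13cd7f8475b9f6534e97cb4dcd307c602095c6968b0ed22290753386",
}


def digests(data: bytes) -> dict:
    """Return the hex digest of `data` for each algorithm in EXPECTED."""
    return {name: hashlib.new(name, data).hexdigest() for name in EXPECTED}


def verify(path: str) -> bool:
    """Return True only if every digest of the file at `path` matches EXPECTED."""
    with open(path, "rb") as f:
        return digests(f.read()) == EXPECTED
```

Calling verify("beignet-1.1.0-source.tar.gz") returns True only when all three digests match. The sketch reads the whole file into memory, which is acceptable for a source tarball of this size.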

-----------------------------------------------------------------

Changes since 1.0.0:

Andreas Beckmann (2):
      prefer newer llvm versions over 3.3
      remove unsafe define -D__$(USER)__

Brian Kloppenborg (1):
      BUGFIX: Prohibit 'make package' from doing system install of ICD vendor file

Chuanbo Weng (11):
      Change CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR from 8 to 16.
      Refine benchmark output.
      Refine copy_buf benchmark and rename the file.
      Implement 1D/2D image array related cl_mem_kernel_copy_image in cl way instead of cpu way.
      Add example to show libva buffer sharing with extension clCreateImageFromLibvaIntel.
      Add document to describe the detials of libva buffer sharing.
      Add benchmark of clEnqueueCopyImageToBuffer(copy 2d image to buffer).
      Optimization of clEnqueueCopyImageToBuffer for 16 aligned case.
      Fix error in CMakeLists.txt of examples.
      Add example to show v4l2 buffer sharing with extension clGetMemObjectFdIntel.
      Add document to describe the detials of v4l2 buffer sharing.

David Couturier (1):
      Fix: (v3) Event callback that were not executed when command was already CL_COMPLETE + thread safety for callbacks

Guo Yejun (37):
      re-enable userptr with fix: CPU access after GPU finishes the rendering
      fix issue to create cl image from libva with non-zero offset
      add test for clCreateImageFromLibvaIntel
      fix issue to pass utest of runtime_climage_from_boname for BDW
      clean code, the logic is already at the beginning of function
      add test of cl_mem_use_host_ptr into benchmark
      refine utest of cl_mem_use_host_ptr
      enable CL_MEM_ALLOC_HOST_PTR with user_ptr to avoid copy between GPU/CPU
      add utest of CL_MEM_ALLOC_HOST_PTR
      replace hash_map with map
      do not include llvm/clang headers for libgbeinterp
      change Immediate::operator= from private to public
      do not use C++11 features inside libgbeinterp
      fix utest build for some old gcc version
      refine gbe_bin_generater usage to add -t option
      remove useless dependency libocl
      add option BUILD_STANDALONE_GBE_COMPILER to build static compiler
      add CMake option USE_STANDALONE_GBE_COMPILER and STANDALONE_GBE_COMPILER_DIR
      only build tests that do not need compiler when standalone compiler is provided
      add howto for old gcc version
      correct the cache line size to be 64
      loose the alignment limitation for host_ptr of CL_MEM_USE_HOST_PTR
      update utest to loose userptr limitation
      correct env var to output llvm IR
      add simd level function __gen_ocl_get_simd_size
      add utest for __gen_ocl_get_simd_size
      add simd level function __gen_ocl_get_simd_id
      add utest for __gen_ocl_get_simd_id
      add introduction to build Beignet with yocto
      rename __gen_ocl_get_simd_id/size to get_sub_group_id/size
      add sub group functions intel_sub_group_shuffle
      rename __gen_ocl_simd_any/all to sub_group_any/all
      add utest for intel_sub_group_shuffle
      correct the src output of alu3 when OCL_OUTPUT_ASM=1
      enable CL_RG + CL_UNORM_INT8 for image
      enable CL_UNSIGNED_INT8 for CL_RG to fix regression
      Use a separate pattern for simd shuffle instead of binary pattern

Jeff McGee (2):
      Add driver callback for updating device info
      Query the driver directly for compute units and subslice

Junyan He (82):
      Fix the printf buffer size bug.
      Import the native long type of ul1 ul8 and ul16
      Add long type support for disasm.
      Add long imm value in gen8 instruction.
      Add unpacked ud and unpacked uw for long type.
      Add the long unpacked ud uw into the instruction schedule consideration
      Modify the split logic in encoder
      Add the u64 imm type in register
      Add long support flag into gen selection
      Modify the load IMM 64 function.
      Disasm supports to print long imm value in instruction.
      Add functions for conversion between native and fake long.
      Overload the READ64 and WRITE64 function for Gen8
      Add the canHandleLong virtual function into gen encoder
      Overload all the alu1 and alu1withtemp functions.
      Overload all the simple binary functions.
      Modify the convert logic in gen selection.
      Modify the fake dst register for CMP in register allocation.
      Overload I64MUL function.
      Overload the i64MULHI function.
      Overload the I64HADD function.
      Overload I64RHADD function.
      Overload the I64MADSAT function.
      Overlaod I64 Div and Rem function.
      Modify the cmp and sel for I64 and cleanup all virtual functions.
      Add test case for long mul_sat and mul_hi
      Add test case for i64 div and rem.
      Add long NOT test case.
      Add PackLong and UnpackLong functions
      Modify the bitcast to support native long
      Add the logic for UPSAMPLE_LONG
      Add test case for long bitcast.
      Add the missing LOAD_INT64_IMM virtual function for GEN8.
      Fix bug for bitcast test case because of long type.
      Add the check for src and dst span different registers.
      Fix the long bitcast post schedule bug.
      Add the logic for pack/unpack long for scalar.
      Correct the wrong type annotation for long in disasm.
      Fix bug for scalar long conversion.
      Backend: fix one bug of long mad_sat.
      Re-format the asm print for long imm
      Fix a bug of 1d image array test case.
      Backend: Fix one bug of printf because of ir reorder.
      Fix the bug of ulong mad sat
      Correct the bit fields error for indirect address of Gen8
      Backend: Add the indirect fields and functions for gen register.
      Backend: Add functions to set a0 register.
      Backend: Correct indirect mode encoder setting for Gen7.
      Backend: Correct indirect mode encoder setting for Gen8.
      Backend: Handle the bswap using indirect mode access.
      Add a0 setting and bswap logic for GEN8
      Backend: Delete bswap logic in the llvm_to_gen stage.
      Modify the utest case for bswap.
      Backend: Add the logic to handle uniform src for BSwap Gen8.
      Backend: Fix errors in disasm for indirect instruction Gen8.
      Add LLVM_INCLUDE_DIR to CMakeList of src.
      Generate NAN for UNDEF value in printf parser.
      Kill the A0 cache in GenContext.
      Backend: Add half float as a new type.
      Backend: Add half float support for immediate.
      backend: Add half float type into the instruction.
      Backend: Add half float support in gen backend.
      Backend: Add half to insn selection.
      Backend: Add half float ASM output support.
      libocl: Enable fp16 extension in the header.
      libocl: Add half builtin functions for relational module.
      libocl: Add half builtin functions to math module.
      libocl: Add half builtin functions for common module.
      libocl: Add the builtin xxx_as functions for half.
      libocl: Add builtin convert functions for half.
      libocl: Add vector generation for half type.
      backend: Add convert_sat functions for half in the gen_backend.
      Backend: Add support for half's div and rem.
      backend: Add conversion support between half and other type.
      libocl: Add all the half defines for script generation.
      utest: Add test cases for half.
      runtime: Add fp16 extension to BDW later platform.
      runtime: Use cl_get_platform_default to replace global value.
      runtime: Add cl device's standalone extension.
      Backend: Add half support for CHV and SKL.
      Runtime: Add default extension for platforms before BDW.
      libocl: Add macro define for fp16.

Koop Mast (2):
      Fix comparison in if() check.
      Reorder GBE_BIN_GENERATER arguments.

Luo (3):
      add llvm intrinsic call translate.
      fix llvm.trunc.float instruction bug.
      check the predication in case of endless loop.

Luo Xionghu (47):
      reuse the loop info from llvm.
      add the reduced self loop node detection.
      fix dnetc overflow issue.
      fix bswap implementation issue.
      refine bswap utest to cover nsetc fail cases.
      refine overflow utest to cover nsetc fail cases.
      disable overflow utest test before llvm-3.5
      add half math function support.
      fix max_parameter_size not correct on x86 platforms.
      fix min_max_read_image_args and min_max_parameter_size issue.
      add collectImageArgs to handle image count limitations.
      add LZD IR instruction.
      add clz(count leading zero) utest.
      fix the wrong implementation of popcount.
      reimplement the LZD instruction in backend.
      libocl: reimplement clz with lzd instruction instead of fbh.
      fix clz utest issue.
      remove the libFunc interface.
      replace sin/cos with llvm intrinsic.
      replace sqrt with llvm intrinsic.
      replace log with llvm intrinsic.
      replace exp with llvm intrinsic.
      enable cl_khr_spir extension to build and run from SPIR binary.
      change the workitem related api to OVERLOABABLE.
      SPIR binary support for printf function.
      add utest for load spir binary.
      replace fabs with llvm intrinsic.
      replace rndz with llvm intrinsic.
      replace rnde with llvm intrinsic.
      replace rndu with llvm intrinsic.
      replace rndd with llvm intrinsic.
      replace mad with llvm intrinsic.
      replace pow with llvm intrinsic.
      reset the SPIR target datalayout.
      only support spir extension for beignet build with llvm 3.5 or later.
      simple return if spir extension not supported.
      strip PointerCast for call instructions before use.
      Optimization of clEnqueueCopyBufferToImage for 16 aligned case.
      add benckmark for copy data from buffer to image.
      add environment variable OCL_OUTPUT_KERNEL_SOURCE.
      reimplement structurize algorithm.
      fix global variable out of boundary writing in libocl.
      use self test to determine enable/or disable atomics in L3 for HSW.
      don't merge serial blocks with barrier.
      structuralize bug fix: check the loop successor before merge.
      fix utest bug.
      runtime bug: brw GT3 devices reported to GT2.

Lv Meng (1):
      Fix a makefile bug for gcc is not the default compiler

Meng Mengmeng (4):
      utests: make utests maths ULP values consistent with specification
      add edge case detection for powr in utests
      CHV: Add cherryview support in the runtime.
      correct ULP value in utests

Rebecca N. Palmer (11):
      Return error, don't crash, on allocation failure
      Crash when hardware inaccessible
      Enable multiarch (32/64-bit co-installation)
      FindLLVM: allow LLVM/Clang 3.6
      Don't crash if device inaccessible
      utest_pow: don't fail on declared lack of denormals.
      Make tgamma meet the accuracy standard.
      Allow building with Python 3
      utests: fix test case builtin_tgamma.
      Add a sanity test in clGetDeviceIDs
      Docs: update/clarify Haswell issues

Rebecca Palmer (1):
      Use matching versions of clang/llvm and libclang/libllvm

Ruiling Song (50):
      GBE: Place loop exits after loop blocks when sorting basic blocks.
      GBE: Output CFG of Gen IR to dot file.
      GBE: Re-implement BTI logic in backend
      GBE: support const private array initialization.
      utests: Add const private array initialization test.
      GBE: Fix a disassembly bug.
      GBE: Fix the printf issue caused by new bti implementation
      libocl: Fix precision of builtin tanpi.
      libocl: Move spec required macro to header file.
      libocl: Improve precision of pow/powr.
      libocl: Imporve precision of exp()
      libocl: Flush denorm input into zero in rootn()
      libocl: flush denorm into zero in ldexp()
      libocl: Correctly handle -inf in exp10.
      libocl: flush denorm to zero in remquo()
      libocl: implement high precision pown()
      libocl: remove useless code.
      libocl: Reimplement trigonometric functions.
      libocl: Refine char/short abs() implementation.
      libocl: refine implementation of abs_diff()
      libocl: using mad() to implement dot()
      GBE: Only add non-zero offset in gep lowering pass.
      libocl: refine length() implementation.
      libocl: refine implementation of normalize().
      libocl: refine implementation of logb().
      GBE: Import constantexpr lower pass from pNaCl
      GBE: expand large integer instructions
      GBE: Fix a build error against llvm release version
      GBE: Fix a bug in legalize pass.
      GBE: Load/store should use same address space as before.
      GBE: Import PromoteIntegers pass from pNaCl
      libocl: refine implementation of sign().
      GBE: We need use exiting block here.
      libocl: define NULL to zero
      libocl: Directly scalarize built-in with vector input.
      GBE: unify element type before insertelement in legalize pass.
      GBE: Support unaligned load/store of dword/qword in GenIR.
      GBE: Fix fast-math issue under llvm 3.6.
      GBE: Only emit multiply when immediate is not one.
      GBE: make all memory operation share same bti dependency.
      GBE: Fix the immediate data type
      GBE: remove unnecessary assert
      gbe: Implement a new BTI solution to support dynamic bti
      GBE: Fix a typo that cause gpu hang.
      GBE: add offsetID in SelectionVector.
      GBE: Support storing/loading pointers to/from private array
      GBE: optimize phi elimination.
      GBE/IMM: Temporarily allow integer type in getFloatValue()
      GBE: Fix a bug in phicopy coaleasing.
      GBE: Fix a bug in assigning image bti.

Yan Wang (4):
      Fix based on piglit OpenCL falied case (cl-api-compile-program).
      Fix delete operator using.
      Fix PrintfState copying.
      Fix loop condition of PrintfSet constructor.

Yang Rong (38):
      Change the IVB/HSW L3 SQC credit setting.
      Fix NO_TILING alignment bug.
      BDW: Change the default tiling mode to TILING_Y on BDW.
      Fix the opencv_test_core/OCL_Arithm random segment fault.
      Change the IVB/HSW's max_work_group_size to 512, and BYT to 256.
      Add read buffer/image benchmark.
      SKL: Add skl pci ids and device.
      SKL: enable skl device.
      SKL: Use TILE_Y as default TILING mode in skl.
      SKL: correct the pipe control struct.
      SKL: Add the function gen9' intel_build_idrt.
      SKL: add skl select_pipeline and cache_control functions.
      SKL: Add function intel_gpgpu_bind_image_gen9.
      SKL: fix skl LD fail.
      SKL: fix some 3D and 2D array image fail.
      SKL: fix some failed piglit tests, caused by read constant error.
      SKL: Fix opencv perf hang.
      Change the KB and MB define to enum.
      Use llvm-c's LLVMLinkModules instead of llvm::Linker::LinkModules.
      Add llvm3.6 build support.
      Remove useless llvm head file FindUsedTypes.h.
      Correct the error llvm link msg copy in function genProgramLinkFromLLVM.
      Fix llvm3.6 build error.
      BDW: Refine I64HADD and I64RHADD.
      BDW: Refine unpacked_ud in the gen8_context.cpp.
      Chv: Add chv backend support.
      Fix two argument lowering bug.
      Add Indirect struct argument read support.
      Add stuct argument indirect load test.
      CHV: Fix a chv long convert bug.
      Fix a indirect register bug.
      Fix a indirect argument load bug.
      Turn on OCL_STRICT_CONFORMANCE as default.
      Use NP2 stack size to avoid cache line conflict.
      Use the Byte Gather after HSW when read byte/shor.
      Set the loop unroll's threshold to 1024.
      BDW: override GenContext::patchBranches.
      Runtime: correct event and the wait events compare when check event.

Yang, Rong (1):
      Separate flush and invalidate in function intel_gpgpu_pipe_control.

Zhenyu Wang (4):
      Remove deprecated fulsim code
      Add aub dump support
      Use libdrm interface to get device id
      Remove obsolete MI_FLUSH

Zhigang Gong (94):
      Remove patch version on master branch.
      GBE: disable spill register under simd16 mode.
      utests: fix work group size issue in compiler_fill_image_2d_array.
      utests: fix a typo in test cases.
      GBE: don't split instruction for some special case.
      utests: reduce work group size to 256 to satisfy BYT platform.
      utests: fix indent in CMakeLists.txt
      GBE: Fix bug with negative constant GEP index.
      utests: Add one case to test negative index array access.
      GBE: fix a regression caused by the negative index handling patch.
      GBE: optimize GEP constant offset calculation.
      GBE: remove useless code.
      GBE: eliminate duplicate GEP handling logic.
      GBE: Add constant pointer in the memcpy intrinsic.
      CL: Don't find mesa source code.
      GBE: Add some missing constant expression cases.
      Update optimization tips.
      GBE: don't always treat a multiple destination instruction as root.
      Refactor all image builtin functions.
      GBE: switch to use CLANG native image types.
      GBE: switch to CLANG native sampler_t.
      GBE: remove some image1d_buffer related builtin functions.
      GBE/CL: use 2D image to implement large image1D_buffer.
      GBE: code cleanup.
      GBE: fix an image regression.
      GBE: use sr0.1's SLM Offset to eliminate the software SLM offset for HSW.
      GBE: remove software maintained SLM offset related code.
      utests: reduce test count.
      runtime: tweak max memory allocation size.
      runtime: fix max work group size for IVBGT1.
      Don't check some edge condtion in non-strict mode.
      CL/Driver: enable atomics in L3 for HSW.
      CL/Driver: quick fix regression caused by remove MI_FLUSH.
      utests: skip one test when it fail to open XDisplay.
      CL/Driver/HSW: Convert L3 cycle for texture to uncachable.
      update document.
      GBE: fix an ACC register related instruction scheduling bug
      GBE: fix popcount bugs.
      GBE: add GEN_TYPE_HF to getTypeSize.
      Add submodule libva for examples.
      update document.
      Remove useless old legalize related files.
      GBE: Need to check invalid register to determine the actual src num.
      GBE: fix the hacky usage of invalid register.
      runtime: fix a potential null pointer dereference.
      runtime: don't free the host_ptr for a subbuffer.
      GBE: fix build error for llvm 3.6.
      GBE: fix build error for LLVM 3.4/3.3.
      build: use @BEIGNET_INSTALL_DIR@ for the icd file.
      GBE: expand constant expressions in constant vector
      GBE: remove constant expression handling code in gen writer pass.
      GBE: remove the unecessary type check for SEL instructio.
      GBE: support compare two bool variables.
      GBE: add fastcall support.
      Build: use -Bsymbolic to fix conflicts with other LLVM users.
      GBE: add a new incompatible compile option -cl-finite-math-only.
      Revert "libocl: using mad() to implement dot()"
      GBE: fix an image related bugs.
      Build: set 3.5 as the stable LLVM version for beignet.
      Docs: update team information.
      Bump to 1.1
      runtime: fix a conformance bug in cl_get_kernel_arg_info.
      Build: fix the beignet icd name when CMAKE_INSTALL_FULL_LIBDIR is undefined.
      update document.
      GBE: correct some temporary virtual register's simdWidth.
      GBE: avoid to use the GenRegister::xxxgrf(simdWidth,xxx).
      runtime: Enhance the error handling when flush gpgpu command queue.
      strip unsupported attributes and calling conventions.
      GBE: fix safe type definition.
      GBE: extend registers/tuples/immediates to 32bit wide.
      GBE: extend backend label to 32 bit.
      GBE: don't type cast register/labelindex to integer.
      GBE: Extend front label ip to 32 bit on demand.
      GBE: Use actual bti information to determine a pointer's addressspace.
      GBE: refine error handling for private libva buffer sharing extension.
      GBE: correct the instruction replacement logic in scalarize pass.
      GBE: fix an potential assertion in constant expanding pass.
      GBE: fix a bug in byte scatter write.
      runtime: don't try to open nonexistent render nodes or device files.
      utests: don't continue to run any case when fail to initialize device.
      Doc: add a command to install dependencies.
      GBE: should initialize useDWLabel to false by default.
      Doc: update cmd parser issue for HSW platforms.
      Doc: update known issue for the store/load pointer issue.
      Revert "CL/Driver: enable atomics in L3 for HSW."
      GBE: fix LOD initialization for typed write instruction.
      Remove some LGPL incompatible code.
      build: use EXECUTE_PROCESS to replace the deprecated EXEC_PROGRAM.
      GBE: fix one potential register spilling bug.
      runtime: Need to separate atomic in L3 test and SLM test in self_test().
      Fixed a thread safe bug.
      Need to check eventWaitList in clEnqueueNDRangeKernel.
      runtime: fix a builtin-kernel related thread safe bug.
      Remove deprecated function cl_context_get_static_kernel().

Zhu Bingbing (2):
      change the utest summary code
      add builtin function atan2pi and ldexp

--
Thanks