[Beignet] [ANNOUNCE] Beignet 1.1.0 (2015-07-31)
Yang, Rong R
rong.r.yang at intel.com
Thu Jul 30 23:58:03 PDT 2015
Beignet 1.1.0 (2015-07-31)
========================
The Beignet development team is pleased to announce the release of Beignet version 1.1.0, another milestone after version 1.0.0. In this release, Beignet obtained Khronos OpenCL 1.2 conformance certification on BDW. Beignet also supports more platforms, including 3rd to 6th Generation Intel Core Processors and some Intel Atom Processors. This release adds new features, such as the cl_intel_subgroups and cl_khr_spir extensions and half data type support. It also continues to improve quality and performance: some OpenCV 3.0 benchmarks show a 3x to 4x performance gain.
The highlights of this release are:
1. Added 6th generation Intel Core Processors (SKL) support.
2. Added 5th generation Intel Atom Processors (BSW) support.
3. Re-implemented the structurized control flow algorithm.
4. Added half data type support on BDW and later platforms.
5. Implemented a new BTI solution to support dynamic BTI.
6. Added extension cl_intel_subgroups support.
7. Replaced some built-ins with LLVM intrinsics.
8. Added indirect structure argument read support.
9. Enabled the cl_khr_spir extension to build and run from SPIR binaries.
10. Improved the performance of some OpenCV test suites on all platforms.
11. Several other bug fixes since the last release.
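
Whether a given device exposes the new extensions can be checked from its extension string, which OpenCL reports as a space-separated list through clGetDeviceInfo with CL_DEVICE_EXTENSIONS. The following is a minimal Python sketch, not part of Beignet. The helper name missing_extensions and the sample strings are hypothetical. cl_intel_subgroups and cl_khr_spir are named in this announcement; cl_khr_fp16 is the standard Khronos extension name for half support and is assumed here to be the one Beignet reports.

```python
# Extensions of interest. cl_khr_fp16 is an assumption (the Khronos name
# for half support); the other two are named in the announcement.
WANTED = ("cl_intel_subgroups", "cl_khr_spir", "cl_khr_fp16")


def missing_extensions(extensions_string, wanted=WANTED):
    """Return the wanted extension names absent from a device extension string.

    `extensions_string` is the space-separated value an OpenCL runtime
    returns for CL_DEVICE_EXTENSIONS.
    """
    available = set(extensions_string.split())
    return [name for name in wanted if name not in available]


if __name__ == "__main__":
    # Hypothetical extension string, for illustration only.
    sample = "cl_khr_icd cl_khr_spir cl_intel_subgroups"
    print(missing_extensions(sample))
```

An empty result means every extension in WANTED is advertised by the device.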
Git tag: Release_v1.1.0
Gitweb URL: http://cgit.freedesktop.org/beignet
https://01.org/sites/default/files/beignet-1.1.0-source.tar.gz
md5sum: a7a9276a3e635266274240e8fc8e4490 beignet-1.1.0-source.tar.gz
sha1sum: 67f58f3f1df9cd970998342b0df619e41c80b20b beignet-1.1.0-source.tar.gz
sha256sum: 4afe09ea13cd7f8475b9f6534e97cb4dcd307c602095c6968b0ed22290753386 beignet-1.1.0-source.tar.gz
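
The downloaded tarball can be checked against the digests above before building. The following is a minimal Python sketch using only the standard hashlib module. The helper names file_digests and verify are hypothetical, and the file is assumed to be in the current directory under the name given above.

```python
import hashlib

# Digests copied from this announcement.
EXPECTED = {
    "md5": "a7a9276a3e635266274240e8fc8e4490",
    "sha1": "67f58f3f1df9cd970998342b0df619e41c80b20b",
    "sha256": "4afe09ea13cd7f8475b9f6534e97cb4dcd307c602095c6968b0ed22290753386",
}


def file_digests(path, algorithms=("md5", "sha1", "sha256"), chunk_size=1 << 20):
    """Return {algorithm: hex digest} for the file at `path`, read in chunks."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def verify(path, expected=EXPECTED):
    """Return the algorithms whose computed digest differs from `expected`."""
    actual = file_digests(path, tuple(expected))
    return [name for name, digest in expected.items() if actual[name] != digest.lower()]


if __name__ == "__main__":
    mismatches = verify("beignet-1.1.0-source.tar.gz")
    print("OK" if not mismatches else "MISMATCH: " + ", ".join(mismatches))
```

An empty list from verify means all three digests match. The md5sum, sha1sum and sha256sum command-line tools give the same information.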
-----------------------------------------------------------------
Changes since 1.0.0:
Andreas Beckmann (2):
prefer newer llvm versions over 3.3
remove unsafe define -D__$(USER)__
Brian Kloppenborg (1):
BUGFIX: Prohibit 'make package' from doing system install of ICD vendor file
Chuanbo Weng (11):
Change CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR from 8 to 16.
Refine benchmark output.
Refine copy_buf benchmark and rename the file.
Implement 1D/2D image array related cl_mem_kernel_copy_image in cl way instead of cpu way.
Add example to show libva buffer sharing with extension clCreateImageFromLibvaIntel.
Add document to describe the detials of libva buffer sharing.
Add benchmark of clEnqueueCopyImageToBuffer(copy 2d image to buffer).
Optimization of clEnqueueCopyImageToBuffer for 16 aligned case.
Fix error in CMakeLists.txt of examples.
Add example to show v4l2 buffer sharing with extension clGetMemObjectFdIntel.
Add document to describe the detials of v4l2 buffer sharing.
David Couturier (1):
Fix: (v3) Event callback that were not executed when command was already CL_COMPLETE + thread safety for callbacks
Guo Yejun (37):
re-enable userptr with fix: CPU access after GPU finishes the rendering
fix issue to create cl image from libva with non-zero offset
add test for clCreateImageFromLibvaIntel
fix issue to pass utest of runtime_climage_from_boname for BDW
clean code, the logic is already at the beginning of function
add test of cl_mem_use_host_ptr into benchmark
refine utest of cl_mem_use_host_ptr
enable CL_MEM_ALLOC_HOST_PTR with user_ptr to avoid copy between GPU/CPU
add utest of CL_MEM_ALLOC_HOST_PTR
replace hash_map with map
do not include llvm/clang headers for libgbeinterp
change Immediate::operator= from private to public
do not use C++11 features inside libgbeinterp
fix utest build for some old gcc version
refine gbe_bin_generater usage to add -t option
remove useless dependency libocl
add option BUILD_STANDALONE_GBE_COMPILER to build static compiler
add CMake option USE_STANDALONE_GBE_COMPILER and STANDALONE_GBE_COMPILER_DIR
only build tests that do not need compiler when standalone compiler is provided
add howto for old gcc version
correct the cache line size to be 64
loose the alignment limitation for host_ptr of CL_MEM_USE_HOST_PTR
update utest to loose userptr limitation
correct env var to output llvm IR
add simd level function __gen_ocl_get_simd_size
add utest for __gen_ocl_get_simd_size
add simd level function __gen_ocl_get_simd_id
add utest for __gen_ocl_get_simd_id
add introduction to build Beignet with yocto
rename __gen_ocl_get_simd_id/size to get_sub_group_id/size
add sub group functions intel_sub_group_shuffle
rename __gen_ocl_simd_any/all to sub_group_any/all
add utest for intel_sub_group_shuffle
correct the src output of alu3 when OCL_OUTPUT_ASM=1
enable CL_RG + CL_UNORM_INT8 for image
enable CL_UNSIGNED_INT8 for CL_RG to fix regression
Use a separate pattern for simd shuffle instead of binary pattern
Jeff McGee (2):
Add driver callback for updating device info
Query the driver directly for compute units and subslice
Junyan He (82):
Fix the printf buffer size bug.
Import the native long type of ul1 ul8 and ul16
Add long type support for disasm.
Add long imm value in gen8 instruction.
Add unpacked ud and unpacked uw for long type.
Add the long unpacked ud uw into the instruction schedule consideration
Modify the split logic in encoder
Add the u64 imm type in register
Add long support flag into gen selection
Modify the load IMM 64 function.
Disasm supports to print long imm value in instruction.
Add functions for conversion between native and fake long.
Overload the READ64 and WRITE64 function for Gen8
Add the canHandleLong virtual function into gen encoder
Overload all the alu1 and alu1withtemp functions.
Overload all the simple binary functions.
Modify the convert logic in gen selection.
Modify the fake dst register for CMP in register allocation.
Overload I64MUL function.
Overload the i64MULHI function.
Overload the I64HADD function.
Overload I64RHADD function.
Overload the I64MADSAT function.
Overlaod I64 Div and Rem function.
Modify the cmp and sel for I64 and cleanup all virtual functions.
Add test case for long mul_sat and mul_hi
Add test case for i64 div and rem.
Add long NOT test case.
Add PackLong and UnpackLong functions
Modify the bitcast to support native long
Add the logic for UPSAMPLE_LONG
Add test case for long bitcast.
Add the missing LOAD_INT64_IMM virtual function for GEN8.
Fix bug for bitcast test case because of long type.
Add the check for src and dst span different registers.
Fix the long bitcast post schedule bug.
Add the logic for pack/unpack long for scalar.
Correct the wrong type annotation for long in disasm.
Fix bug for scalar long conversion.
Backend: fix one bug of long mad_sat.
Re-format the asm print for long imm
Fix a bug of 1d image array test case.
Backend: Fix one bug of printf because of ir reorder.
Fix the bug of ulong mad sat
Correct the bit fields error for indirect address of Gen8
Backend: Add the indirect fields and functions for gen register.
Backend: Add functions to set a0 register.
Backend: Correct indirect mode encoder setting for Gen7.
Backend: Correct indirect mode encoder setting for Gen8.
Backend: Handle the bswap using indirect mode access.
Add a0 setting and bswap logic for GEN8
Backend: Delete bswap logic in the llvm_to_gen stage.
Modify the utest case for bswap.
Backend: Add the logic to handle uniform src for BSwap Gen8.
Backend: Fix errors in disasm for indirect instruction Gen8.
Add LLVM_INCLUDE_DIR to CMakeList of src.
Generate NAN for UNDEF value in printf parser.
Kill the A0 cache in GenContext.
Backend: Add half float as a new type.
Backend: Add half float support for immediate.
backend: Add half float type into the instruction.
Backend: Add half float support in gen backend.
Backend: Add half to insn selection.
Backend: Add half float ASM output support.
libocl: Enable fp16 extension in the header.
libocl: Add half builtin functions for relational module.
libocl: Add half builtin functions to math module.
libocl: Add half builtin functions for common module.
libocl: Add the builtin xxx_as functions for half.
libocl: Add builtin convert functions for half.
libocl: Add vector generation for half type.
backend: Add convert_sat functions for half in the gen_backend.
Backend: Add support for half's div and rem.
backend: Add conversion support between half and other type.
libocl: Add all the half defines for script generation.
utest: Add test cases for half.
runtime: Add fp16 extension to BDW later platform.
runtime: Use cl_get_platform_default to replace global value.
runtime: Add cl device's standalone extension.
Backend: Add half support for CHV and SKL.
Runtime: Add default extension for platforms before BDW.
libocl: Add macro define for fp16.
Koop Mast (2):
Fix comparison in if() check.
Reorder GBE_BIN_GENERATER arguments.
Luo (3):
add llvm intrinsic call translate.
fix llvm.trunc.float instruction bug.
check the predication in case of endless loop.
Luo Xionghu (47):
reuse the loop info from llvm.
add the reduced self loop node detection.
fix dnetc overflow issue.
fix bswap implementation issue.
refine bswap utest to cover nsetc fail cases.
refine overflow utest to cover nsetc fail cases.
disable overflow utest test before llvm-3.5
add half math function support.
fix max_parameter_size not correct on x86 platforms.
fix min_max_read_image_args and min_max_parameter_size issue.
add collectImageArgs to handle image count limitations.
add LZD IR instruction.
add clz(count leading zero) utest.
fix the wrong implementation of popcount.
reimplement the LZD instruction in backend.
libocl: reimplement clz with lzd instruction instead of fbh.
fix clz utest issue.
remove the libFunc interface.
replace sin/cos with llvm intrinsic.
replace sqrt with llvm intrinsic.
replace log with llvm intrinsic.
replace exp with llvm intrinsic.
enable cl_khr_spir extension to build and run from SPIR binary.
change the workitem related api to OVERLOABABLE.
SPIR binary support for printf function.
add utest for load spir binary.
replace fabs with llvm intrinsic.
replace rndz with llvm intrinsic.
replace rnde with llvm intrinsic.
replace rndu with llvm intrinsic.
replace rndd with llvm intrinsic.
replace mad with llvm intrinsic.
replace pow with llvm intrinsic.
reset the SPIR target datalayout.
only support spir extension for beignet build with llvm 3.5 or later.
simple return if spir extension not supported.
strip PointerCast for call instructions before use.
Optimization of clEnqueueCopyBufferToImage for 16 aligned case.
add benckmark for copy data from buffer to image.
add environment variable OCL_OUTPUT_KERNEL_SOURCE.
reimplement structurize algorithm.
fix global variable out of boundary writing in libocl.
use self test to determine enable/or disable atomics in L3 for HSW.
don't merge serial blocks with barrier.
structuralize bug fix: check the loop successor before merge.
fix utest bug.
runtime bug: brw GT3 devices reported to GT2.
Lv Meng (1):
Fix a makefile bug for gcc is not the default compiler
Meng Mengmeng (4):
utests: make utests maths ULP values consistent with specification
add edge case detection for powr in utests
CHV: Add cherryview support in the runtime.
correct ULP value in utests
Rebecca N. Palmer (11):
Return error, don't crash, on allocation failure
Crash when hardware inaccessible
Enable multiarch (32/64-bit co-installation)
FindLLVM: allow LLVM/Clang 3.6
Don't crash if device inaccessible
utest_pow: don't fail on declared lack of denormals.
Make tgamma meet the accuracy standard.
Allow building with Python 3
utests: fix test case builtin_tgamma.
Add a sanity test in clGetDeviceIDs
Docs: update/clarify Haswell issues
Rebecca Palmer (1):
Use matching versions of clang/llvm and libclang/libllvm
Ruiling Song (50):
GBE: Place loop exits after loop blocks when sorting basic blocks.
GBE: Output CFG of Gen IR to dot file.
GBE: Re-implement BTI logic in backend
GBE: support const private array initialization.
utests: Add const private array initialization test.
GBE: Fix a disassembly bug.
GBE: Fix the printf issue caused by new bti implementation
libocl: Fix precision of builtin tanpi.
libocl: Move spec required macro to header file.
libocl: Improve precision of pow/powr.
libocl: Imporve precision of exp()
libocl: Flush denorm input into zero in rootn()
libocl: flush denorm into zero in ldexp()
libocl: Correctly handle -inf in exp10.
libocl: flush denorm to zero in remquo()
libocl: implement high precision pown()
libocl: remove useless code.
libocl: Reimplement trigonometric functions.
libocl: Refine char/short abs() implementation.
libocl: refine implementation of abs_diff()
libocl: using mad() to implement dot()
GBE: Only add non-zero offset in gep lowering pass.
libocl: refine length() implementation.
libocl: refine implementation of normalize().
libocl: refine implementation of logb().
GBE: Import constantexpr lower pass from pNaCl
GBE: expand large integer instructions
GBE: Fix a build error against llvm release version
GBE: Fix a bug in legalize pass.
GBE: Load/store should use same address space as before.
GBE: Import PromoteIntegers pass from pNaCl
libocl: refine implementation of sign().
GBE: We need use exiting block here.
libocl: define NULL to zero
libocl: Directly scalarize built-in with vector input.
GBE: unify element type before insertelement in legalize pass.
GBE: Support unaligned load/store of dword/qword in GenIR.
GBE: Fix fast-math issue under llvm 3.6.
GBE: Only emit multiply when immediate is not one.
GBE: make all memory operation share same bti dependency.
GBE: Fix the immediate data type
GBE: remove unnecessary assert
gbe: Implement a new BTI solution to support dynamic bti
GBE: Fix a typo that cause gpu hang.
GBE: add offsetID in SelectionVector.
GBE: Support storing/loading pointers to/from private array
GBE: optimize phi elimination.
GBE/IMM: Temporarily allow integer type in getFloatValue()
GBE: Fix a bug in phicopy coaleasing.
GBE: Fix a bug in assigning image bti.
Yan Wang (4):
Fix based on piglit OpenCL falied case (cl-api-compile-program).
Fix delete operator using.
Fix PrintfState copying.
Fix loop condition of PrintfSet constructor.
Yang Rong (38):
Change the IVB/HSW L3 SQC credit setting.
Fix NO_TILING alignment bug.
BDW: Change the default tiling mode to TILING_Y on BDW.
Fix the opencv_test_core/OCL_Arithm random segment fault.
Change the IVB/HSW's max_work_group_size to 512, and BYT to 256.
Add read buffer/image benchmark.
SKL: Add skl pci ids and device.
SKL: enable skl device.
SKL: Use TILE_Y as default TILING mode in skl.
SKL: correct the pipe control struct.
SKL: Add the function gen9' intel_build_idrt.
SKL: add skl select_pipeline and cache_control functions.
SKL: Add function intel_gpgpu_bind_image_gen9.
SKL: fix skl LD fail.
SKL: fix some 3D and 2D array image fail.
SKL: fix some failed piglit tests, caused by read constant error.
SKL: Fix opencv perf hang.
Change the KB and MB define to enum.
Use llvm-c's LLVMLinkModules instead of llvm::Linker::LinkModules.
Add llvm3.6 build support.
Remove useless llvm head file FindUsedTypes.h.
Correct the error llvm link msg copy in function genProgramLinkFromLLVM.
Fix llvm3.6 build error.
BDW: Refine I64HADD and I64RHADD.
BDW: Refine unpacked_ud in the gen8_context.cpp.
Chv: Add chv backend support.
Fix two argument lowering bug.
Add Indirect struct argument read support.
Add stuct argument indirect load test.
CHV: Fix a chv long convert bug.
Fix a indirect register bug.
Fix a indirect argument load bug.
Turn on OCL_STRICT_CONFORMANCE as default.
Use NP2 stack size to avoid cache line conflict.
Use the Byte Gather after HSW when read byte/shor.
Set the loop unroll's threshold to 1024.
BDW: override GenContext::patchBranches.
Runtime: correct event and the wait events compare when check event.
Yang, Rong (1):
Separate flush and invalidate in function intel_gpgpu_pipe_control.
Zhenyu Wang (4):
Remove deprecated fulsim code
Add aub dump support
Use libdrm interface to get device id
Remove obsolete MI_FLUSH
Zhigang Gong (94):
Remove patch version on master branch.
GBE: disable spill register under simd16 mode.
utests: fix work group size issue in compiler_fill_image_2d_array.
utests: fix a typo in test cases.
GBE: don't split instruction for some special case.
utests: reduce work group size to 256 to satisfy BYT platform.
utests: fix indent in CMakeLists.txt
GBE: Fix bug with negative constant GEP index.
utests: Add one case to test negative index array access.
GBE: fix a regression caused by the negative index handling patch.
GBE: optimize GEP constant offset calculation.
GBE: remove useless code.
GBE: eliminate duplicate GEP handling logic.
GBE: Add constant pointer in the memcpy intrinsic.
CL: Don't find mesa source code.
GBE: Add some missing constant expression cases.
Update optimization tips.
GBE: don't always treat a multiple destination instruction as root.
Refactor all image builtin functions.
GBE: switch to use CLANG native image types.
GBE: switch to CLANG native sampler_t.
GBE: remove some image1d_buffer related builtin functions.
GBE/CL: use 2D image to implement large image1D_buffer.
GBE: code cleanup.
GBE: fix an image regression.
GBE: use sr0.1's SLM Offset to eliminate the software SLM offset for HSW.
GBE: remove software maintained SLM offset related code.
utests: reduce test count.
runtime: tweak max memory allocation size.
runtime: fix max work group size for IVBGT1.
Don't check some edge condtion in non-strict mode.
CL/Driver: enable atomics in L3 for HSW.
CL/Driver: quick fix regression caused by remove MI_FLUSH.
utests: skip one test when it fail to open XDisplay.
CL/Driver/HSW: Convert L3 cycle for texture to uncachable.
update document.
GBE: fix an ACC register related instruction scheduling bug
GBE: fix popcount bugs.
GBE: add GEN_TYPE_HF to getTypeSize.
Add submodule libva for examples.
update document.
Remove useless old legalize related files.
GBE: Need to check invalid register to determine the actual src num.
GBE: fix the hacky usage of invalid register.
runtime: fix a potential null pointer dereference.
runtime: don't free the host_ptr for a subbuffer.
GBE: fix build error for llvm 3.6.
GBE: fix build error for LLVM 3.4/3.3.
build: use @BEIGNET_INSTALL_DIR@ for the icd file.
GBE: expand constant expressions in constant vector
GBE: remove constant expression handling code in gen writer pass.
GBE: remove the unecessary type check for SEL instructio.
GBE: support compare two bool variables.
GBE: add fastcall support.
Build: use -Bsymbolic to fix conflicts with other LLVM users.
GBE: add a new incompatible compile option -cl-finite-math-only.
Revert "libocl: using mad() to implement dot()"
GBE: fix an image related bugs.
Build: set 3.5 as the stable LLVM version for beignet.
Docs: update team information.
Bump to 1.1
runtime: fix a conformance bug in cl_get_kernel_arg_info.
Build: fix the beignet icd name when CMAKE_INSTALL_FULL_LIBDIR is undefined.
update document.
GBE: correct some temporary virtual register's simdWidth.
GBE: avoid to use the GenRegister::xxxgrf(simdWidth,xxx).
runtime: Enhance the error handling when flush gpgpu command queue.
strip unsupported attributes and calling conventions.
GBE: fix safe type definition.
GBE: extend registers/tuples/immediates to 32bit wide.
GBE: extend backend label to 32 bit.
GBE: don't type cast register/labelindex to integer.
GBE: Extend front label ip to 32 bit on demand.
GBE: Use actual bti information to determine a pointer's addressspace.
GBE: refine error handling for private libva buffer sharing extension.
GBE: correct the instruction replacement logic in scalarize pass.
GBE: fix an potential assertion in constant expanding pass.
GBE: fix a bug in byte scatter write.
runtime: don't try to open nonexistent render nodes or device files.
utests: don't continue to run any case when fail to initialize device.
Doc: add a command to install dependencies.
GBE: should initialize useDWLabel to false by default.
Doc: update cmd parser issue for HSW platforms.
Doc: update known issue for the store/load pointer issue.
Revert "CL/Driver: enable atomics in L3 for HSW."
GBE: fix LOD initialization for typed write instruction.
Remove some LGPL incompatible code.
build: use EXECUTE_PROCESS to replace the deprecated EXEC_PROGRAM.
GBE: fix one potential register spilling bug.
runtime: Need to separate atomic in L3 test and SLM test in self_test().
Fixed a thread safe bug.
Need to check eventWaitList in clEnqueueNDRangeKernel.
runtime: fix a builtin-kernel related thread safe bug.
Remove deprecated function cl_context_get_static_kernel().
Zhu Bingbing (2):
change the utest summary code
add builtin function atan2pi and ldexp
--
Thanks