[Beignet] [ANNOUNCE] Beignet 1.3.0
Yang, Rong R
rong.r.yang at intel.com
Fri Jan 20 09:35:52 UTC 2017
Beignet 1.3.0
========================
Beignet version 1.3 has been released. This is a major release of Beignet and includes many improvements. The most important one is complete OpenCL 2.0 support. Starting with 6th generation Intel Core processors, including Skylake, Kaby Lake and Apollo Lake, OpenCL 2.0 support can be turned on or off at build time. When it is turned on, Beignet complies with the OpenCL 2.0 specification. For more information about OpenCL 2.0 support, please refer to the README. Another improvement is the refinement of the runtime driver: the event module and the enqueue module have been reimplemented to make them more modular and structured. This release also supports more extensions, reduces kernel compile time and improves performance.
The highlighted improvements are:
1. OpenCL 2.0 support.
2. Reimplemented OpenCL event and enqueue modules.
3. Other OpenCL runtime driver refinements.
4. LLVM 3.9 support.
5. Support for the cl_khr_gl_sharing extension.
6. Support for the intel_subgroups_short extension.
7. Faster compilation of large kernels.
8. Register allocation improvements.
9. Bug fixes.
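Because OpenCL 2.0 support is a build-time choice, it can be useful to check which version a given installation actually reports. The sketch below is only an illustration: it parses a device version string of the form defined by the OpenCL specification for CL_DEVICE_VERSION ("OpenCL <major>.<minor> <vendor-specific information>"), such as the one printed by a tool like clinfo. The function names and the sample strings are hypothetical and are not taken from Beignet itself.

```python
import re

# CL_DEVICE_VERSION layout per the OpenCL specification:
# "OpenCL <major>.<minor> <vendor-specific information>"
_VERSION_RE = re.compile(r"^OpenCL (\d+)\.(\d+)(?: (.*))?$")


def parse_cl_version(version_string):
    """Split a device version string into (major, minor, vendor_info)."""
    match = _VERSION_RE.match(version_string)
    if match is None:
        raise ValueError(f"not an OpenCL version string: {version_string!r}")
    major, minor, vendor_info = match.groups()
    return int(major), int(minor), vendor_info or ""


def supports_opencl_20(version_string):
    """Return True if the reported version is 2.0 or newer."""
    major, minor, _ = parse_cl_version(version_string)
    return (major, minor) >= (2, 0)


if __name__ == "__main__":
    # Hypothetical sample strings; real output depends on the build and device.
    for sample in ("OpenCL 2.0 beignet 1.3", "OpenCL 1.2 beignet 1.3"):
        print(sample, "->", supports_opencl_20(sample))
```

The version is compared as a (major, minor) tuple, so a string reporting 2.1 or later also counts as 2.0-capable.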
Git tag: Release_v1.3.0
Gitweb URL: http://cgit.freedesktop.org/beignet
Source tarball: https://01.org/sites/default/files/beignet-1.3.0-source.tar.gz
md5sum: ff4b5f66fc66649aef883e5602d0a3b1 beignet-1.3.0-source.tar.gz
sha1sum: e77f7bcca16e3f19066a7335876b7ba3ffc3ee39 beignet-1.3.0-source.tar.gz
sha256sum: 63d98b4fe8fba3dbc0299d29fef84560625e5ac51b16b8fed453021d4afb5cd5 beignet-1.3.0-source.tar.gz
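The digests above can be checked with the usual md5sum, sha1sum and sha256sum tools. As an alternative, the following is a minimal sketch using Python's standard hashlib module. The expected values are copied from this announcement; the local file path and the helper names are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# Digests copied from the announcement above.
EXPECTED = {
    "md5": "ff4b5f66fc66649aef883e5602d0a3b1",
    "sha1": "e77f7bcca16e3f19066a7335876b7ba3ffc3ee39",
    "sha256": "63d98b4fe8fba3dbc0299d29fef84560625e5ac51b16b8fed453021d4afb5cd5",
}


def file_digest(path, algorithm, chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED):
    """Map each algorithm name to True if the file matches its expected digest."""
    return {name: file_digest(path, name) == value
            for name, value in expected.items()}


if __name__ == "__main__":
    # Assumed location of the downloaded tarball (current directory).
    tarball = Path("beignet-1.3.0-source.tar.gz")
    if tarball.exists():
        for name, ok in verify(tarball).items():
            print(f"{name}: {'OK' if ok else 'MISMATCH'}")
    else:
        print(f"{tarball} not found; download it first")
```

Reading the file in chunks keeps memory use flat regardless of the tarball size. A match on sha256 alone is normally sufficient; the other two digests are checked only because the announcement lists them.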
-----------------------------------------------------------------
Changes since 1.2.0:
Armin K (1):
buildsys: Use CMRT_LIBDIR instead of CMRT_LIBRARY_DIRS
Chuanbo Weng (3):
Runtime: re-enable cl_khr_gl_sharing with existing egl extension.
rumtime: check all the extension id, not only BASE and OPT1.
runtime: set cl_intel_motion_estimation as IVB specifc device extension.
Giuseppe Bilotta (2):
Fix shift-overflow warning
toMB: use standard constant
Guo Yejun (12):
fix the condition to check if there are built-in kernels
use OCL_MAP_BUFFER_GTT to map climage
avoid too many messages when the driver could not find good values for local_size
fix w of image when simulate image1dbuffer with image2d
add another broxton pciid 0x5A85
enlarge stack size for chv since its EU might be masked
enlarge scratch size for bxt 0x5a85
add bxt with pciid 0x1A84
correct the kernel name
add bxt with pciid 0x1A85
change PCI_CHIP_BROXTON_P to PCI_CHIP_BROXTON_0 to unify the naming
fix UNTYPED_WRITE function parameters for Gen75Encoder::UNTYPED_WRITE
Guo, Yejun (21):
fix build issue when HAS_BO_SET_SOFTPIN is false
remove some redundant code for printf
do not care dst for printf
do not touch src1 when setting instruction header
prepare gen9 sends binary format and enable the ASM dump for sends
support sends (split send) for untyped write
revert clCreateCommandQueue* from ocl2.0 back to 1.2 in utests
move function setDPByteScatterGather into class GenEncoder
add sends support for byte write
disable CMRT as default, since no real case reported
save host_ptr when create sub buffer from CL_MEM_ALLOC_HOST_PTR
enable sends for skl
refine code to change insn.extra.splitSend as encoder funtion parameter
support sends for long write
add sends for atomic operation, only for ocl 1.2
refine code starting from header in typedwrite
enable sends for typed write
output more detail of GEN IR for workgroup op
add sends support for oword/media block write
enable sends to write SLM for workgroup op
add sends support for printf
Igor Gnatenko (1):
Fix build with latest libdrm
Jan Vesely (3):
api: check kernel parameter before accessing it
tests: Use python2 explicitly
libocl: Provide specs required CL_VERSION macros
Junyan He (51):
Runtime: Add CL base object for all cl objects.
Runtime: Apply CL base object to program.
Runtime: Apply base object to cl_platform_id
Runtime: Apply base object to cl_device_id
Runtime: Apply base object ot cl_sampler.
Runtime: Apply base object to cl_mem.
Runtime: Apply base object to cl_event
Runtime: Apply base object to cl_context
Runtime: Apply base object to cl_command_queue.
Runtime: Apply base_object to cl_kernel
Runtime: Apply base object to cl_accelerator_intel
Add list operation to utils.
Add WAIT_ON_COND and WAIT_ON_COND to base object.
Delete all the verbose locks and use list to store CL objects.
Add command queue's enqueue thread.
Implement event related functions.
Modify all event related functions using new event handle.
Add ref check for CL object's validation.
Fix bugs in utest for event.
Add a multi-queue utest.
Delete useless cl_thread files.
Fix a bug for event error status.
Fix a bug for double free of enqueueNativeKernel.
Add error handle for command queue destroy.
Delete useless event list in command queue struct.
Add a helper function for all information get.
Modify clGetEventInfo using cl_get_info_helper.
Modify clGetPlatformInfo using cl_get_info_helper.
Modify clGetKernelInfo using cl_get_info_helper.
Modify clGetCommandQueueInfo using cl_get_info_helper.
Modify clGetContextInfo using cl_get_info_helper.
Modify clGetDeviceInfo using cl_get_info_helper.
Modify clGetSamplerInfo using cl_get_info_helper.
Modify program Info using cl_get_info_helper.
Modify clGetMemObjectInfo using cl_get_info_helper.
Modify clGetImageInfo using cl_get_info_helper.
Add helper functions for device list check.
Refine create context APIs.
Add multi devices support in context.
Refine clRetain/Release MemObject
Refine clCreateSampler API.
Refine retain/release sampler API
refine clCreateCommandQueue and clRetainCommandQueue.
Move Device related APIs to new file
Move clCreateCommandQueueWithProperties API to command_queue file.
Utest: Refine half and float convert functions.
Refine list related functions.
Add profiling feature based on new event implementation.
Improve event execute function.
Fix two bugs about event.
Fix a event notify bug.
Luo Xionghu (12):
add atomic operators output for GEN_IR and gen disa.
gbe: add AtomicA64 instructions with stateless access.
support generic atomic.
utest: add generic atomic test.
cl_mem_fence_flags definiton change from MACRO to enum
gbe: atomic_long type support.
address bits change to 64.
Runtime: Add API clCreateCommandQueueWithProperties
atomic_flag_test_and_set function fix.
gbe: use kernel_arg_base_type to recognize image arguments.
gbe: add vec_type_hint's type into functionAttributes.
atomic bug fix.
Mark Thompson (1):
Apply image offset to read/write/map operations
Meng Mengmeng (3):
Runtime: return CL_INVALID_EVENT_WAIT_LIST if not event in the wait list.
eliminate build warnings in i386 system.
Runtime: Use cl_ulong as CL_DEVICE_MAX_MEM_ALLOC_SIZE's return type.
Pan Xiuli (70):
Backend: Refine block_read buffer with unaligned OWord block read
Utest: Add test for half type subgroup functions
Backend: Fix printf bug for simd8
Runtime: Fix null device for clGetKernelWorkGroupInfo
Libocl: Add define for cl_intel_subgroups
Backend: Resize the selection instruction max dst num
Backend: Refine image block read with less vector and dst tmp
Backend: Fix simd id will broke in simd8 mode
Utest: Fix sub group broadcast for simd8
Backend: Fix simd shuffle base address
Utest: Fix sub group shuffle for simd8
Backend: Fix bug for sub/work group functions
Libocl: Fix get_sub_group_size bug
Backend: Refine gen ir ALU1 inst getType
Utest: Change the kernel index to fit case index
Runtime: Fix accesss quilifer for internal kernels
Libocl: Image should have access qualifier
Utest: read/write_only qualifier should only used with image.
Utest: Remove load spir test
Backend: Add support for LLVM 3.9 release
Backend: Refine GenRegiter::offset
Backend: Refine register offset for simd shuffle
Backend: Refine sub group broadcast code for spec
Libocl: Add sub group broadcast short builtin function
Utest: Add check subgroup short helper function
Utest: Add test case for sub group broadcast short
Backend: Change the sel ir optimization for unpack register
Backend: Add short sub group builtin functions
Utest: Add test case for sub group short builtin functions
Backend: Add sub groups short shuffle builtin functions
Utest: Add test case for short type sub group shuffle
Backend: Add subgroup short block read/write
Utest: Add subgroup block read/write ushort test case
Backend: Add A64 subgroup block read/write support
Libocl: Add intel_subgroups_short extension
Backend: Add built-in ctz function
Utest: add a test case for built-in ctz function
Runtime: Add clCreateSamplerWithProperties
Utest: Add sampler test
Runtime: Add support of OCL2.0 device queries
Runtime: Add extensions for OCL20
Runtime: Add pipe related APIs
Backend: Add Pipe Builtin support
Backend: Add pipe packet size check
Utest: Add pipe related test
Runtime: Add support for sRGB
Runtime: Refine clGetSupportedImageFormats to support CL_MEM_FLAGS
Runtime: Add suport for sRGB to clEnqueueCopyImage
Runtime: Add suport for sRGB to clEnqueueFillImage
Runtime: Add support for clGetMemObjectInfo
Backend: Refine get_enqueued_local_size and get_local_size
Runtime: Add support for non uniform group size
Backend: Clang now support static, fix now
libocl: Refine return type of workitem built-in functions
Backend: Chang scan limit for GVN pass
Runtime: Add support for queue size and fix error handling
Backend: Add RegisterFamily for ir
Backend: Initialize the extra value for selection instruction
Backend: Fix GenRegister::offset sub reg offset
Backend: Refine flag usage in instrction selection
Backend: Add kernel name for sel ir output
Backend: Refine instruction ID for sel ir
Backend: Refine selection IR output
Backend: Refine block read/write instruction selection
Backend: Fix some A64 block read/write bug
CMake: Add OCL20 env for utest
Backend: Fix sel ir subnr usage
Backend: Fix header address of oword block read/write
GBE: Fix memdep-block-scan-limit caused bug on LLVM3.8
GBE: Fix getTypesize bug with LLVM3.9
Rebecca N. Palmer (10):
Allow building tests with Python 3 (no string.atoi)
Utest: test pow, not powr, on negative x
Docs: Spelling and grammar fixes
Utests: use clGetExtensionFunctionAddressForPlatform
Utests: Don't end an all-tests run when one test fails
Utests: respect existing C/CXXFLAGS
Fix build failure with CMRT enabled
Utests: Allow testing cl_intel_accelerator via ICD
Add clGetKernelSubGroupInfoKHR to _cl_icd_dispatch table
Fail, don't assert, if unable to create context
Ruiling Song (25):
GBE: add untyped A64 stateless message
GBE: add byte scatter a64 message
GBE: Add 64bit data stateless messages
GBE: new Load/Store Instruction Selection pattern
OCL20/GBE: Fix 64bit pointer issue in Load store instruction selection.
ocl20/runtime: take the first 64KB page table entries.
ocl20/GBE: support generic load/store
utest: add generic pointer test
GBE: Implement new constant solution for ocl2
GBE: Implement to_local/private/global() function
libocl: add get_fence() builtin.
GBE: Fix type mismatch bug.
GBE: Fix SEL.bool issue.
GBE: add ocl 2.0 work_group_barrier support.
GBE: Fix bug when unspill a long type value from scratch.
GBE: don't try to erase a llvm:Constant.
GBE: the dst grf should use same width as source register
GBE: retype double register to long type when do spilling.
runtime: prog->global_data may get 64bit address
GBE: imm64 should not be in src1 per hardware spec.
GBE: handle ConstantExpr in program-scope variable handling.
GBE: Refine program scope variable logic.
GBE: Fix destination grf register type for cmp instruction.
runtime: handle PROGRAM_BUILD_GLOBAL_VARIABLE_TOTAL_SIZE
GBE: Fix another Sel.bool issue.
Yan Wang (4):
Fix bug: Initialize bti of LoadInstuctionPattern::shootByteGatherMsg().
Fix getting bitwidth of PointerType of LLVM.
Restore jump threading pass for reducing compiling time when run the large and complex kernel like Luxmark.
Avoid possible invalid pointer by vector interator.
Yang Rong (36):
Docs: update readme.
Bump version to 1.3.
Docs: update a readme typo.
GBE: fix uninitialized build warning.
GBE: fix half immediate negate assert.
GBE: Fix assert when get metadata llvm.loop.unroll.enable.
GBE: Fix a logical insn with flag bug.
NEWS: Update Release 1.2.1.
OCL20/GBE: Change the pointer relative op's type.
OCL20: Add svm support.
OCL20: Add OpenCL2.0 apis to icd.
OCL20: add svm enqueue apis and svm's sub buffer support.
OCL20: add gbe_kernel_get_ocl_version for getting kernel's version in runtime.
libocl: change prototype of vload/vstore to match ocl2.0 spec.
add opencl builtin atomic functions implementation.
utest: add atomic opencl-2.0 case to test api.
OCL20: Fix svm bugs
OCL20: Implement clSetKernelExecInfo api
Libocl: change prototype of math built-in for OCL2.0 spec
OCL20: fix a unpack long assert.
Runtime: Fix vme fail.
Refine clSetMemObjectDestructorCallback API.
GBE: reorder the LLVM pass to reduce the compilation time.
GEB/Runtime: eliminate release build warnings.
utest: suspend deprecated-declarations warning.
Add the NULL pointer check.
GBE: correct the llvm.loop.unroll.enable meta.
Runtime: add the head file to avoid implicit declaration of function 'cl_devices_list_include_check' warning.
Runtime: fix a profiling fail.
utest: fix i386 system long ctz fail.
GBE: fix long work group fail.
Runtime: Fix a event bug.
GBE: if PointerFamily is FAMILY_QWORD, chv and bxt need special handle.
GBE: fix legacy read64 mix pointer bug.
GBE: fix a mix analyze bug.
Add some pointer access check.
Yang, Rong R (23):
KBL: fix some 1d array test fail.
Runtime: avoid clang warning "warning: expression result unused".
Add new BXT and KBL pciids to GetGenID.sh.
GBE: fix ctz fail.
Runtime: fix clEnqueueMigrateMemObjects fail.
GBE: don't use call->getCalledFunction() to decide the materialize function.
GBE: remove image type's access qual from image type name.
Runtime: fix fill image event assert and some SVM rebase error.
OCL20: Add read_write image type of image apis.
OCL20: add beignet_20.pch and beignet_20.bc.
OCL20: Add __OPENCL_VERSION__ and CL_VERSION_2_0 define.
OCL20: enable -cl-std=CL2.0.
OCL20: Add generic address space memcpy and memset.
GBE: fix a src/dst register reuse bug.
OCL20: add device enqueue helper functions in backend.
OCL20: add device enqueue builtins.
OCL20: add ir register enqueuebufptr for enqueue global buffer.
OCL20: handle device enqueue helper functions in the backend.
OCL20: Add runtime functions to get the device enqueue info.
OCL20: add a cl_kernel pointer to gpgpu.
OCL20: handle device enqueue in runtime.
OCL20: add device enqueue test case.
CMake: add an option to enable OpenCL 2.0.
Zhigang Gong (1):
CL: update to 2.0 header files.