[Beignet] [ANNOUNCE] Beignet 1.3.0

Yang, Rong R rong.r.yang at intel.com
Fri Jan 20 09:35:52 UTC 2017


Beignet 1.3.0

========================

Beignet version 1.3 has been released. This is a major release that includes many improvements. The most important one is complete OpenCL 2.0 support. Starting with 6th generation Intel Core processors, including Skylake, Kabylake and Apollolake, OpenCL 2.0 support can be turned on or off at build time. When it is turned on, Beignet complies with the OpenCL 2.0 spec. For more OpenCL 2.0 information, please refer to the README. Another improvement is the refinement of the runtime driver: the event and enqueue modules have been re-implemented to make them more modular and structured. This release also supports more extensions, speeds up kernel compilation and improves performance.
The highlights are:
1. OpenCL 2.0 support.
2. Re-implemented OpenCL event and enqueue modules.
3. Other OpenCL runtime driver refinements.
4. LLVM 3.9 support.
5. Extension cl_khr_gl_sharing support.
6. Extension intel_subgroups_short support.
7. Faster compilation of large kernels.
8. Register allocation improvements.
9. Bug fixes.
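
Since OpenCL 2.0 support is a build-time choice, an application may want to check at run time which version a given Beignet build reports. Below is a minimal sketch in Python. It assumes the device version string has already been obtained from clGetDeviceInfo(CL_DEVICE_VERSION) through some binding; the OpenCL spec defines its format as "OpenCL <major>.<minor> <vendor-specific information>". The helper names and the sample strings are hypothetical and not taken from Beignet.

```python
import re


def parse_opencl_version(version_string):
    """Return (major, minor) from a CL_DEVICE_VERSION-style string."""
    # The spec format is "OpenCL <major>.<minor> <vendor-specific info>".
    match = re.match(r"OpenCL (\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"not an OpenCL version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def supports_opencl_20(version_string):
    """True if the reported version is 2.0 or later."""
    return parse_opencl_version(version_string) >= (2, 0)


# Hypothetical sample strings; the vendor-specific suffix is an assumption.
print(supports_opencl_20("OpenCL 2.0 beignet 1.3"))
print(supports_opencl_20("OpenCL 1.2 beignet 1.3"))
```

Comparing (major, minor) tuples rather than raw strings avoids treating "1.10" as older than "1.2".
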
Git tag: Release_v1.3.0
Gitweb URL: http://cgit.freedesktop.org/beignet
Source tarball: https://01.org/sites/default/files/beignet-1.3.0-source.tar.gz
md5sum: ff4b5f66fc66649aef883e5602d0a3b1  beignet-1.3.0-source.tar.gz
sha1sum: e77f7bcca16e3f19066a7335876b7ba3ffc3ee39  beignet-1.3.0-source.tar.gz
sha256sum: 63d98b4fe8fba3dbc0299d29fef84560625e5ac51b16b8fed453021d4afb5cd5  beignet-1.3.0-source.tar.gz
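
The checksums above can be verified after download. A minimal sketch using Python's standard hashlib module, assuming the tarball has been saved locally under its original file name; the function names are illustrative only.

```python
import hashlib

# SHA-256 value published in this announcement.
EXPECTED_SHA256 = "63d98b4fe8fba3dbc0299d29fef84560625e5ac51b16b8fed453021d4afb5cd5"


def file_sha256(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_tarball(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA-256 matches the expected digest."""
    return file_sha256(path) == expected.lower()
```

For example, verify_tarball("beignet-1.3.0-source.tar.gz") should return True for an intact download.
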
-----------------------------------------------------------------
Changes since 1.2.0:
Armin K (1):
      buildsys: Use CMRT_LIBDIR instead of CMRT_LIBRARY_DIRS
Chuanbo Weng (3):
      Runtime: re-enable cl_khr_gl_sharing with existing egl extension.
      runtime: check all the extension id, not only BASE and OPT1.
      runtime: set cl_intel_motion_estimation as IVB specific device extension.
Giuseppe Bilotta (2):
      Fix shift-overflow warning
      toMB: use standard constant
Guo Yejun (12):
      fix the condition to check if there are built-in kernels
      use OCL_MAP_BUFFER_GTT to map climage
      avoid too many messages when the driver could not find good values for local_size
      fix w of image when simulate image1dbuffer with image2d
      add another broxton pciid 0x5A85
      enlarge stack size for chv since its EU might be masked
      enlarge scratch size for bxt 0x5a85
      add bxt with pciid 0x1A84
      correct the kernel name
      add bxt with pciid 0x1A85
      change PCI_CHIP_BROXTON_P to PCI_CHIP_BROXTON_0 to unify the naming
      fix UNTYPED_WRITE function parameters for Gen75Encoder::UNTYPED_WRITE
Guo, Yejun (21):
      fix build issue when HAS_BO_SET_SOFTPIN is false
      remove some redundant code for printf
      do not care dst for printf
      do not touch src1 when setting instruction header
      prepare gen9 sends binary format and enable the ASM dump for sends
      support sends (split send) for untyped write
      revert clCreateCommandQueue* from ocl2.0 back to 1.2 in utests
      move function setDPByteScatterGather into class GenEncoder
      add sends support for byte write
      disable CMRT as default, since no real case reported
      save host_ptr when create sub buffer from CL_MEM_ALLOC_HOST_PTR
      enable sends for skl
      refine code to change insn.extra.splitSend as encoder function parameter
      support sends for long write
      add sends for atomic operation, only for ocl 1.2
      refine code starting from header in typedwrite
      enable sends for typed write
      output more detail of GEN IR for workgroup op
      add sends support for oword/media block write
      enable sends to write SLM for workgroup op
      add sends support for printf
Igor Gnatenko (1):
      Fix build with latest libdrm
Jan Vesely (3):
      api: check kernel parameter before accessing it
      tests: Use python2 explicitly
      libocl: Provide specs required CL_VERSION macros
Junyan He (51):
      Runtime: Add CL base object for all cl objects.
      Runtime: Apply CL base object to program.
      Runtime: Apply base object to cl_platform_id
      Runtime: Apply base object to cl_device_id
      Runtime: Apply base object to cl_sampler.
      Runtime: Apply base object to cl_mem.
      Runtime: Apply base object to cl_event
      Runtime: Apply base object to cl_context
      Runtime: Apply base object to cl_command_queue.
      Runtime: Apply base_object to cl_kernel
      Runtime: Apply base object to cl_accelerator_intel
      Add list operation to utils.
      Add WAIT_ON_COND and WAIT_ON_COND to base object.
      Delete all the verbose locks and use list to store CL objects.
      Add command queue's enqueue thread.
      Implement event related functions.
      Modify all event related functions using new event handle.
      Add ref check for CL object's validation.
      Fix bugs in utest for event.
      Add a multi-queue utest.
      Delete useless cl_thread files.
      Fix a bug for event error status.
      Fix a bug for double free of enqueueNativeKernel.
      Add error handle for command queue destroy.
      Delete useless event list in command queue struct.
      Add a helper function for all information get.
      Modify clGetEventInfo using cl_get_info_helper.
      Modify clGetPlatformInfo using cl_get_info_helper.
      Modify clGetKernelInfo using cl_get_info_helper.
      Modify clGetCommandQueueInfo using cl_get_info_helper.
      Modify clGetContextInfo using cl_get_info_helper.
      Modify clGetDeviceInfo using cl_get_info_helper.
      Modify clGetSamplerInfo using cl_get_info_helper.
      Modify program Info using cl_get_info_helper.
      Modify clGetMemObjectInfo using cl_get_info_helper.
      Modify clGetImageInfo using cl_get_info_helper.
      Add helper functions for device list check.
      Refine create context APIs.
      Add multi devices support in context.
      Refine clRetain/Release MemObject
      Refine clCreateSampler API.
      Refine retain/release sampler API
      refine clCreateCommandQueue and clRetainCommandQueue.
      Move Device related APIs to new file
      Move clCreateCommandQueueWithProperties API to command_queue file.
      Utest: Refine half and float convert functions.
      Refine list related functions.
      Add profiling feature based on new event implementation.
      Improve event execute function.
      Fix two bugs about event.
      Fix a event notify bug.
Luo Xionghu (12):
      add atomic operators output for GEN_IR and gen disa.
      gbe: add AtomicA64 instructions with stateless access.
      support generic atomic.
      utest: add generic atomic test.
      cl_mem_fence_flags definition change from MACRO to enum
      gbe: atomic_long type support.
      address bits change to 64.
      Runtime: Add API clCreateCommandQueueWithProperties
      atomic_flag_test_and_set function fix.
      gbe: use kernel_arg_base_type to recognize image arguments.
      gbe: add vec_type_hint's type into functionAttributes.
      atomic bug fix.
Mark Thompson (1):
      Apply image offset to read/write/map operations
Meng Mengmeng (3):
      Runtime: return CL_INVALID_EVENT_WAIT_LIST if not event in the wait list.
      eliminate build warnings in i386 system.
      Runtime: Use cl_ulong as CL_DEVICE_MAX_MEM_ALLOC_SIZE's return type.
Pan Xiuli (70):
      Backend: Refine block_read buffer with unaligned OWord block read
      Utest: Add test for half type subgroup functions
      Backend: Fix printf bug for simd8
      Runtime: Fix null device for clGetKernelWorkGroupInfo
      Libocl: Add define for cl_intel_subgroups
      Backend: Resize the selection instruction max dst num
      Backend: Refine image block read with less vector and dst tmp
      Backend: Fix simd id will broke in simd8 mode
      Utest: Fix sub group broadcast for simd8
      Backend: Fix simd shuffle base address
      Utest: Fix sub group shuffle for simd8
      Backend: Fix bug for sub/work group functions
      Libocl: Fix get_sub_group_size bug
      Backend: Refine gen ir ALU1 inst getType
      Utest: Change the kernel index to fit case index
      Runtime: Fix access qualifier for internal kernels
      Libocl: Image should have access qualifier
      Utest: read/write_only qualifier should only used with image.
      Utest: Remove load spir test
      Backend: Add support for LLVM 3.9 release
      Backend: Refine GenRegiter::offset
      Backend: Refine register offset for simd shuffle
      Backend: Refine sub group broadcast code for spec
      Libocl: Add sub group broadcast short builtin function
      Utest: Add check subgroup short helper function
      Utest: Add test case for sub group broadcast short
      Backend: Change the sel ir optimization for unpack register
      Backend: Add short sub group builtin functions
      Utest: Add test case for sub group short builtin functions
      Backend: Add sub groups short shuffle builtin functions
      Utest: Add test case for short type sub group shuffle
      Backend: Add subgroup short block read/write
      Utest: Add subgroup block read/write ushort test case
      Backend: Add A64 subgroup block read/write support
      Libocl: Add intel_subgroups_short extension
      Backend: Add built-in ctz function
      Utest: add a test case for built-in ctz function
      Runtime: Add clCreateSamplerWithProperties
      Utest: Add sampler test
      Runtime: Add support of OCL2.0 device queries
      Runtime: Add extensions for OCL20
      Runtime: Add pipe related APIs
      Backend: Add Pipe Builtin support
      Backend: Add pipe packet size check
      Utest: Add pipe related test
      Runtime: Add support for sRGB
      Runtime: Refine clGetSupportedImageFormats to support CL_MEM_FLAGS
      Runtime: Add support for sRGB to clEnqueueCopyImage
      Runtime: Add support for sRGB to clEnqueueFillImage
      Runtime: Add support for clGetMemObjectInfo
      Backend: Refine get_enqueued_local_size and get_local_size
      Runtime: Add support for non uniform group size
      Backend: Clang now support static, fix now
      libocl: Refine return type of workitem built-in functions
      Backend: Change scan limit for GVN pass
      Runtime: Add support for queue size and fix error handling
      Backend: Add RegisterFamily for ir
      Backend: Initialize the extra value for selection instruction
      Backend: Fix GenRegister::offset sub reg offset
      Backend: Refine flag usage in instruction selection
      Backend: Add kernel name for sel ir output
      Backend: Refine instruction ID for sel ir
      Backend: Refine selection IR output
      Backend: Refine block read/write instruction selection
      Backend: Fix some A64 block read/write bug
      CMake: Add OCL20 env for utest
      Backend: Fix sel ir subnr usage
      Backend: Fix header address of oword block read/write
      GBE: Fix memdep-block-scan-limit caused bug on LLVM3.8
      GBE: Fix getTypesize bug with LLVM3.9
Rebecca N. Palmer (10):
      Allow building tests with Python 3 (no string.atoi)
      Utest: test pow, not powr, on negative x
      Docs: Spelling and grammar fixes
      Utests: use clGetExtensionFunctionAddressForPlatform
      Utests: Don't end an all-tests run when one test fails
      Utests: respect existing C/CXXFLAGS
      Fix build failure with CMRT enabled
      Utests: Allow testing cl_intel_accelerator via ICD
      Add clGetKernelSubGroupInfoKHR to _cl_icd_dispatch table
      Fail, don't assert, if unable to create context
Ruiling Song (25):
      GBE: add untyped A64 stateless message
      GBE: add byte scatter a64 message
      GBE: Add 64bit data stateless messages
      GBE: new Load/Store Instruction Selection pattern
      OCL20/GBE: Fix 64bit pointer issue in Load store instruction selection.
      ocl20/runtime: take the first 64KB page table entries.
      ocl20/GBE: support generic load/store
      utest: add generic pointer test
      GBE: Implement new constant solution for ocl2
      GBE: Implement to_local/private/global() function
      libocl: add get_fence() builtin.
      GBE: Fix type mismatch bug.
      GBE: Fix SEL.bool issue.
      GBE: add ocl 2.0 work_group_barrier support.
      GBE: Fix bug when unspill a long type value from scratch.
      GBE: don't try to erase a llvm:Constant.
      GBE: the dst grf should use same width as source register
      GBE: retype double register to long type when do spilling.
      runtime: prog->global_data may get 64bit address
      GBE: imm64 should not be in src1 per hardware spec.
      GBE: handle ConstantExpr in program-scope variable handling.
      GBE: Refine program scope variable logic.
      GBE: Fix destination grf register type for cmp instruction.
      runtime: handle PROGRAM_BUILD_GLOBAL_VARIABLE_TOTAL_SIZE
      GBE: Fix another Sel.bool issue.
Yan Wang (4):
      Fix bug: Initialize bti of LoadInstuctionPattern::shootByteGatherMsg().
      Fix getting bitwidth of PointerType of LLVM.
      Restore jump threading pass for reducing compiling time when run the large and complex kernel like Luxmark.
      Avoid possible invalid pointer by vector iterator.
Yang Rong (36):
      Docs: update readme.
      Bump version to 1.3.
      Docs: update a readme typo.
      GBE: fix uninitialized build warning.
      GBE: fix half immediate negate assert.
      GBE: Fix assert when get metadata llvm.loop.unroll.enable.
      GBE: Fix a logical insn with flag bug.
      NEWS: Update Release 1.2.1.
      OCL20/GBE: Change the pointer relative op's type.
      OCL20: Add svm support.
      OCL20: Add OpenCL2.0 apis to icd.
      OCL20: add svm enqueue apis and svm's sub buffer support.
      OCL20: add gbe_kernel_get_ocl_version for getting kernel's version in runtime.
      libocl: change prototype of vload/vstore to match ocl2.0 spec.
      add opencl builtin atomic functions implementation.
      utest: add atomic opencl-2.0 case to test api.
      OCL20: Fix svm bugs
      OCL20: Implement clSetKernelExecInfo api
      Libocl: change prototype of math built-in for OCL2.0 spec
      OCL20: fix a unpack long assert.
      Runtime: Fix vme fail.
      Refine clSetMemObjectDestructorCallback API.
      GBE: reorder the LLVM pass to reduce the compilation time.
      GBE/Runtime: eliminate release build warnings.
      utest: suspend deprecated-declarations warning.
      Add the NULL pointer check.
      GBE: correct the llvm.loop.unroll.enable meta.
      Runtime: add the head file to avoid implicit declaration of function 'cl_devices_list_include_check' warning.
      Runtime: fix a profiling fail.
      utest: fix i386 system long ctz fail.
      GBE: fix long work group fail.
      Runtime: Fix a event bug.
      GBE: if PointerFamily is FAMILY_QWORD, chv and bxt need special handle.
      GBE: fix legacy read64 mix pointer bug.
      GBE: fix a mix analyze bug.
      Add some pointer access check.
Yang, Rong R (23):
      KBL: fix some 1d array test fail.
      Runtime: avoid clang warning "warning: expression result unused".
      Add new BXT and KBL pciids to GetGenID.sh.
      GBE: fix ctz fail.
      Runtime: fix clEnqueueMigrateMemObjects fail.
      GBE: don't use call->getCalledFunction() to decide the materialize function.
      GBE: remove image type's access qual from image type name.
      Runtime: fix fill image event assert and some SVM rebase error.
      OCL20: Add read_write image type of image apis.
      OCL20: add beignet_20.pch and beignet_20.bc.
      OCL20: Add __OPENCL_VERSION__ and CL_VERSION_2_0 define.
      OCL20: enable -cl-std=CL2.0.
      OCL20: Add generic address space memcpy and memset.
      GBE: fix a src/dst register reuse bug.
      OCL20: add device enqueue helper functions in backend.
      OCL20: add device enqueue builtins.
      OCL20: add ir register enqueuebufptr for enqueue global buffer.
      OCL20: handle device enqueue helper functions in the backend.
      OCL20: Add runtime functions to get the device enqueue info.
      OCL20: add a cl_kernel pointer to gpgpu.
      OCL20: handle device enqueue in runtime.
      OCL20: add device enqueue test case.
      CMake: add an option to enable OpenCL 2.0.
Zhigang Gong (1):
      CL: update to 2.0 header files.
