[Beignet] [ANNOUNCE] Beignet 1.2.0

Yang, Rong R rong.r.yang at intel.com
Tue Aug 30 09:07:24 UTC 2016


Beignet 1.2.0
========================

The Beignet development team is pleased to announce the release of Beignet version 1.2.0. This release continues to improve quality, performance, and debug support. It adds support for the Broxton and Kabylake platforms and for new LLVM versions, including LLVM 3.7 and LLVM 3.8. It also emits debug information in the ASM output and adds profiling support. In addition, it adds more extensions and features, such as the full cl_intel_subgroups extension, the intel_accelerator extension, and the basic intel_motion_estimation extension. Performance has also improved, especially for some algorithms; for example, GEMM shows about a 2x gain.

The highlights of this release are:

1. 6th generation Intel Atom Processors (Broxton) support.
2. 6th generation Intel Core Processors (Kabylake) support.
3. LLVM 3.7 and 3.8 support.
4. ASM debug information and profiling support.
5. Experimental double data type support for processors after 6th generation (Broadwell).
6. Full cl_intel_subgroups extension.
7. OpenCL 2.0 workgroup built-in functions.
8. Local copy propagation optimization and other optimizations.
9. intel_accelerator extension and basic intel_motion_estimation extension.
10. Android build.
11. Refined printf implementation.
12. Bug fixes since last release.
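
For readers unfamiliar with the OpenCL 2.0 workgroup built-in functions in item 7, their semantics can be modeled on the host as simple list operations. The sketch below is a plain-Python reference model and not Beignet code: the function names mirror the OpenCL C built-ins (work_group_reduce_add, work_group_scan_inclusive_add, work_group_scan_exclusive_add, work_group_broadcast), and each list element stands for the value held by one work-item, which is an assumption made purely for illustration.

```python
from itertools import accumulate
import operator


def work_group_reduce_add(values):
    """Every work-item receives the sum over the whole work-group."""
    total = sum(values)
    return [total] * len(values)


def work_group_scan_inclusive_add(values):
    """Work-item i receives values[0] + ... + values[i]."""
    return list(accumulate(values, operator.add))


def work_group_scan_exclusive_add(values):
    """Work-item i receives values[0] + ... + values[i-1]; item 0 gets 0."""
    if not values:
        return []
    inclusive = work_group_scan_inclusive_add(values)
    return [0] + inclusive[:-1]


def work_group_broadcast(values, local_id):
    """Every work-item receives the value held by work-item local_id."""
    return [values[local_id]] * len(values)
```

A model like this can serve as the expected-result generator when checking kernel output: run the kernel on one work-group, read the buffer back, and compare it with the list returned by the matching function.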

Git tag: Release_v1.2.0
Gitweb URL: http://cgit.freedesktop.org/beignet
Source tarball: https://01.org/sites/default/files/beignet-1.2.0-source.tar.gz

md5sum: f25278189a0a713380a662f883ecb00d beignet-1.2.0-source.tar.gz
sha1sum: 33136e80d2a2b725a37f1ee3ea7c655a3f6eb5af  beignet-1.2.0-source.tar.gz
sha256sum: fc7af19efb7596b04510d26c558a576eba3e95e1ef86fd6951213c6a4bf58bff  beignet-1.2.0-source.tar.gz
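
The checksums above can be checked with the usual md5sum, sha1sum, and sha256sum tools. As an alternative, here is a minimal Python sketch that verifies the SHA-256 value using only the standard library. The helper names sha256_of_file and matches_release are made up for this example, and the script assumes the tarball has already been downloaded to a local path.

```python
import hashlib
import hmac
import sys

# SHA-256 value copied from the announcement above.
EXPECTED_SHA256 = "fc7af19efb7596b04510d26c558a576eba3e95e1ef86fd6951213c6a4bf58bff"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_release(path, expected=EXPECTED_SHA256):
    """True if the file's SHA-256 equals the expected hex digest."""
    return hmac.compare_digest(sha256_of_file(path), expected.lower())


if __name__ == "__main__" and len(sys.argv) == 2:
    # Example: python verify.py beignet-1.2.0-source.tar.gz
    ok = matches_release(sys.argv[1])
    print("OK" if ok else "MISMATCH")
    sys.exit(0 if ok else 1)
```

Reading in chunks keeps memory use constant regardless of the tarball size.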

-----------------------------------------------------------------
Armin K (3):
      Fix build with clang++
      RFC: Fix building with clang++ -stdlib=libc++
      Fix building with clang++

Bai Yannan (5):
      GBE/DebugInfo: Enable new feature
      GBE/DebugInfo: Pass debug info :llvm IR => GEN IR
      GBE/DebugInfo: Pass debug info : GEN IR => SEL IR
      GBE/DebugInfo: Pass debug info : SEL IR => GenInsn
      GBE/DebugInfo: Print line and column NO. with ASM

Chuanbo Weng (10):
      Add extension clCreateBufferFromFdINTEL to create cl buffer by external buffer object's fd.
      Add extension clCreateImageFromFdINTEL to create cl image by external fd.
      Add built-in function __gen_ocl_vme.
      Add extensions intel_accelerator and basic intel_motion_estimation.
      Add basic utest for block_motion_estimate_intel.
      Add document of video motion estimation support.
      Full support of cl_intel_motion_estimation extension.
      runtime: The depth should be 1 for CL_MEM_OBJECT_IMAGE2D in beignet's implementation.
      Runtime: fix one typo error.
      Runtime: set size member of cl_image created by clCreateImageFromFdINTEL.

Francisco Jerez (1):
      SKL: Use kernel-defined MOCS values instead of assuming hardware defaults.

Frank Dittrich (1):
      Backend: Fix memleak in serialize_program

Giuseppe Bilotta (2):
      Fix sizing error for bitfield
      Increase size for compile log output

Grigore Lupescu (16):
      Backend: Workgroup reduce redesign using shared local memory
      Backend: Support for workgroup op predicate, scan inclusive/exclusive
      Backend: Initial support for long/ulong types in workgroup ops
      Backend: Fix barrier placement in workgroup functions
      Backend: Workgroup scan cmp qword, workaround for execution width 16
      Backend: Full support for workgroup broadcast
      Utest: Add workgroup reduce any/all tests
      Utest: Add workgroup scan exclusive tests
      Utest: Add workgroup scan inclusive tests
      Utest: Add workgroup broadcast tests
      Benchmark: Add performance tests for workgroup reduce/scan functions
      Benchmark: Add performance tests for workgroup broadcast
      Benchmark: Evaluate math performance on intervals
      Backend: Optimization internal math, lower polynomials
      Backend: Optimization internal math, use native
      Backend: Optimization internal math, use mad

Guo Yejun (52):
      Use Integer for U32/S32 Immediate load.
      generate sub_group_id inside kernel instead of payload
      fix issue when build against llvm3.3
      remove GBE_CURBE_STACK_POINTER in payload
      correct simd width when dst of simd_shuffle is scalar
      generate MOV instruction at selection stage when do simd_shuffle with imm value.
      add basic function to dump Selection IR
      add basic structure for selection IR optimization
      add local copy propagation optimization for each basic block
      refine code to separate the usage of data and image2d_from_buffer
      enable USE_HOST_PTR for cl image with userptr to avoid extra copying
      add utest runtime_use_host_ptr_image
      fix uniform case for ByteGather
      add conditions of pitch and h to enable userptr for climage_use_host_ptr
      fix a regression issue caused by LocalCopyPropagation
      fix a long relative regreesion issue on BSW caused by local copy propagation
      add comments to explain 32bit is enough to represent w+hstrid+vstride
      fix regression issue for climage + uesrptr
      add more OP for LOGICAL_SRCMOD case
      make Beignet as intermedia layer of CMRT
      add utest to demo how to run CM kernerl via OpenCL APIs
      add Broxton support
      add support for build option -cl-fast-relaxed-math
      output warning message if do not find a good local_work_size
      fix a bug when the first operand of intel_sub_group_shuffle is uniform
      considering width and hstride when do unpacked_uw
      do not call memcpy for cl_enqueue_read_buffer if userptr is enabled
      change built-in function name from get_sub_group_size to get_max_sub_group_size
      change built-in function name from get_sub_group_id to get_sub_group_local_id
      fix typo for DEBUGP to avoid print extra empty line
      correct ASM output for byte scattered read/write
      correct the dst type to ud instead of uw for byte scattered read
      fix a potential issue of SEL IR optimization when subphysical is true
      enable byte gather for vload2/3/4/8(offset, char*) on SKL, BXT and BDW
      enable FP_CONTRACT on as default, and implemented with MAD
      change behavior of mul24/mad24 when out of range
      utest: do not check MV near image border
      set SIMD width as 1 for mad when the dst is uniform
      only release cmrt device when it is already created
      output message instead of assert when .bc file does not exist
      utests: add access qualifier for image in kernel
      enlarge buf size to avoid memory out of range written by GPU (kernel)
      do not use const pointer
      change the bahavior when writing to 3-component vector data types
      utest: do not check the padding componenet for 3-component vector data types
      utests: fix issue of CL_PROGRAM_BINARY_SIZES query
      utests: change tolerance check for lgamma
      utests: only check -dump-spir-binary on beignet implementation
      remove "\n" in output message when test is failed
      use different pointer alignment for different implementation
      only check beignet special test cases on beignet
      add help for 'make package'

Jan Vesely (1):
      cl_api; Check image origin and region for NULL

Junyan He (68):
      Runtime: Add NULL pointer check in clGetKernelArgInfo
      Utest: Add -cl-kernel-arg-info to the utest test_get_arg_info
      Backend: Refine ConvertInstruction logic in insn_selection
      Backend: Fix the bug for double imm reg.
      Backend: Redefine double register pattern.
      Backend: Add FDIV64 function for gen_insn_selection.
      Backend: Add gen8 instruction field for special accumulator.
      Backend: Delete getDoubleExecWidth and refine handleDouble.
      Backend: Fix a bug for double imm src setting.
      Backend: Add MATH_WITH_ACC function.
      Backend: Add the MADM function to gen8 encoder.
      Backend: Implement FDIV64 on BDW.
      Backend: Add madm and invm instrucions to disasm.
      Runtime: Refine ext enable function for platform.
      Utests: Add double check and refine compiler_double case.
      Utest: Add double division test.
      Backend: Delete the useless MOV_DF instruction.
      Backend: Delete LOAD_DF_IMM instruction.
      Backend: Add double conversion to insn selection.
      Utests: Add test cases for double conversion.
      Utest: Fix a bug for double div.
      Backend: Fix a potential bug for uniform conversion.
      Backend: Fix half->long convertion bug for BSW.
      CMake: Add -lrt to the link command of libcl.so
      Backend: Add ProfilingInfo class to ir.
      Backend: Add StoreProfiling and CalcTimestamp instructions
      Backend: Add ProfilingInserter and a new function pass.
      Backend: Add profiling registers to curbe.
      Backend: Add ProfilingInfo to Unit.
      Backend: Insert store_profiling before lowed return.
      Backend: Add CalcTimestamp and StoreProfiling.
      Backend: Add IVAR OCL_PROFILING_LOG to control profiling log.
      Backend: Add CalcTimestamp and StoreProfiling to insn selection.
      Backend: Add a auxiliary function to convert GenReg to uniform.
      Backend: Add tm0 function for arf timestamp register.
      Backend: Add profilingProlog function for GenContext.
      Add profiling info APIs to runtime.
      Runtime: Bind the profiling buffer when profiling enabled.
      Backend: Fix two bugs about curbe related pointer.
      Backend: Avoid CALC_TIMESTAMP and STORE_PROFILING being scheduled.
      Backend: Add ADD_ and SUB_ timestamps help functions.
      Backend: Implement emitCalcTimestampInstruction in GenContext.
      Backend: Implement StoreProfilingInstruction in GenContext.
      Backend: Append the reg interval for registers need for profiling.
      Utests: Fix the failure for half math tests.
      libocl: Add the module for work_group functions.
      Add the WorkGroupInstruction as a new type of instruction.
      Add WorkGroup functions to Gen IR logic in llvm_gen_backend.
      Handle the WorkGroup_Broadcast logic in insn_selection.
      Add utest for workgroup_broadcast.
      Backend: Add sr0 reg helper function.
      Backend: Add tidMapSLM and wgBroadcastSLM to each function.
      Backend: Add threadid as a curbe register.
      libocl: Refine the workgroup functions, add signed info.
      Backend: Establishing the thread/TID-EUID map.
      Add forward message function for gen encoder.
      Backend: Add WORKGROUP_OP instruction selection.
      Backend: Add state register into schedule consideration.
      Backend: Implement reduce min and max in gen_context
      Runtime: Add the threadid calculation for curbe.
      Utests: Add test cases for workgroup reduce max/min.
      Backend: Add reduce add to gen_context.
      Utests: Add test cases for reduce add.
      Backend: Fix a memory leak for structurizer.
      Backend: Use KernelArgument::ArgInfo to replace llvm's arg info.
      Add the serializeToBin and deserializeFromBin for kernel arg info.
      Add several printf utest cases.
      Fix the bug when we pass argument with spaces.

Laura Ekstrand (7):
      backend/src/backend: Handle -dump-opt-llvm=[PATH]
      backend: Handle (but ignore) -dump-opt-asm=[PATH].
      backend: Move ASM printing to a helper function.
      backend: Convert outputAssembly to C file I/O.
      backend, src: Add ASM file name to gbe_program_new_from_llvm
      backend: Add ASM file name to GenProgram object.
      backend: Add ASM file name to GenContext object.

Luo Xionghu (62):
      libocl: fix degrees function precision issue.
      Update last event status in clFinish.
      fix utest fail.
      GBE/IR: add collectInsnNum to collect block instruction number.
      GBE/PRINTF: store variable instead of pointer in "slots".
      should check the return value of cl_program_new.
      return 32 could gain 0.2% performance on opencv optical flow case.
      enable create image 2d from buffer in clCreateImage.
      add utest for creating 2d image from buffer.
      fix bswap bug.
      add bswap64 for gen7/gen75 and gen8 seperately.
      add bswap64 in utest.
      reset the variables in printf_paser to NULL.
      using name instead of index to query from ConstantSet.
      alignment of NO TILING surface limitation shouldn't be removed.
      Revert "return 32 could gain 0.2% performance on opencv optical flow case."
      pitchalignment should be set to 1.
      use sampler to copy image_from_buffer to another image for verification.
      use table to define and query binary headers.
      set the pitch of image from buffer to the buffer's pitch.
      runtime: extension size not enough.
      gbe: fix uitofp instruction issue.
      check image from buffer's base address alignment.
      gbe/libocl: define the vloada_xxx function instead of using MACRO.
      gbe: use kernel_arg_base_type to recognize image arguments.
      gbe/libocl: change xxx_fence function to OVERLOADABLE.
      runtime: initialize the memory content to 0.
      runtime: fix clCompileProgram bug.
      runtime: fix clLinkProgram bug.
      runtime: add missing supported format image_1d_buffer.
      gbe: add vec_type_hint's type into functionAttributes.
      gbe/libocl: define the gentype half_xxx math function instead of using MACRO.
      change the sampler type value to keep same with spir spec.
      fix gcc build error.
      backend: enable option -dump-spir-binary to generate SPIR binary from beignet.
      utest: add utest to generate spir binary from beignet.
      fix LLVM 3.5 fail.
      fix workgroup_broadcast instruction debug mode assert.
      fix debug instruction welform assert.
      should convert from llvm address to GEN address space to compare.
      add sanity check for Image Region in runtime.
      buf[0] destroyed twice in event cases.
      standalone utest for unified OpenCL implementation.
      fix failed cases for stand alone utest;
      add functions cl_check_beignet.
      handle simd8 and simd16 accrordingly for alu3.
      enable utest compiler_math_3op for mad test.
      assert equation issue.
      write mask in disassembly not parse correctly.
      3 op math functions dst need 16 byte align when allocate register.
      add howto for stand alone utest.
      gbe: ts array out of boundary.
      fix kernel build warnings.
      gbe/llvm: fix potential null pointer dereference.
      no return value for non-void function.
      runtime: error handling to avoid null pointer dereference.
      utest: error handling to avoid null pointer dereference.
      file name length overflow check.
      utest: init uninitialized local variables.
      gbe/ir: defaultData is not initialized by default.
      gbe/ir: initialize the InstructionBase by a straightforward way.
      runtime: thread_ids not initialized after created.

Manasi Navare (4):
      backend: Turn on ASM dump.
      backend/src/backend: Handle -dump-opt-llvm=[PATH] in clCompileProgram and clBuildProgram OpenCL API
      Add -dump-opt-asm support to the clLinkProgram() API
      utests: Added unit tests to test LLVM and ASM dump generation in a two step build process with clCompile and clLink APIs.

Meng Mengmeng (8):
      fix a powr function issue in cpu compiler math
      add utests option: -j which specifies the 'number' of jobs (multi-thread)
      add benckmark for copy data from buffer to buffer
      add benckmark for copy data from image to image
      Add a option which could set the benchmark unit properly.
      Refine the benchmark tests: copy buffer and image.
      Add a benchmark which test do 3*3 median filter in buffer.
      Add a benchmark which test do 3*3 median filter in image.

Midhun Kodiyath (2):
      Set proper Vendor ID
      Calculate appropriate timestamps for cl profile

Pan Xiuli (85):
      Fix clGetKernelArgInfo fail on piglit
      Driver: fix the annoying "Failed to release userptr..." error message
      Fix gpgpu node related bug
      runtime: refine the last_event in queue to a list
      Fix a event leak in create context
      utests: event should be released
      drivers: change the buf size to size_t
      runtime: refine the cl_device_id to support bigger memory
      driver: add setup_bti_gen9 for bigger buffer up to 4G
      runtime: dynamically get global memory size and max alloc size
      utests: fix multithread queue chaos
      utests: fix image_from_buffer bugs
      utests: fix compiler_fill_image_2d_array random bug
      Backend: enable to choose notification register
      Backend: add debugwait function
      Backend: Add gen9 barrier prediction setting
      Backend: Refine printfs into ir unit
      libocl: Add three work-item built-in function
      Utest: Add test for get_global/local_linear_id
      Backend: refine mix with hardware lrp function
      utests: add an utest for mix
      Backend: Implement the non-constant extractelement scalarize
      Utest: Add a bitonic sort test for non-constant extractelement
      Driver: Fix GPGPU delete bug
      Backend: Refine new instruction with IRBuilder create
      Backend: Add support for LLVM release 3.8
      Backend: Remove uselsee ParseCommandLineOptions
      Runtime: Add SKL device id for new SKL device
      Backend: Fix bug build with clang
      Add support for gcc 6
      Backend: Fix printfs mem leak
      Backend: Fix memleak form abi::__cxa_demangle
      Utest: Fix utest memleaks
      Runtime: Fix memleak of barrier evnets
      Runtime: Fix memleak in build program for bin
      Benchmark: Fix benchmark bugs with image map
      Benchmark: Fix Benchmark heap use after free problem
      Backend: Refine workgroup all with SIMD_ALL algorithm
      Backend: Copy workgroup emit function to gen8
      Utest: Remove some unsuport work group tests
      Backend: Add uncompatiblePCHOptions for OCL20
      Backend: Add workaround for instcombine will optimize fabs
      Runtime: Disable image hostptr for default
      Runtime: Fix thread id calculation.
      Runtime: Add API clGetKernelSubGroupInfoKHR for subgroup extension
      Backend: Add subgroup work item builtin functions
      Utest: Add subgroup work item test cases
      Backend: Refine return value of sub_group_all/any to 1
      Utest: Remove old sub_group_all/any utest
      Backend: Add sub_group built-in functions for intel extension
      Utest: Add test case for sub_group functions
      Backend: Add intel_sub_group_block_read/write form buffer
      Utest: Add tset case for block read/write buffer
      Backend: Add intel_sub_group_block_read/write form image
      Utest: Add tset case for block read/write image
      Utest: Porting to GCC6
      Libocl: Add define for new added cl_khr_3d_image_writes extension
      Runtime: Add intel_subgroups extensions
      Backend: Refine block read/write buffer
      Backend: Refine block read image
      CMAKE: Use DRM_INTEL_LIBDIR for CHECK_LIBRARY_EXISTS path
      Runtime: Add subgroup extension API in clGetExtensionFunctionAddress
      Utest: Add check for cl_intel_subgroups extension tests
      Backend: Add missing math function control code for gen8+
      Backend: Refine sel ir optimization
      CMAKE: Make utests and benchmark not build for default
      Backend: Change disable compact to compact version
      Backend: Add gen8+ instruction compact support
      Backend: Add intel_sub_group_shuffle_down/up/xor with shuffle
      Utest: Add test case for sub_group_shuffle_down/up/xor
      Utest: Add check for utest multithread run
      Utest: Add check for OpenCL 2.0 extension
      Backend: Fix image block write bug in simd8
      Utset: Add check for workgroup tests
      Runtime: Fix hang for bsw device with 12 EU
      Utest: Move half related helper function into utest helper
      Utest: Add as_float as_uint helper function
      Libocl: Add vload\store for half type
      Utest: Add test case for half type vload\store
      Libocl: Add half type dot
      Backend: Add half type support for sub group functions
      Backend: Add half type for mad
      Utest: Add half type mad test case
      Utest: Fix utest case with issues value
      Utest: Refine utest_run -l option

Rebecca N. Palmer (3):
      GBE: Don't read past end of printf format string
      FindLLVM: allow LLVM/Clang 3.7
      Report build failures in backend to the build log

Ross Burton (2):
      CMakeLists: respect existing CMAKE_C/CXX_FLAGS
      Ensure paths to beignet.bc and beignet.pch include a / before the filename

Ruiling Song (20):
      GBE: Fix build error.
      runtime: add Broadwell deviceID 0x162B
      runtime: add detailed broadwell device name.
      GBE: add check dumpASMFileName.empty()
      GBE: fix ub1grf(nr, subnr) issue.
      GBE: Minor refine uw1grf(nr, subnr).
      GBE: Implement liveness dump.
      GBE: Fix unaligned load/store issues.
      GBE: Refine ir for memory operation like atomic/load/store
      GBE: CreateCall2 is removed in llvm 3.7.
      utest: write to dst buffer to fix utest failure
      runtime: add macro DEBUGP() to handle debug printf.
      GBE: try to avoid bank conflict in register allocator.
      GBE: Do more aggressive load/store merging.
      GBE: Optimize extraLiveOut register info.
      GBE: Fix two bugs in loop preheader.
      GBE: Improve spill policy by considering use count.
      GBE: handle mad with execution width of one.
      GBE: Handle null and uninitialized pointer when do pointer/bti analysis.
      GBE: handle Instrinsic trap and unreachable instruction.

Sean Lynch (1):
      Prepend std namespace to isnan and isinf calls.

Sirisha Gandikota (1):
      utests: Added unit tests to test LLVM and ASM dump generation.

Steven Newbury (1):
      FindLLVM: check for empty system-libs variable

Yan Wang (15):
      Add condition checking of residuals because it may be NULL.
      Change printf data structure and remove old code.
      Add PrintfLog structure.
      Reconstruct printf parser.
      Add LLVM fcuntion definition of printf.
      Add tuple processing logic for printf.
      Add the implementation of printf ir instruction.
      Implement emision of printf instruction.
      Implement instruction selection of printf.
      Implement ASM generation of printf.
      Implement printf buffer management.
      Output printf result.
      Scalarize vector in printf.
      Remove unncessary assertion in printf processing.
      Add cl_khr_3d_image_writes into info string.

Yang Rong (48):
      Bump master version to 1.2.
      Update Release 1.1.0 NEWS.mdwn.
      Utest: fix a builtin_powr_float fail when OCL_STRICT_CONFORMANCE=0.
      Fix piglit clLinkProgram fail.
      Don't use cl_buffer_get_subdata in clEnqueueReadBuffer.
      Fix clLinkProgram error.
      Update NEWS.
      Revert "GBE: refine longjmp checking."
      GBE: use opencl c to implement llvm.memset and llvm.memcpy.
      GBE: Add llvm3.7 support.
      GBE: Move createStripAttributesPass before createInstructionCombiningPass.
      GBE: Add datalayout and triple to ll files.
      fix llvm3.7 compiler_function_qualifiers utest fail.
      Libocl: forgot to add memset.h.
      Utest: fix random assert in function cl_kernel_link.
      LibOcl: Fix float convert to long/ulong bug.
      Runtime: add CL_DEVICE_SPIR_VERSIONS to clGetDeviceInfo.
      Runtime: return the correct error code in cl_event_check_waitlist.
      Runtime: because double's built-ins haven't completely support, so disable it by default.
      GBE: fix a assert when structure argument's first field don't used.
      SKL: use the hw defautl value mocs index before linux 4.3.
      Android: change the saved file path.
      Android: erase the stl in iterator loop, must update the iterator.
      Android: derived the Allocator from std::allocator.
      Android: add android mk files.
      Android: fix __thread keyword issue in android.
      Android: disable pch in android.
      Android: workaround libdrm flags.
      Android: fix vector16 error in android.
      Changed ldexp to ldexpf when defining float min/max.
      Runtime: Use uanme to get kernel architecture.
      GBE: fix a patch JMPI assert.
      CMake: Fix a cmake warning.
      CMake: use CHECK_LIBRARY_EXISTS to check the function instead of version.
      GBE: fix a memset typo.
      GBE: fix caffe fft tests fail.
      Runtime: fix caffe segmentation fault when exit.
      GBE: change GEN binary format.
      GBE: warning when the GEN binary version mismatch.
      GBE: reorder the condition to avoid array overflow.
      Runtime: set the sub slice according to kernel pooled EU configure.
      README: remove "legacy Illegal pointer issue" section.
      FindLLVM: allow LLVM/Clang 3.8 and reorder the llvm-config priority.
      KBL: add kabylake pciids.
      KBL: add kabylake backend support.
      KBL: add kabylake runtime support.
      Docs: update readme.
      Bump to version 1.2.0.

Yang, Rong R (5):
      GBE: remove stacksize 64KB limitation.
      Runtime: fix a userptr bug.
      GetGenID: add the miss pci ids.
      Runtime: fix a string overflow.
      Runtime: Add assert of pthread_getspecific.

Zhenyu Wang (2):
      First reference beignet's CL header to build
      Flush kernel source dump stream

Zhigang Gong (37):
      runtime: always try to update event status in clGetEventProfilingInfo().
      GBE: fix the broken image_1d_buffer write.
      utests: refine image 1d buffer test case.
      GBE: one minor bug in OP_SIMD_XXX.
      GBE: a potential bug in instruction scheduling.
      GBE: Use addRemappedFile to avoid creating temporary cl source file.
      GBE: fix build error with LLVM 3.5 and previous version.
      GBE: refactor curbe register allocation.
      GBE: refine longjmp checking.
      GBE: don't treat btiUtil as a curbe payload register.
      GBE: don't always allocate ir::ocl::one/zero
      GBE: we no longer need to allocate register from two directions.
      GBE: refine longjmp checking.
      GBE: refine Phi copy interfering check.
      GBE: refine liveness analysis.
      GBE: add two helper routines for liveness partially update.
      GBE: add some dag helper routines to check registers' interfering.
      GBE: implement further phi mov optimization based on intra-BB interefering analysis.
      GBE: continue to refine interfering check.
      GBE: Don't try to remove instructions when liveness is in dynamic update phase.
      GBE: enable post phi copy optimization function.
      GBE: avoid vector registers when there is high register pressure.
      GBE: fix a zero/one's liveness bug.
      GBE: fix kernel arguments uploading bug.
      GBE: fix a regression bug at post phi copy optimization.
      GBE: extent register allocator size/offset to 32bit.
      GBE: don't assert even if we fail to compile kernel at the backend stage.
      GBE: remove useless assertions code.
      GBE: decrease the loop unrolling threshold to 640.
      runtime: set CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE to kernel's SIMD_WIDTH.
      GBE: implement pre-register-allocation instruction scheduling.
      Revert "GBE: disable mad for some cases."
      Refine custom unrolling policy.
      GBE: disable the read byte as DW.
      Android: work around a LLVM 3.5 unrolling bug.
      Remove nonexisting unit test cases in Android.mk.
      update android version.
