Jason Ekstrand jason at jlekstrand.net
Wed Oct 25 23:25:31 UTC 2017

This series is a third respin of my subgroups prerequisites series that
I sent out a few weeks ago.  Not a whole lot has changed but there are
some new patches.  Primarily,

 1) Some patches which were reviewed by Matt and Lionel were pushed and are
    no longer in the series.  Thanks guys!

 2) I've applied R-B tags from various people for patches which are
    reviewed but depend on still unreviewed patches.

 3) A few patches to fix little-core.  In particular, the extra little-core
    EU restrictions cause problems for BROADCAST, MOV_INDIRECT, and integer
    multiplication lowering.

This series can be found on fd.o here:


Happy reviewing!

Cc: Matt Turner <mattst88 at gmail.com>
Cc: Francisco Jerez <currojerez at riseup.net>
Cc: Connor Abbott <cwabbott0 at gmail.com>

Francisco Jerez (1):
  intel/fs: Restrict live intervals to the subset possibly reachable
    from any definition.

Jason Ekstrand (47):
  intel/fs: Pass builders instead of blocks into emit_[un]zip
  intel/fs: Be more explicit about our placement of [un]zip
  intel/fs: Use ANY/ALL32 predicates in SIMD32
  intel/fs: Don't stomp f0.1 in SIMD16 ballot
  intel/fs: Use an explicit D type for vote any/all/eq intrinsics
  intel/fs: Use a pair of 1-wide MOVs instead of SEL for any/all
  intel/compiler: Add some restrictions to MOV_INDIRECT and BROADCAST
  intel/eu: Just modify the offset in brw_broadcast
  intel/eu/reg: Add a subscript() helper
  intel/eu: Fix broadcast instruction for 64-bit values on little-core
  intel/fs: Fix MOV_INDIRECT for 64-bit values on little-core
  intel/fs: Fix integer multiplication lowering for src/dst hazards
  intel/fs: Use the original destination region for int MUL lowering
  i965/fs: Extend the live ranges of VGRFs which leave loops
  i965/fs/nir: Simplify 64-bit store_output
  i965/fs: Return a fs_reg from shuffle_64bit_data_for_32bit_write
  i965/fs/nir: Minor refactor of store_output
  i965/fs/nir: Don't stomp 64-bit values to D in get_nir_src
  intel/fs: Protect opt_algebraic from OOB BROADCAST indices
  intel/fs: Uniformize the index in readInvocation
  intel/fs: Retype dest to match value in read[First]Invocation
  intel/fs: Assign constant locations if they haven't been assigned
  intel/fs: Remove min_dispatch_width from fs_visitor
  intel/cs: Drop max_dispatch_width checks from compile_cs
  intel/cs: Stop setting dispatch_grf_start_reg
  intel/cs: Ignore runtime_check_aads_emit for CS
  intel/fs: Mark 64-bit values as being contiguous
  intel/cs: Rework the way thread local ID is handled
  intel/cs: Re-run final NIR optimizations for each SIMD size
  intel/cs: Re-run final NIR optimizations for each SIMD size
  intel/cs: Push subgroup ID instead of base thread ID
  intel/compiler/fs: Set up subgroup invocation as a system value
  intel/fs: Rework zero-length URB write handling
  intel/eu: Make automatic exec sizes a configurable option
  intel/eu: Explicitly set EXECUTE_1 where needed
  intel/fs: Explicitly set EXECUTE_1 where needed
  intel/fs: Don't use automatic exec size inference
  nir: Add a new subgroups lowering pass
  nir: Add a ssa_dest_init_for_type helper
  nir: Make ballot intrinsics variable-size
  nir/lower_system_values: Lower SUBGROUP_*_MASK based on type
  nir/lower_subgroups: Lower ballot intrinsics to the specified bit size
  nir,intel/compiler: Use a fixed subgroup size
  spirv: Add a vtn_constant_value helper
  spirv: Rework barriers
  nir: Validate base types on array dereferences
  compiler/nir_types: Handle vectors in glsl_get_array_element

 src/compiler/Makefile.sources                      |   2 +-
 src/compiler/glsl/glsl_to_nir.cpp                  |   1 +
 src/compiler/nir/nir.h                             |  25 +-
 src/compiler/nir/nir_intrinsics.h                  |  13 +-
 .../nir/nir_lower_read_invocation_to_scalar.c      | 112 ---------
 src/compiler/nir/nir_lower_subgroups.c             | 257 ++++++++++++++++++++
 src/compiler/nir/nir_lower_system_values.c         |   4 +-
 src/compiler/nir/nir_opt_intrinsics.c              |  69 +-----
 src/compiler/nir/nir_validate.c                    |  18 +-
 src/compiler/nir_types.cpp                         |   2 +
 src/compiler/spirv/spirv_to_nir.c                  | 132 ++++++++--
 src/compiler/spirv/vtn_private.h                   |   6 +
 src/intel/compiler/brw_compiler.c                  |   4 -
 src/intel/compiler/brw_compiler.h                  |   3 +-
 src/intel/compiler/brw_eu.c                        |   1 +
 src/intel/compiler/brw_eu.h                        |  10 +
 src/intel/compiler/brw_eu_emit.c                   |  90 +++++--
 src/intel/compiler/brw_fs.cpp                      | 268 ++++++++++++---------
 src/intel/compiler/brw_fs.h                        |  15 +-
 src/intel/compiler/brw_fs_generator.cpp            |  90 ++++---
 src/intel/compiler/brw_fs_live_variables.cpp       |  89 ++++++-
 src/intel/compiler/brw_fs_live_variables.h         |  12 +
 src/intel/compiler/brw_fs_nir.cpp                  | 262 ++++++++++++--------
 src/intel/compiler/brw_fs_visitor.cpp              |  78 +++---
 src/intel/compiler/brw_nir.c                       |  11 +-
 src/intel/compiler/brw_nir.h                       |   2 +-
 src/intel/compiler/brw_nir_lower_cs_intrinsics.c   |  56 ++---
 src/intel/compiler/brw_reg.h                       |  16 ++
 src/intel/compiler/brw_shader.cpp                  |   2 +
 src/intel/vulkan/anv_cmd_buffer.c                  |   6 +-
 src/mesa/drivers/dri/i965/gen6_constant_state.c    |   6 +-
 31 files changed, 1076 insertions(+), 586 deletions(-)
 delete mode 100644 src/compiler/nir/nir_lower_read_invocation_to_scalar.c
 create mode 100644 src/compiler/nir/nir_lower_subgroups.c


