[Mesa-dev] [PATCH 00/11] i965: Rework CS local IDs for gen7+

Jordan Justen jordan.l.justen at intel.com
Tue May 24 08:37:45 UTC 2016


git://people.freedesktop.org/~jljusten/mesa hsw-cs-cross-thread-constants-v1

Note: These patches break the anv (Vulkan) build. In the branch above,
I have a hack patch which will let anv build, but anv will still be
broken. (I.e., I need to add another 'squash' patch for anv to this
series.)

These patches redo how we handle compute shader local IDs. Rather than
uploading a uvec3 for each channel, we now upload a single uint for
each thread to give a base thread ID. We then add an offset to it
for each channel in the thread, which gives us gl_LocalInvocationIndex.
From that variable we can calculate gl_LocalInvocationID.
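A minimal C sketch of the index arithmetic described above, for illustration only. It assumes that thread t receives the base t * simd_width and that channel c adds c to it. gl_LocalInvocationID is then recovered by inverting the GLSL relation index = z*sx*sy + y*sx + x. The names (struct uvec3, thread_base_for, and so on) are hypothetical and are not taken from the patches.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a GLSL uvec3. */
struct uvec3 { uint32_t x, y, z; };

/* Assumption: each hardware thread t is given base t * simd_width. */
static uint32_t thread_base_for(uint32_t thread, uint32_t simd_width)
{
    return thread * simd_width;
}

/* gl_LocalInvocationIndex = per-thread base + channel offset. */
static uint32_t local_invocation_index(uint32_t thread_base, uint32_t channel)
{
    return thread_base + channel;
}

/* Invert index = z*sx*sy + y*sx + x to get gl_LocalInvocationID. */
static struct uvec3 local_invocation_id(uint32_t index, struct uvec3 size)
{
    assert(size.x != 0 && size.y != 0 && size.z != 0);
    struct uvec3 id;
    id.x = index % size.x;
    id.y = (index / size.x) % size.y;
    id.z = index / (size.x * size.y);
    return id;
}
```

For instance, with a local size of 8x8x4 in SIMD16, channel 3 of thread 5 gets index 5 * 16 + 3 = 83, which maps back to gl_LocalInvocationID (3, 2, 1), since 1*64 + 2*8 + 3 = 83.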

For SIMD16, this means that we push 4 bytes as push constant data
rather than 4 x 3 (uvec3) x 16 (simd16) = 192 bytes. This data is
replicated per thread execution, so up to 64 times this size might
be used. (So, potentially this could drop from 12288 bytes down to
256 bytes.)
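The size arithmetic in the previous paragraph can be checked with a short C sketch. The helper names are hypothetical. The sketch only models the local-ID data, ignoring any uniforms that are pushed alongside it.

```c
#include <assert.h>
#include <stdint.h>

/* Old scheme: a uvec3 (3 x 4 bytes) for every SIMD channel. */
static uint32_t old_local_id_bytes_per_thread(uint32_t simd_width)
{
    return 3u * 4u * simd_width;
}

/* New scheme: a single 4-byte uint (the thread ID base) per thread. */
static uint32_t new_local_id_bytes_per_thread(void)
{
    return 4u;
}

/* The per-thread data is replicated once for every hardware thread. */
static uint32_t total_bytes(uint32_t per_thread, uint32_t threads)
{
    assert(threads == 0 || per_thread <= UINT32_MAX / threads);
    return per_thread * threads;
}
```

With SIMD16 and 64 threads this gives 192 * 64 = 12288 bytes for the old scheme versus 4 * 64 = 256 bytes for the new one.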

In addition to this, we now also take advantage of the Haswell feature
which allows for some registers to be loaded from a common set of
(cross-thread) data.

The amount of memory saved by this depends on how many uniforms the
program uses. For example, suppose the uniforms fill one register
(32 bytes) and the CS program has a local size of 1024. In SIMD16
mode this requires 64 threads. Previously the 32 bytes had to be
replicated for all 64 threads, for a total of 2048 bytes. On
Haswell+ this can now use just 32 bytes.
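A rough C model of the example above, under stated assumptions. Cross-thread data is stored once, and per-thread data is stored once per hardware thread. Like the example, the sketch ignores the small per-thread thread ID. The function names are hypothetical and do not come from the i965 code.

```c
#include <assert.h>
#include <stdint.h>

/* Hardware threads needed for `invocations` at a SIMD width (rounded up). */
static uint32_t cs_thread_count(uint32_t invocations, uint32_t simd_width)
{
    assert(simd_width != 0);
    return (invocations + simd_width - 1) / simd_width;
}

/* Total push-constant storage: cross-thread data is stored once,
 * per-thread data is replicated for every thread. */
static uint32_t push_const_total_bytes(uint32_t cross_thread_bytes,
                                       uint32_t per_thread_bytes,
                                       uint32_t threads)
{
    assert(threads == 0 ||
           per_thread_bytes <= (UINT32_MAX - cross_thread_bytes) / threads);
    return cross_thread_bytes + per_thread_bytes * threads;
}
```

Treating the 32 bytes of uniforms as per-thread data yields 32 * 64 = 2048 bytes. Treating them as cross-thread data, as Haswell+ allows, yields 32 bytes.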

We have also merged the uniform and thread ID data together, which can
potentially save a small amount of memory and, perhaps more
significantly, a register.
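One way to see where a register can be saved is a simple packing model. This is a hypothetical sketch: it assumes push data is allocated in whole 32-byte registers and that, when kept separate, the 4-byte thread ID occupies a register of its own. These layout details are illustrative assumptions, not a description of the patches.

```c
#include <assert.h>
#include <stdint.h>

#define REG_SIZE_BYTES 32u   /* one 32-byte register */

/* Whole registers needed to hold `bytes` of push data. */
static uint32_t regs_for_bytes(uint32_t bytes)
{
    assert(bytes <= UINT32_MAX - (REG_SIZE_BYTES - 1u));
    return (bytes + REG_SIZE_BYTES - 1u) / REG_SIZE_BYTES;
}

/* Assumption: kept separate, the thread ID gets its own register. */
static uint32_t regs_separate(uint32_t uniform_bytes)
{
    return regs_for_bytes(uniform_bytes) + 1u;
}

/* Assumption: merged, the 4-byte thread ID is appended to the uniforms. */
static uint32_t regs_merged(uint32_t uniform_bytes)
{
    return regs_for_bytes(uniform_bytes + 4u);
}
```

Under this model, 20 bytes of uniforms need 2 registers when the data is kept separate but only 1 when it is merged. With exactly 32 bytes of uniforms, both layouts need 2 registers, so the saving depends on how full the last uniform register is.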

These changes allow the UE4 Elemental demo to run in OpenGL 4.3 mode
when the Mesa version is overridden. I tested on Haswell and Broadwell.

Jordan Justen (11):
  i965/compute: Fix uniform init issue when SIMD8 is skipped
  glsl: Add glsl LowerCsDerivedVariables option
  i965: Use nir to lower cs-derived variables
  i965: Add nir based intrinsic lowering
  nir: Make lowering gl_LocalInvocationIndex optional
  i965: Add nir channel_num system value
  i965: Add uniform to hold the CS thread ID base
  squash-fwd i965/cs: Add CS push constant structure
  squash i965: Use struct push_const_info and support cross-thread
    constants
  squash i965: Run the intrinsics lowering pass
  squash i965: Remove old CS local ID handling

 src/compiler/glsl/builtin_variables.cpp        |  13 +-
 src/compiler/glsl/glsl_parser_extras.cpp       |   8 +-
 src/compiler/nir/nir.c                         |   4 +
 src/compiler/nir/nir.h                         |   2 +
 src/compiler/nir/nir_gather_info.c             |   1 +
 src/compiler/nir/nir_intrinsics.h              |   2 +
 src/compiler/nir/nir_lower_system_values.c     |  16 ++-
 src/mesa/drivers/dri/i965/Makefile.sources     |   1 +
 src/mesa/drivers/dri/i965/brw_compiler.c       |   3 +-
 src/mesa/drivers/dri/i965/brw_compiler.h       |   9 +-
 src/mesa/drivers/dri/i965/brw_cs.c             |  16 ++-
 src/mesa/drivers/dri/i965/brw_defines.h        |   3 +
 src/mesa/drivers/dri/i965/brw_fs.cpp           |  99 ++------------
 src/mesa/drivers/dri/i965/brw_fs.h             |   1 -
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp       |  22 ++--
 src/mesa/drivers/dri/i965/brw_nir.c            |  18 +++
 src/mesa/drivers/dri/i965/brw_nir.h            |   1 +
 src/mesa/drivers/dri/i965/brw_nir_intrinsics.c | 142 ++++++++++++++++++++
 src/mesa/drivers/dri/i965/brw_nir_uniforms.cpp |   2 +-
 src/mesa/drivers/dri/i965/gen7_cs_state.c      | 173 ++++++++++++++++++-------
 src/mesa/main/mtypes.h                         |   4 +
 src/mesa/state_tracker/st_extensions.c         |   4 +-
 22 files changed, 376 insertions(+), 168 deletions(-)
 create mode 100644 src/mesa/drivers/dri/i965/brw_nir_intrinsics.c

-- 
2.8.1


