[Mesa-dev] [PATCH 00/53] intel/fs: SIMD32 support for fragment shaders

Jason Ekstrand jason at jlekstrand.net
Thu May 24 21:55:42 UTC 2018


This patch series adds back-end compiler support for SIMD32 fragment
shaders.  Support is added and everything works but it's currently hidden
behind INTEL_DEBUG=do32.  We know that it improves performance in some
cases but we do not yet have a good enough heuristic to start turning it on
by default.  The objective of this series is just to get the compiler
infrastructure landed so that it stops bit-rotting in Curro's branch.
Figuring out a good heuristic is left as an exercise to the reader. :-)

Francisco Jerez (34):
  intel/eu: Remove brw_codegen::compressed_stack.
  intel/fs: Rename a local variable so it doesn't shadow component()
  intel/fs: Use the ATTR file for FS inputs
  intel/fs: Replace the CINTERP opcode with a simple MOV
  intel/fs: Add explicit last_rt flag to fb writes orthogonal to eot.
  intel/fs: Fix Gen4-5 FB write AA data payload munging for non-EOT
    writes.
  intel/eu: Return new instruction to caller from brw_fb_WRITE().
  intel/fs: Fix fs_inst::flags_written() for Gen4-5 FB writes.
  intel/fs: Fix implied_mrf_writes() for headerless FB writes.
  intel/fs: Remove program key argument from generator.
  intel/fs: Disable SIMD32 dispatch on Gen4-6 with control flow
  intel/fs: Disable SIMD32 dispatch for fragment shaders with discard.
  intel/eu: Fix pixel interpolator queries for SIMD32.
  intel/fs: Fix codegen of FS_OPCODE_SET_SAMPLE_ID for SIMD32.
  intel/fs: Don't enable dual source blend if no outputs are written
  intel/fs: Fix FB write message control codegen for SIMD32.
  intel/fs: Fix logical FB write lowering for SIMD32
  intel/fs: Fix FB read header setup for SIMD32.
  intel/fs: Rework INTERPOLATE_AT_PER_SLOT_OFFSET
  intel/fs: Mark LINTERP opcode as writing accumulator implicitly on
    pre-Gen7.
  intel/fs: Disable opt_sampler_eot() in 32-wide dispatch.
  i965: Add plumbing for shader time in 32-wide FS dispatch mode.
  intel/fs: Simplify fs_visitor::emit_samplepos_setup
  intel/fs: Use fs_regs instead of brw_regs in the unlit centroid
    workaround
  intel/fs: Wrap FS payload register look-up in a helper function.
  intel/fs: Extend thread payload layout to SIMD32
  intel/fs: Implement 32-wide FS payload setup on Gen6+
  intel/fs: Fix Gen7 compressed source region alignment restriction for
    SIMD32
  intel/fs: Fix sample id setup for SIMD32.
  intel/fs: Generalize the unlit centroid workaround
  intel/fs: Fix Gen6+ interpolation setup for SIMD32
  intel/fs: Fix fs_builder::sample_mask_reg() for 32-wide FS dispatch.
  intel/fs: Fix nir_intrinsic_load_helper_invocation for SIMD32.
  intel/fs: Build 32-wide FS shaders.

Jason Ekstrand (19):
  intel/fs: Assert that the gen4-6 plane restrictions are followed
  intel/fs: Use groups for SIMD16 LINTERP on gen11+
  intel/fs: FS_OPCODE_REP_FB_WRITE has side effects
  intel/fs: Properly track implied header regs read by FB writes
  intel/fs: Pull FB write implied headers from src[0]
  intel/fs: Set up FB write message headers in the visitor
  i965: Re-arrange shader kernel setup in WM state
  intel/compiler: Add and use helpers for working with KSP indices
  intel/fs: Rework KSP data to be SIMD width-based
  intel/fs: Split instructions low to high in lower_simd_width
  intel/fs: Properly copy default flag reg for 3src instructions
  intel/fs: Add the group to the flag subreg number on SNB and older
  intel/fs: Emit LINE+MAC for LINTERP with unaligned coordinates
  intel/fs: Emit MOV_DISPATCH_TO_FLAGS once for the centroid workaround
  intel/fs: Get rid of MOV_DISPATCH_TO_FLAGS
  intel/fs: Add fields to wm_prog_data for SIMD32 dispatch
  intel/anv,blorp,i965: Implement the SKL 16x MSAA SIMD32 workaround
  intel/fs: Remove support for push constants in repclear shaders
  intel/fs: Support SIMD32 repclear shaders

 src/intel/blorp/blorp.c                       |   2 +-
 src/intel/blorp/blorp_genX_exec.h             |  82 +++-
 src/intel/compiler/brw_compiler.h             |  98 +++-
 src/intel/compiler/brw_eu.h                   |  21 +-
 src/intel/compiler/brw_eu_defines.h           |   2 -
 src/intel/compiler/brw_eu_emit.c              |  39 +-
 src/intel/compiler/brw_fs.cpp                 | 666 ++++++++++++++++----------
 src/intel/compiler/brw_fs.h                   |  53 +-
 src/intel/compiler/brw_fs_builder.h           |   6 +-
 src/intel/compiler/brw_fs_cse.cpp             |   1 -
 src/intel/compiler/brw_fs_generator.cpp       | 318 ++++++------
 src/intel/compiler/brw_fs_nir.cpp             |  57 ++-
 src/intel/compiler/brw_fs_visitor.cpp         | 193 ++++----
 src/intel/compiler/brw_ir_fs.h                |   1 +
 src/intel/compiler/brw_shader.cpp             |  12 +-
 src/intel/compiler/brw_vec4.cpp               |   2 +-
 src/intel/compiler/brw_vec4_gs_visitor.cpp    |   2 +-
 src/intel/compiler/brw_vec4_tcs.cpp           |   2 +-
 src/intel/compiler/brw_wm_iz.cpp              |  11 +-
 src/intel/vulkan/anv_pipeline.c               |   2 +-
 src/intel/vulkan/genX_pipeline.c              |  40 +-
 src/mesa/drivers/dri/i965/brw_context.h       |   1 +
 src/mesa/drivers/dri/i965/brw_program.c       |   6 +
 src/mesa/drivers/dri/i965/brw_wm.c            |   6 +-
 src/mesa/drivers/dri/i965/gen4_blorp_exec.h   |  17 +-
 src/mesa/drivers/dri/i965/genX_state_upload.c | 144 ++++--
 26 files changed, 1101 insertions(+), 683 deletions(-)

-- 
2.5.0.400.gff86faf


