[Mesa-dev] [RFC 0/7] i965: SIMD32 selection heuristics

Toni Lönnberg toni.lonnberg at intel.com
Mon Oct 15 13:19:51 UTC 2018

Since we have SIMD32 support available for fragment shaders, it would be nice
to actually enable it. The changes proposed here are by no means meant as the
final solution to SIMD32 selection; they are a way to enable SIMD32 in case a
customer absolutely needs it for performance before we have a proper
heuristic in place. The heuristic is mainly trying to limit regressions.

These heuristics look at a few things to make a choice regarding SIMD32:

1) Number of enabled MRTs
2) Number of grouped texture fetches
3) Instruction count ratio between SIMD16 and SIMD32

The reasoning is that writes to multiple render targets tend to thrash the
render cache, and multiple grouped texture fetches tend to thrash the sampler
and L3 caches. With these things being equal, SIMD32 usually still performs
better than or as well as SIMD16, as long as it can compensate for latency,
even if it has somewhat more instructions than its SIMD16 counterpart.
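
To make the three checks concrete, here is a rough, standalone sketch of how
such a decision could be structured. It is not the code from the patches: the
struct, the function names and the threshold fields are all made up for
illustration, and the real series may combine the criteria differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tunables; the names are illustrative and do not come from
 * the patch series. */
struct simd32_heuristics {
   unsigned max_mrts;            /* reject SIMD32 above this many render targets */
   unsigned max_grouped_fetches; /* reject SIMD32 above this many grouped fetches */
   float max_inst_ratio;         /* reject SIMD32 if inst32 / inst16 exceeds this */
};

/* Count the set bits in a mask, e.g. a mask of enabled render targets. */
static unsigned
count_bits(uint32_t mask)
{
   unsigned n = 0;
   while (mask) {
      mask &= mask - 1; /* clear the lowest set bit */
      n++;
   }
   return n;
}

/* Return true if SIMD32 passes all three (assumed) checks. */
static bool
use_simd32(const struct simd32_heuristics *h, uint32_t mrt_mask,
           unsigned grouped_fetches, unsigned inst16, unsigned inst32)
{
   /* 1) Too many enabled MRTs: render cache pressure. */
   if (count_bits(mrt_mask) > h->max_mrts)
      return false;

   /* 2) Too many grouped texture fetches: sampler and L3 cache pressure. */
   if (grouped_fetches > h->max_grouped_fetches)
      return false;

   /* 3) The SIMD32 program is too much longer than the SIMD16 one. */
   if (inst16 == 0)
      return false;
   return (float)inst32 <= (float)inst16 * h->max_inst_ratio;
}
```

With made-up limits of 1 MRT, 2 grouped fetches and a ratio of 1.5, a shader
with one render target, two grouped fetches and 140 SIMD32 instructions
against 100 SIMD16 instructions would be accepted, while the same shader with
200 SIMD32 instructions would fall back to SIMD16.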

A proper heuristic would look at whether the shader can *actually* compensate
for latency in any way, which requires some integration with the scheduler.
At the moment, however, the scheduler reports rather odd numbers for the
cycle counts. To alleviate problems with SIMD32, the scheduler should also
try to schedule texture fetches in smaller groups in general.

The default values have been tweaked so that, most of the time, enabling
SIMD32 gives benefits without many regressions.

In my runs, mostly on BXT, the biggest boost and the biggest regression are:

+38.5% in GLBench5 ALU2
-7.1% in GLBenchmark fill test

The results differ depending on the platform: SKL both regresses and gains
less than BXT, while BSW regresses more and gains less than BXT.

As this is an experimental patch, it is not on by default but has to be
enabled via INTEL_DEBUG, just like forcing SIMD32 on. Furthermore, the
different mechanisms of the heuristic can be controlled via environment

Toni Lönnberg (7):
  i965: SIMD32 heuristics debug flag
  i965: SIMD32 heuristics control data
  i965: SIMD32 heuristics control data from drirc
  mesa: Helper functions for counting set bits in a mask
  i965/fs: Save the instruction count of each dispatch width
  i965/fs: SIMD32 selection heuristic based on grouped texture fetches
  i965/fs: Enable all SIMD32 heuristics

 src/intel/common/gen_debug.c             |  1 +
 src/intel/common/gen_debug.h             |  3 +-
 src/intel/compiler/brw_compiler.h        | 11 ++++++
 src/intel/compiler/brw_fs.cpp            | 63 +++++++++++++++++++++++++++++---
 src/intel/compiler/brw_fs.h              |  4 ++
 src/intel/compiler/brw_fs_generator.cpp  | 12 ++++++
 src/mesa/drivers/dri/i965/brw_context.c  | 13 +++++++
 src/mesa/drivers/dri/i965/intel_screen.c | 27 ++++++++++++++
 src/util/bitscan.h                       | 25 +++++++++++++
 9 files changed, 152 insertions(+), 7 deletions(-)

