[Mesa-dev] [PATCH 0/9] i965/fs: Uniform pull constant loads through the constant cache.

Francisco Jerez currojerez at riseup.net
Fri Dec 9 19:03:23 UTC 2016


This is a respin of a series I sent nearly two years ago
reimplementing uniform pull constant loads in terms of constant cache
block read messages instead of using sampler LD messages.  The
motivation is that oword block read messages are able to fetch more
data with a single message than the current SIMD4x2 sampler LD
messages, and they don't contribute to thrashing of the sampler
caches, which can lead to performance problems with several workloads.
Here is a summary of the benchmarks improved by this series, along
with an estimate of the standard deviation of each result (see PATCH 6
for more details):

                           | SKL           | BDW          | HSW
  SynMark2 OglShMapPcf     | 24.63% ±0.45% | 4.01% ±0.70% | 10.31% ±0.38%
  GfxBench4 gl_manhattan31 |  5.93% ±0.35% | 3.92% ±0.31% |  6.62% ±0.22%
  GfxBench4 gl_4           |  2.52% ±0.44% | 1.23% ±0.10% |      N/A
  Unigine Valley           |  0.83% ±0.17% | 0.23% ±0.05% |  0.74% ±0.45%
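The following rough arithmetic shows the message-size argument from the first paragraph. It counts how many messages it takes to fetch one cacheline of uniforms. It relies on these assumptions:

  - a 64-byte cacheline;
  - a 4-oword (64-byte) block read;
  - each SIMD4x2 sampler LD returns one vec4 (16 bytes) of useful uniform data.

These constants and the helper name messages_needed() are illustrative only and are not taken from the series.

```c
#include <assert.h>

/* Illustrative sizes: assumptions for this sketch, not values taken
 * from the series itself. */
enum {
   OWORD_BYTES              = 16, /* one oword is 128 bits */
   CACHELINE_BYTES          = 64, /* assumed cacheline size */
   SIMD4X2_LD_UNIFORM_BYTES = 16  /* assumed: one vec4 of uniform data per LD */
};

/* Hypothetical helper: number of messages needed to fetch `bytes` of
 * data when each message returns `bytes_per_msg` bytes, rounding up. */
static unsigned
messages_needed(unsigned bytes, unsigned bytes_per_msg)
{
   assert(bytes_per_msg > 0);
   return (bytes + bytes_per_msg - 1) / bytes_per_msg;
}
```

Under those assumptions, a single 4-oword block read covers the whole cacheline: messages_needed(CACHELINE_BYTES, 4 * OWORD_BYTES) is 1. The sampler path needs four LD messages for the same data: messages_needed(CACHELINE_BYTES, SIMD4X2_LD_UNIFORM_BYTES) is 4.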

I'm resending the series because Mark pointed out that the i965 driver
generates more sampler traffic than the proprietary driver during some
expensive draw calls of the Manhattan demo. On the other hand, its
non-sampler shader memory access counts are lower (in fact zero).  The
original Manhattan demo I tried two years ago wasn't affected by the
change because it didn't make use of UBOs at all. The newer
gl_manhattan31 demo, based on GL 4.3/GLES 3.1, does use them, as you
can tell from the table above.

The series should be roughly functionally equivalent to the last
revision, but rebased two years forward in time.  That involved nearly
rewriting some of the patches, so I ended up making things slightly
more flexible: the oword read block size can now be specified
arbitrarily by the back-end.  This should make it easier to extend in
the future to use a larger block size, or a smaller one in order to
minimize register pressure.
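A back-end-specified block size boils down to mapping the requested size onto the message control field of the oword block read message (the calculation factored out in PATCH 4). The helper below is a hypothetical sketch of that mapping, not the code from the series. The encodings mirror what I believe brw_defines.h calls BRW_DATAPORT_OWORD_BLOCK_*, but treat the numeric values as an assumption and check them against the PRM. The sketch also assumes 32-byte GRFs, i.e. two owords per register.

```c
#include <assert.h>

/* Assumed message control encodings for oword block reads (see lead-in;
 * verify against brw_defines.h and the hardware docs). */
enum oword_block_ctrl {
   OWORD_BLOCK_1_OWORDLOW  = 0,
   OWORD_BLOCK_1_OWORDHIGH = 1,
   OWORD_BLOCK_2_OWORDS    = 2,
   OWORD_BLOCK_4_OWORDS    = 3,
   OWORD_BLOCK_8_OWORDS    = 4
};

/* Hypothetical helper: message control value for a block read whose
 * destination spans `num_regs` 32-byte GRFs (two owords per GRF). */
static enum oword_block_ctrl
oword_block_control(unsigned num_regs)
{
   switch (num_regs) {
   case 1: return OWORD_BLOCK_2_OWORDS; /* 32 bytes */
   case 2: return OWORD_BLOCK_4_OWORDS; /* 64 bytes, one cacheline */
   case 4: return OWORD_BLOCK_8_OWORDS; /* 128 bytes */
   default:
      assert(!"unsupported oword block size");
      return OWORD_BLOCK_2_OWORDS;
   }
}
```

The choice of num_regs is the trade-off described above. A larger block fetches more constants per message but ties up more destination registers. A smaller one does the opposite.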

 src/mesa/drivers/dri/i965/brw_defines.h          |   7 ++++++-
 src/mesa/drivers/dri/i965/brw_disasm.c           |   1 +
 src/mesa/drivers/dri/i965/brw_eu.h               |   1 +
 src/mesa/drivers/dri/i965/brw_eu_emit.c          |  97 +++++++++++++++++++++++++++++++++++++++----------------------------------------------------------
 src/mesa/drivers/dri/i965/brw_fs.cpp             |  63 +++++++++++++++++----------------------------------------------
 src/mesa/drivers/dri/i965/brw_fs.h               |   5 +----
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp   | 108 ++++++++++++++++++++++++------------------------------------------------------------------------------------
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp         |  19 +++++++++++--------
 src/mesa/drivers/dri/i965/brw_pipe_control.c     |   1 +
 src/mesa/drivers/dri/i965/brw_shader.cpp         |   2 --
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  15 ++++++++++++---
 11 files changed, 113 insertions(+), 206 deletions(-)

[PATCH 1/9] i965/gen6+: Invalidate constant cache on brw_emit_mi_flush().
[PATCH 2/9] i965: Let the caller of brw_set_dp_write/read_message control the target cache.
[PATCH 3/9] i965/fs: Switch to the constant cache for uniform pull constants.
[PATCH 4/9] i965: Factor out oword block read and write message control calculation.
[PATCH 5/9] i965/fs: Expose arbitrary pull constant load sizes to the IR.
[PATCH 6/9] i965/fs: Fetch one cacheline of pull constants at a time.
[PATCH 7/9] i965/fs: Drop useless access mode override from pull constant generator code.
[PATCH 8/9] i965/fs: Remove the FS_OPCODE_SET_SIMD4X2_OFFSET virtual opcode.
[PATCH 9/9] i965/disasm: Decode dataport constant cache control fields.
