Mesa (master): intel/nir: Lower array-deref-of-vector UBO and SSBO loads

Sat Mar 16 14:23:24 UTC 2019

Module: Mesa
Branch: master
Commit: d3386e73c5976ecec84821d17f05c2fd4b823880
URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=d3386e73c5976ecec84821d17f05c2fd4b823880

Author: Jason Ekstrand <jason.ekstrand at intel.com>
Date:   Thu Mar 14 12:58:16 2019 -0500

intel/nir: Lower array-deref-of-vector UBO and SSBO loads

This fixes a serious performance issue with DXVK:

https://github.com/doitsujin/dxvk/issues/937

This was caused by a recent change that to improve performance on RADV
which back-fired on ANV and killed performance for some apps:

https://github.com/doitsujin/dxvk/commit/e5a06d3f4a103a54cd4eb51970fedee405d1d698

Throwing in this bit of lowering lets us come along and CSE those UBO
loads (or copy-prop for SSBO load) and get one load where we previously
would have gotten several.

VkPipeline-db results on Kaby Lake:

    total instructions in shared programs: 5115361 -> 5073185 (-0.82%)
    instructions in affected programs: 1754333 -> 1712157 (-2.40%)
    helped: 5331
    HURT: 63

    total cycles in shared programs: 2544501169 -> 2481144545 (-2.49%)
    cycles in affected programs: 2531058653 -> 2467702029 (-2.50%)
    helped: 9202
    HURT: 4323

    total loops in shared programs: 3340 -> 3331 (-0.27%)
    loops in affected programs: 9 -> 0
    helped: 9
    HURT: 0

    total spills in shared programs: 3246 -> 3053 (-5.95%)
    spills in affected programs: 384 -> 191 (-50.26%)
    helped: 10
    HURT: 5

    total fills in shared programs: 4626 -> 4452 (-3.76%)
    fills in affected programs: 439 -> 265 (-39.64%)
    helped: 10
    HURT: 5

All of the shaders with hurt spilling were in Rise of the Tomb Raider
which also had shaders solidly helped in the spilling department.  Not
shown in those results (because I've not had success dumping the
shaders) is Witcher 3 where this reduces spilling and improves over-all
perf by around 20-25%.  There were no shader-db changes.  Apparently,
this just isn't a pattern that happens in OpenGL.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira at intel.com>
Cc: "19.0" mesa-stable at lists.freedesktop.org

---

 src/intel/compiler/brw_nir.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
index 5734987a964..7719ad40251 100644
--- a/src/intel/compiler/brw_nir.c
+++ b/src/intel/compiler/brw_nir.c
@@ -742,6 +742,17 @@ brw_preprocess_nir(const struct brw_compiler *compiler, nir_shader *nir,
       brw_nir_no_indirect_mask(compiler, nir->info.stage);
    OPT(nir_lower_indirect_derefs, indirect_mask);
 
+   /* Lower array derefs of vectors for SSBO and UBO loads.  For both UBOs and
+    * SSBOs, our back-end is capable of loading an entire vec4 at a time and
+    * we would like to take advantage of that whenever possible regardless of
+    * whether or not the app gives us full loads.  This should allow the
+    * optimizer to combine UBO and SSBO load operations and save us some send
+    * messages.
+    */
+   OPT(nir_lower_array_deref_of_vec,
+       nir_var_mem_ubo | nir_var_mem_ssbo,
+       nir_lower_direct_array_deref_of_vec_load);
+
    /* Get rid of split copies */
    nir = brw_nir_optimize(nir, compiler, is_scalar, false);