Mesa (main): pan/va: Workaround quirk of barrier handling

GitLab Mirror gitlab-mirror at kemper.freedesktop.org
Tue Jun 21 22:41:34 UTC 2022


Module: Mesa
Branch: main
Commit: 8c6b9b9c9218cd115c3f3b9a1c6e56e6daa84fb4
URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=8c6b9b9c9218cd115c3f3b9a1c6e56e6daa84fb4

Author: Alyssa Rosenzweig <alyssa at collabora.com>
Date:   Mon Jun 20 16:52:19 2022 -0400

pan/va: Workaround quirk of barrier handling

For some unknown reason, waiting for general slots (at least for memory stores)
doesn't work properly on a BARRIER instruction. We need to wait for all general
slots right before issuing the BARRIER in addition to the general wait on the
BARRIER itself. I don't know if this is a hardware bug or some hideous
gate-saving quirk, but I observe the Mali-G78 DDK using the same workaround,
which implies this really is necessary.

Fixes rare flakes in:

   dEQP-GLES31.functional.compute.shared_var.work_group_size.float_128_1_1

Note that the flakes from that test are extremely timing dependent. Without this
change, that test is racy but we almost always win the race. Reproducing the
issue reliably requires high system load (e.g. running the CTS in the
background) and simultaneously running that test a large number of times.

Minimal shader-db impact. In particular, no cycle count regressions.

total instructions in shared programs: 2699419 -> 2699458 (<.01%)
instructions in affected programs: 22014 -> 22053 (0.18%)
helped: 2
HURT: 25
helped stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%
HURT stats (abs)   min: 1.0 max: 3.0 x̄: 1.64 x̃: 1
HURT stats (rel)   min: 0.07% max: 2.82% x̄: 0.69% x̃: 0.49%
95% mean confidence interval for instructions value: 1.01 1.87
95% mean confidence interval for instructions %-change: 0.38% 0.88%
Instructions are HURT.

total cvt in shared programs: 14468.81 -> 14469.42 (<.01%)
cvt in affected programs: 221.33 -> 221.94 (0.28%)
helped: 2
HURT: 25
helped stats (abs) min: 0.015625 max: 0.015625 x̄: 0.02 x̃: 0
helped stats (rel) min: 0.18% max: 0.18% x̄: 0.18% x̃: 0.18%
HURT stats (abs)   min: 0.015625 max: 0.046875 x̄: 0.03 x̃: 0
HURT stats (rel)   min: 0.10% max: 4.44% x̄: 1.06% x̃: 0.79%
95% mean confidence interval for cvt value: 0.02 0.03
95% mean confidence interval for cvt %-change: 0.57% 1.36%
Cvt are HURT.

total quadwords in shared programs: 1462496 -> 1462528 (<.01%)
quadwords in affected programs: 4632 -> 4664 (0.69%)
helped: 0
HURT: 4
HURT stats (abs)   min: 8.0 max: 8.0 x̄: 8.00 x̃: 8
HURT stats (rel)   min: 0.35% max: 7.69% x̄: 4.03% x̃: 4.03%
95% mean confidence interval for quadwords value: 8.00 8.00
95% mean confidence interval for quadwords %-change: -2.71% 10.76%
Inconclusive result (%-change mean confidence interval includes 0).

Signed-off-by: Alyssa Rosenzweig <alyssa at collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17091>

---

 src/panfrost/bifrost/valhall/va_insert_flow.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/src/panfrost/bifrost/valhall/va_insert_flow.c b/src/panfrost/bifrost/valhall/va_insert_flow.c
index 34bcaa9dc44..2881ac34193 100644
--- a/src/panfrost/bifrost/valhall/va_insert_flow.c
+++ b/src/panfrost/bifrost/valhall/va_insert_flow.c
@@ -199,6 +199,28 @@ bi_set_dependencies(bi_block *block, bi_instr *I, struct bi_scoreboard_state *st
       u_foreach_bit(slot, st->memory)
          I->flow |= bi_pop_slot(st, slot);
    }
+
+   /* We need to wait for all general slots before a barrier. The reason is
+    * unknown. In theory, this is redundant, since the BARRIER instruction will
+    * be followed immediately by .wait which waits for all slots. However, that
+    * doesn't seem to work properly in practice.
+    *
+    * The DDK is observed to use the same workaround, going so far as
+    * introducing a NOP before a BARRIER at the beginning of a basic block when
+    * there are outstanding stores.
+    *
+    *     NOP.wait12
+    *     BARRIER.slot7.wait
+    *
+    * Luckily, this situation is pretty rare. The wait introduced here can
+    * usually be merged into the preceding instruction.
+    */
+   if (I->op == BI_OPCODE_BARRIER) {
+      for (unsigned i = 0; i < VA_NUM_GENERAL_SLOTS; ++i) {
+         if (st->write[i] || (st->memory & BITFIELD_BIT(i)))
+            I->flow |= bi_pop_slot(st, i);
+      }
+   }
 }
 
 static bool



More information about the mesa-commit mailing list