Mesa (main): pan/bi: Support message preloading

GitLab Mirror gitlab-mirror at kemper.freedesktop.org
Thu Feb 24 20:12:34 UTC 2022


Module: Mesa
Branch: main
Commit: eb1479bda22bf80b553a87ab781956dc068d5b19
URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=eb1479bda22bf80b553a87ab781956dc068d5b19

Author: Alyssa Rosenzweig <alyssa at collabora.com>
Date:   Wed Feb 23 13:50:54 2022 -0500

pan/bi: Support message preloading

Preload LD_VAR_IMM or VAR_TEX instructions in the first block of fragment
shaders on v7. Preloaded messages write to fixed registers; when replacing
instructions we insert moves from the registers at the start of the program and
hope coalescing goes to town. (Admittedly we don't do any coalescing yet...)
The extra moves hurt instruction count in some cases; the win for cycle count
should cancel this out. When we get smarter copy prop or RA, those moves should
go away anyway.
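
As an illustration of the fixed-register scheme, here is a minimal standalone
sketch of the mapping the pass relies on: word i of preloaded message n is read
from register (n * 4) + i, so two messages cover at most r0..r7. The names
preload_register, MAX_PRELOADS and REGS_PER_MESSAGE are hypothetical and exist
only for this example; the pass itself just emits bi_mov_i32_to() from
bi_register((nr_preload * 4) + i).

```c
#include <assert.h>

/* Illustrative constants, not Mesa definitions. The patch stops after two
 * messages and strides the source register by four per message. */
#define MAX_PRELOADS     2
#define REGS_PER_MESSAGE 4

/* Hypothetical helper: the fixed register holding word `word` of the
 * preloaded message in slot `slot`. Mirrors (nr_preload * 4) + i. */
static unsigned
preload_register(unsigned slot, unsigned word)
{
        assert(slot < MAX_PRELOADS);
        assert(word < REGS_PER_MESSAGE);

        return (slot * REGS_PER_MESSAGE) + word;
}

/* Hypothetical helper: number of registers kept live from the start of the
 * program when `nr_preload` messages are preloaded with full vec4 results. */
static unsigned
preload_live_registers(unsigned nr_preload)
{
        assert(nr_preload <= MAX_PRELOADS);

        return nr_preload * REGS_PER_MESSAGE;
}
```

Under these assumptions the second message's first word comes from r4, and two
full messages keep eight registers live, which matches the register pressure
bound discussed in this message.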

This optimization may hurt register pressure by extending the lifetime of up to
eight registers written in the first block. This is expected to be acceptable:
on a large shader-db, there are no additional spills/fills, and only two shaders
are hurt on thread count.

This optimization only applies to v7, as the hardware feature is not present on
v6 and was removed for Valhall.
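
The gating condition can be summarised as a small predicate. This is a sketch
rather than the code in the patch: should_preload and enum sketch_stage are
hypothetical stand-ins for the check on ctx->arch, ctx->stage and bifrost_debug
in bi_compile_variant_nir(); only the BIFROST_DBG_NOPRELOAD value is taken from
the diff.

```c
#include <stdbool.h>

/* Debug flag value as added to bifrost.h by this patch. */
#define BIFROST_DBG_NOPRELOAD 0x0800

/* Hypothetical stand-in for the shader stage enum used by the compiler. */
enum sketch_stage {
        SKETCH_STAGE_VERTEX,
        SKETCH_STAGE_FRAGMENT,
};

/* Hypothetical predicate: preload only for fragment shaders on v7, and only
 * when the nopreload debug option is not set. */
static bool
should_preload(unsigned arch, enum sketch_stage stage, int debug_flags)
{
        return arch == 7 &&
               stage == SKETCH_STAGE_FRAGMENT &&
               !(debug_flags & BIFROST_DBG_NOPRELOAD);
}
```

Any architecture other than 7, any non-fragment stage, or the nopreload flag
makes the predicate false, so the pass is skipped and the shader compiles as
before.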

total instructions in shared programs: 2451624 -> 2454286 (0.11%)
instructions in affected programs: 909046 -> 911708 (0.29%)
helped: 4719
HURT: 3341
helped stats (abs) min: 1.0 max: 10.0 x̄: 1.49 x̃: 1
helped stats (rel) min: 0.08% max: 33.33% x̄: 6.79% x̃: 3.92%
HURT stats (abs)   min: 1.0 max: 50.0 x̄: 2.90 x̃: 2
HURT stats (rel)   min: 0.12% max: 66.67% x̄: 6.39% x̃: 3.45%
95% mean confidence interval for instructions value: 0.27 0.39
95% mean confidence interval for instructions %-change: -1.55% -1.11%
Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).

total tuples in shared programs: 1969529 -> 1963429 (-0.31%)
tuples in affected programs: 601327 -> 595227 (-1.01%)
helped: 5907
HURT: 1297
helped stats (abs) min: 1.0 max: 8.0 x̄: 1.41 x̃: 1
helped stats (rel) min: 0.07% max: 33.33% x̄: 7.25% x̃: 5.26%
HURT stats (abs)   min: 1.0 max: 40.0 x̄: 1.73 x̃: 1
HURT stats (rel)   min: 0.16% max: 31.75% x̄: 3.38% x̃: 2.02%
95% mean confidence interval for tuples value: -0.88 -0.81
95% mean confidence interval for tuples %-change: -5.52% -5.15%
Tuples are helped.

total clauses in shared programs: 401689 -> 387830 (-3.45%)
clauses in affected programs: 136944 -> 123085 (-10.12%)
helped: 8427
HURT: 4
helped stats (abs) min: 1.0 max: 4.0 x̄: 1.65 x̃: 2
helped stats (rel) min: 0.49% max: 50.00% x̄: 19.88% x̃: 18.18%
HURT stats (abs)   min: 1.0 max: 4.0 x̄: 2.50 x̃: 2
HURT stats (rel)   min: 1.96% max: 19.05% x̄: 14.18% x̃: 17.86%
95% mean confidence interval for clauses value: -1.66 -1.63
95% mean confidence interval for clauses %-change: -20.15% -19.58%
Clauses are helped.

total cycles in shared programs: 202735.83 -> 201862.21 (-0.43%)
cycles in affected programs: 16295.46 -> 15421.83 (-5.36%)
helped: 3349
HURT: 1962
helped stats (abs) min: 0.041665999999999315 max: 1.0 x̄: 0.32 x̃: 0
helped stats (rel) min: 0.24% max: 100.00% x̄: 40.77% x̃: 33.33%
HURT stats (abs)   min: 0.041665999999999315 max: 1.5833329999999997 x̄: 0.10 x̃: 0
HURT stats (rel)   min: 0.09% max: 31.40% x̄: 2.95% x̃: 1.94%
95% mean confidence interval for cycles value: -0.17 -0.16
95% mean confidence interval for cycles %-change: -25.48% -23.76%
Cycles are helped.

total arith in shared programs: 74665.50 -> 74920.00 (0.34%)
arith in affected programs: 16059.92 -> 16314.42 (1.58%)
helped: 860
HURT: 3409
helped stats (abs) min: 0.041665999999999315 max: 0.25 x̄: 0.06 x̃: 0
helped stats (rel) min: 0.24% max: 37.50% x̄: 4.73% x̃: 2.56%
HURT stats (abs)   min: 0.041665999999999315 max: 1.5833329999999997 x̄: 0.09 x̃: 0
HURT stats (rel)   min: 0.09% max: 100.00% x̄: 8.99% x̃: 4.21%
95% mean confidence interval for arith value: 0.06 0.06
95% mean confidence interval for arith %-change: 5.83% 6.62%
Arith are HURT.

total texture in shared programs: 13083.50 -> 11877 (-9.22%)
texture in affected programs: 1663 -> 456.50 (-72.55%)
helped: 2377
HURT: 3
helped stats (abs) min: 0.5 max: 1.0 x̄: 0.51 x̃: 0
helped stats (rel) min: 6.25% max: 100.00% x̄: 87.12% x̃: 100.00%
HURT stats (abs)   min: 0.5 max: 0.5 x̄: 0.50 x̃: 0
HURT stats (rel)   min: 0.00% max: 25.00% x̄: 16.67% x̃: 25.00%
95% mean confidence interval for texture value: -0.51 -0.50
95% mean confidence interval for texture %-change: -87.98% -86.00%
Texture are helped.

total vary in shared programs: 10220.62 -> 4183.88 (-59.06%)
vary in affected programs: 10126.50 -> 4089.75 (-59.61%)
helped: 8538
HURT: 0
helped stats (abs) min: 0.125 max: 1.0 x̄: 0.71 x̃: 0
helped stats (rel) min: 7.14% max: 100.00% x̄: 74.74% x̃: 87.50%
95% mean confidence interval for vary value: -0.71 -0.70
95% mean confidence interval for vary %-change: -75.32% -74.16%
Vary are helped.

total quadwords in shared programs: 1766717 -> 1757161 (-0.54%)
quadwords in affected programs: 553801 -> 544245 (-1.73%)
helped: 6760
HURT: 711
helped stats (abs) min: 1.0 max: 11.0 x̄: 1.58 x̃: 1
helped stats (rel) min: 0.09% max: 29.41% x̄: 5.31% x̃: 4.84%
HURT stats (abs)   min: 1.0 max: 33.0 x̄: 1.54 x̃: 1
HURT stats (rel)   min: 0.10% max: 31.13% x̄: 2.53% x̃: 1.61%
95% mean confidence interval for quadwords value: -1.31 -1.25
95% mean confidence interval for quadwords %-change: -4.67% -4.46%
Quadwords are helped.

total threads in shared programs: 52899 -> 52897 (<.01%)
threads in affected programs: 4 -> 2 (-50.00%)
helped: 0
HURT: 2

total preloads in shared programs: 0 -> 116492
preloads in affected programs: 0 -> 116492
helped: 0
HURT: 8604
HURT stats (abs)   min: 2.0 max: 24.0 x̄: 13.54 x̃: 14
HURT stats (rel)   min: 0.00% max: 0.00% x̄: 0.00% x̃: 0.00%
95% mean confidence interval for preloads value: 13.45 13.63
95% mean confidence interval for preloads %-change: 0.00% 0.00%
Preloads are HURT.

Signed-off-by: Alyssa Rosenzweig <alyssa at collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9438>

---

 src/panfrost/bifrost/bi_opt_message_preload.c | 141 ++++++++++++++++++++++++++
 src/panfrost/bifrost/bifrost.h                |   1 +
 src/panfrost/bifrost/bifrost_compile.c        |  11 ++
 src/panfrost/bifrost/compiler.h               |   1 +
 src/panfrost/bifrost/meson.build              |   1 +
 5 files changed, 155 insertions(+)

diff --git a/src/panfrost/bifrost/bi_opt_message_preload.c b/src/panfrost/bifrost/bi_opt_message_preload.c
new file mode 100644
index 00000000000..e19eeb4b0ea
--- /dev/null
+++ b/src/panfrost/bifrost/bi_opt_message_preload.c
@@ -0,0 +1,141 @@
+/*
+ * Copyright (C) 2021 Collabora, Ltd.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "compiler.h"
+#include "bi_builder.h"
+
+/* Bifrost v7 can preload up to two messages of the form:
+ *
+ * 1. +LD_VAR_IMM, register_format f32/f16, sample mode
+ * 2. +VAR_TEX, register format f32/f16, sample mode (TODO)
+ *
+ * Analyze the shader for these instructions and push accordingly.
+ */
+
+static bool
+bi_is_regfmt_float(enum bi_register_format regfmt)
+{
+        return (regfmt == BI_REGISTER_FORMAT_F32) ||
+                (regfmt == BI_REGISTER_FORMAT_F16);
+}
+
+/*
+ * Preloaded varyings are interpolated at the sample location. Check if an
+ * instruction can use this interpolation mode.
+ */
+static bool
+bi_can_interp_at_sample(bi_instr *I)
+{
+        /* .sample mode with r61 corresponds to per-sample interpolation */
+        if (I->sample == BI_SAMPLE_SAMPLE)
+                return bi_is_value_equiv(I->src[0], bi_register(61));
+
+        /* If the shader runs with pixel-frequency shading, .sample is
+         * equivalent to .center, so allow .center
+         *
+         * If the shader runs with sample-frequency shading, .sample and .center
+         * are not equivalent. However, the ESSL 3.20 specification
+         * stipulates in section 4.5 ("Interpolation Qualifiers"):
+         *
+         *    for fragment shader input variables qualified with neither
+         *    centroid nor sample, the value of the assigned variable may be
+         *    interpolated anywhere within the pixel and a single value may be
+         *    assigned to each sample within the pixel, to the extent permitted
+         *    by the OpenGL ES Specification.
+         *
+         * We only produce .center for variables qualified with neither centroid
+         * nor sample, so if .center is specified this section applies. This
+         * suggests that, although per-pixel interpolation is allowed, it is not
+         * mandated ("may" rather than "must" or "should"). Therefore it appears
+         * safe to substitute sample.
+         */
+        return (I->sample == BI_SAMPLE_CENTER);
+}
+
+static bool
+bi_can_preload_ld_var(bi_instr *I)
+{
+        return (I->op == BI_OPCODE_LD_VAR_IMM) &&
+                bi_can_interp_at_sample(I) &&
+                bi_is_regfmt_float(I->register_format);
+}
+
+static bool
+bi_is_var_tex(enum bi_opcode op)
+{
+        return (op == BI_OPCODE_VAR_TEX_F32) || (op == BI_OPCODE_VAR_TEX_F16);
+}
+
+void
+bi_opt_message_preload(bi_context *ctx)
+{
+        unsigned nr_preload = 0;
+
+        /* We only preload from the first block */
+        bi_block *block = bi_start_block(&ctx->blocks);
+        bi_builder b = bi_init_builder(ctx, bi_before_nonempty_block(block));
+
+        bi_foreach_instr_in_block_safe(block, I) {
+                if (!bi_is_ssa(I->dest[0])) continue;
+
+                struct bifrost_message_preload msg;
+
+                if (bi_can_preload_ld_var(I)) {
+                        msg = (struct bifrost_message_preload) {
+                                .enabled = true,
+                                .varying_index = I->varying_index,
+                                .fp16 = (I->register_format == BI_REGISTER_FORMAT_F16),
+                                .num_components = I->vecsize + 1
+                        };
+                } else if (bi_is_var_tex(I->op)) {
+                        msg = (struct bifrost_message_preload) {
+                                .enabled = true,
+                                .texture = true,
+                                .varying_index = I->varying_index,
+                                .sampler_index = I->sampler_index,
+                                .fp16 = (I->op == BI_OPCODE_VAR_TEX_F16),
+                                .skip = I->skip,
+                                .zero_lod = I->lod_mode
+                        };
+                } else {
+                        continue;
+                }
+
+                /* Report the preloading */
+                ctx->info.bifrost->messages[nr_preload] = msg;
+
+                /* Replace with moves at the start. Ideally, they will be
+                 * coalesced out or copy propagated.
+                 */
+                for (unsigned i = 0; i < bi_count_write_registers(I, 0); ++i) {
+                        bi_mov_i32_to(&b, bi_word(I->dest[0], i),
+                                          bi_register((nr_preload * 4) + i));
+                }
+
+                bi_remove_instruction(I);
+
+                /* Maximum number of preloaded messages */
+                if ((++nr_preload) == 2)
+                        break;
+        }
+}
diff --git a/src/panfrost/bifrost/bifrost.h b/src/panfrost/bifrost/bifrost.h
index c04b8a61ad4..6dce0c53b38 100644
--- a/src/panfrost/bifrost/bifrost.h
+++ b/src/panfrost/bifrost/bifrost.h
@@ -46,6 +46,7 @@ extern "C" {
 #define BIFROST_DBG_NOOPT       0x0100
 #define BIFROST_DBG_NOIDVS      0x0200
 #define BIFROST_DBG_NOSB        0x0400
+#define BIFROST_DBG_NOPRELOAD   0x0800
 
 extern int bifrost_debug;
 
diff --git a/src/panfrost/bifrost/bifrost_compile.c b/src/panfrost/bifrost/bifrost_compile.c
index ce51a5a40c6..0450e73cb34 100644
--- a/src/panfrost/bifrost/bifrost_compile.c
+++ b/src/panfrost/bifrost/bifrost_compile.c
@@ -48,6 +48,7 @@ static const struct debug_named_value bifrost_debug_options[] = {
         {"noopt",     BIFROST_DBG_NOOPT,        "Skip optimization passes"},
         {"noidvs",    BIFROST_DBG_NOIDVS,       "Disable IDVS"},
         {"nosb",      BIFROST_DBG_NOSB,         "Disable scoreboarding"},
+        {"nopreload", BIFROST_DBG_NOPRELOAD,    "Disable message preloading"},
         DEBUG_NAMED_VALUE_END
 };
 
@@ -4012,6 +4013,16 @@ bi_compile_variant_nir(nir_shader *nir,
                 bi_opt_copy_prop(ctx);
                 bi_opt_mod_prop_forward(ctx);
                 bi_opt_mod_prop_backward(ctx);
+
+                /* Push LD_VAR_IMM/VAR_TEX instructions. Must run after
+                 * mod_prop_backward to fuse VAR_TEX */
+                if (ctx->arch == 7 && ctx->stage == MESA_SHADER_FRAGMENT &&
+                    !(bifrost_debug & BIFROST_DBG_NOPRELOAD)) {
+                        bi_opt_dead_code_eliminate(ctx);
+                        bi_opt_message_preload(ctx);
+                        bi_opt_copy_prop(ctx);
+                }
+
                 bi_opt_dead_code_eliminate(ctx);
                 bi_opt_cse(ctx);
                 bi_opt_dead_code_eliminate(ctx);
diff --git a/src/panfrost/bifrost/compiler.h b/src/panfrost/bifrost/compiler.h
index 664e25a3c7e..f71478e0cfd 100644
--- a/src/panfrost/bifrost/compiler.h
+++ b/src/panfrost/bifrost/compiler.h
@@ -986,6 +986,7 @@ void bi_opt_mod_prop_backward(bi_context *ctx);
 void bi_opt_dead_code_eliminate(bi_context *ctx);
 void bi_opt_fuse_dual_texture(bi_context *ctx);
 void bi_opt_dce_post_ra(bi_context *ctx);
+void bi_opt_message_preload(bi_context *ctx);
 void bi_opt_push_ubo(bi_context *ctx);
 void bi_opt_reorder_push(bi_context *ctx);
 void bi_lower_swizzle(bi_context *ctx);
diff --git a/src/panfrost/bifrost/meson.build b/src/panfrost/bifrost/meson.build
index 1dcd9b572da..eda61e8421a 100644
--- a/src/panfrost/bifrost/meson.build
+++ b/src/panfrost/bifrost/meson.build
@@ -33,6 +33,7 @@ libpanfrost_bifrost_files = files(
   'bi_opt_dce.c',
   'bi_opt_cse.c',
   'bi_opt_push_ubo.c',
+  'bi_opt_message_preload.c',
   'bi_opt_mod_props.c',
   'bi_opt_dual_tex.c',
   'bi_pack.c',


