Mesa (main): aco: Consider LDS usage by PS inputs in MaxWaves calculation.

Fri Aug 27 16:59:35 UTC 2021

Module: Mesa
Branch: main
Commit: c8698199a1134f1ac3849b4cb88cfa1250f6316a
URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=c8698199a1134f1ac3849b4cb88cfa1250f6316a

Author: Timur Kristóf <timur.kristof at gmail.com>
Date:   Tue Aug 24 08:44:54 2021 +0200

aco: Consider LDS usage by PS inputs in MaxWaves calculation.

Before PS waves are launched, PS inputs are moved from the parameter
cache (PC) to LDS and the corresponding part of the PC is deallocated.
Each PS input occupies 3 * vec4 (3 * 16 = 48 bytes) of LDS space.
See Figure 10.3 in the GCN3 ISA manual.

This LDS usage limits occupancy the same way other stages' LDS usage does.
Note that PS can request additional LDS space via EXTRA_LDS_SIZE,
so that must also be taken into account here.
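
As a rough illustration of the arithmetic described above, the following
standalone sketch computes the per-workgroup LDS footprint and the resulting
workgroup limit. The LdsParams struct, the helper names and any concrete
granule or limit values passed in are assumptions made for illustration only;
they are not ACO's actual data structures or hardware constants. Only the
48 bytes per PS input and the "align, add, then divide the LDS limit" shape
come from the commit itself.

```cpp
#include <algorithm>
#include <cassert>

// Round value up to a multiple of granule.
// Assumption: granule is a non-zero power of two.
static unsigned align_up(unsigned value, unsigned granule)
{
   assert(granule && (granule & (granule - 1)) == 0);
   return (value + granule - 1) & ~(granule - 1);
}

// Hypothetical stand-in for the fields the real code reads from
// program->config, program->dev and program->info.
struct LdsParams {
   unsigned lds_size_bytes;    // explicitly requested LDS, already in bytes
   unsigned lds_alloc_granule; // allocation granularity in bytes
   unsigned lds_limit;         // LDS bytes available per CU/WGP
   unsigned num_interp;        // number of PS inputs
   bool is_fragment;           // true for a fragment (pixel) shader
};

// LDS bytes one workgroup occupies, including PS input parameters.
static unsigned lds_per_workgroup(const LdsParams& p)
{
   unsigned lds = align_up(p.lds_size_bytes, p.lds_alloc_granule);
   if (p.is_fragment) {
      // Each PS input occupies 3 x vec4 = 3 * 16 = 48 bytes of LDS.
      const unsigned lds_bytes_per_interp = 3 * 16;
      lds += align_up(lds_bytes_per_interp * p.num_interp, p.lds_alloc_granule);
   }
   return lds;
}

// Clamp a workgroup count by how many workgroups fit into the LDS limit.
static unsigned limit_workgroups(unsigned workgroups, const LdsParams& p)
{
   const unsigned lds = lds_per_workgroup(p);
   if (lds)
      workgroups = std::min(workgroups, p.lds_limit / lds);
   return workgroups;
}
```

With made-up values of a 512-byte granule and a 65536-byte limit, a fragment
shader with 32 inputs and 1024 bytes of explicitly requested LDS would occupy
1024 + 1536 = 2560 bytes, so at most 65536 / 2560 = 25 workgroups fit.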

No Fossil DB changes.

Signed-off-by: Timur Kristóf <timur.kristof at gmail.com>
Reviewed-by: Daniel Schürmann <daniel at schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12517>

---

 src/amd/compiler/aco_live_var_analysis.cpp | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/src/amd/compiler/aco_live_var_analysis.cpp b/src/amd/compiler/aco_live_var_analysis.cpp
index 0e94118a145..5031273d7fe 100644
--- a/src/amd/compiler/aco_live_var_analysis.cpp
+++ b/src/amd/compiler/aco_live_var_analysis.cpp
@@ -375,11 +375,25 @@ update_vgpr_sgpr_demand(Program* program, const RegisterDemand new_demand)
       /* adjust max_waves for workgroup and LDS limits */
       unsigned waves_per_workgroup = calc_waves_per_workgroup(program);
       unsigned workgroups_per_cu_wgp = max_waves_per_simd * simd_per_cu_wgp / waves_per_workgroup;
-      if (program->config->lds_size) {
-         unsigned lds = program->config->lds_size * program->dev.lds_encoding_granule;
-         lds = align(lds, program->dev.lds_alloc_granule);
-         workgroups_per_cu_wgp = std::min(workgroups_per_cu_wgp, lds_limit / lds);
+
+      unsigned lds_per_workgroup =
+         align(program->config->lds_size * program->dev.lds_encoding_granule,
+               program->dev.lds_alloc_granule);
+
+      if (program->stage == fragment_fs) {
+         /* PS inputs are moved from PC (parameter cache) to LDS before PS waves are launched.
+          * Each PS input occupies 3x vec4 of LDS space. See Figure 10.3 in GCN3 ISA manual.
+          * These limit occupancy the same way as other stages' LDS usage does.
+          */
+         unsigned lds_bytes_per_interp = 3 * 16;
+         unsigned lds_param_bytes = lds_bytes_per_interp * program->info->ps.num_interp;
+         lds_per_workgroup +=
+            align(lds_param_bytes, program->dev.lds_alloc_granule);
       }
+
+      if (lds_per_workgroup)
+         workgroups_per_cu_wgp = std::min(workgroups_per_cu_wgp, lds_limit / lds_per_workgroup);
+
       if (waves_per_workgroup > 1 && program->chip_class < GFX10)
          workgroups_per_cu_wgp = std::min(
             workgroups_per_cu_wgp, 16u); /* TODO: is this a SI-only limit? what about Navi? */