Mesa (master): aco: Note if rasterization can start early.

GitLab Mirror gitlab-mirror at kemper.freedesktop.org
Tue Jan 12 16:52:54 UTC 2021


Module: Mesa
Branch: master
Commit: 38da379b3eb99c65fa3b9e5eeb1361fde700dbd8
URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=38da379b3eb99c65fa3b9e5eeb1361fde700dbd8

Author: Timur Kristóf <timur.kristof at gmail.com>
Date:   Mon Jan 11 18:36:20 2021 +0100

aco: Note if rasterization can start early.

When there are no param exports in an NGG (or legacy VS) shader,
the NO_PC_EXPORT=1 is set by RADV, which means PS waves can launch
before the current stage finishes.

Signed-off-by: Timur Kristóf <timur.kristof at gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02 at gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7868>

---

 src/amd/compiler/README-ISA.md                       | 19 +++++++++++++++++++
 src/amd/compiler/aco_instruction_selection_setup.cpp |  9 +++++++++
 src/amd/compiler/aco_ir.h                            |  1 +
 3 files changed, 29 insertions(+)

diff --git a/src/amd/compiler/README-ISA.md b/src/amd/compiler/README-ISA.md
index 940d2728c07..678759ff25c 100644
--- a/src/amd/compiler/README-ISA.md
+++ b/src/amd/compiler/README-ISA.md
@@ -130,6 +130,25 @@ on what sort of addressing should be used, but it says that it
 "is equivalent to an `S_CBRANCH` with extra math", so the subvector loop handling
 in ACO is done according to the `s_cbranch` doc.
 
+## RDNA early rasterization
+
+The ISA documentation says about `s_endpgm`:
+
+> The hardware implicitly executes S_WAITCNT 0 and S_WAITCNT_VSCNT 0
+> before executing this instruction.
+
+What the doc doesn't say is that in case of NGG (and legacy VS) when there
+are no param exports, the driver sets `NO_PC_EXPORT=1` for optimal performance,
+and when this is set, the hardware will start clipping and rasterization
+as soon as it encounters a position export with `DONE=1`, without waiting
+for the NGG (or VS) to finish.
+
+It can even launch PS waves before NGG (or VS) ends.
+
+When this happens, any store performed by a VS is not guaranteed
+to be complete when PS tries to load it, so we need to manually
+make sure to insert wait instructions before the position exports.
+
 # Hardware Bugs
 
 ## SMEM corrupts VCCZ on SI/CI
diff --git a/src/amd/compiler/aco_instruction_selection_setup.cpp b/src/amd/compiler/aco_instruction_selection_setup.cpp
index 90a0bd9e3b0..e0c2b93d2b8 100644
--- a/src/amd/compiler/aco_instruction_selection_setup.cpp
+++ b/src/amd/compiler/aco_instruction_selection_setup.cpp
@@ -366,6 +366,15 @@ setup_vs_output_info(isel_context *ctx, nir_shader *nir,
       pos_written |= 1 << 3;
 
    outinfo->pos_exports = util_bitcount(pos_written);
+
+   /* GFX10+ early rasterization:
+    * When there are no param exports in an NGG (or legacy VS) shader,
+    * RADV sets NO_PC_EXPORT=1, which means the HW will start clipping and rasterization
+    * as soon as it encounters a DONE pos export. When this happens, PS waves can launch
+    * before the NGG (or VS) waves finish.
+    */
+   ctx->program->early_rast = ctx->program->chip_class >= GFX10 &&
+                              outinfo->param_exports == 0;
 }
 
 void
diff --git a/src/amd/compiler/aco_ir.h b/src/amd/compiler/aco_ir.h
index a97927b2036..f8aab8cef67 100644
--- a/src/amd/compiler/aco_ir.h
+++ b/src/amd/compiler/aco_ir.h
@@ -1738,6 +1738,7 @@ public:
    bool xnack_enabled = false;
    bool sram_ecc_enabled = false;
    bool has_fast_fma32 = false;
+   bool early_rast = false; /* whether rasterization can start as soon as the 1st DONE pos export */
 
    bool needs_vcc = false;
    bool needs_flat_scr = false;



More information about the mesa-commit mailing list