Mesa (main): radeonsi: unroll loops of up to 128 iterations
GitLab Mirror
gitlab-mirror at kemper.freedesktop.org
Sat Dec 11 20:39:57 UTC 2021
Module: Mesa
Branch: main
Commit: 9ff086052ab7bff3cb55c06365543190a3afe188
URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=9ff086052ab7bff3cb55c06365543190a3afe188
Author: Marek Olšák <marek.olsak at amd.com>
Date: Sun Nov 28 04:55:47 2021 -0500
radeonsi: unroll loops of up to 128 iterations
It's not exactly 128 because longer loop bodies scale the number down.
This improves perf for VP13/Creo and Piano. Most other tests either didn't
show any difference or are CPU-bound.
v2:
- The lowering passes had to be moved to the optimization loop because unrolling creates lowerable variables.
- Piano has some pattern that looks like corruption and the pattern changed with loop unrolling.
The pattern is present on other drivers as well.
v3:
- I removed the Piano test from CI traces because the image is random. The output was wrong even before
this MR, and now it's randomly wrong.
| PERCENTAGE DELTAS | Shaders | SGPRs | VGPRs |SpillSGPR |SpillVGPR | PrivVGPR | Scratch | CodeSize | MaxWaves |
|------------------------|----------|----------|----------|----------|----------|----------|----------|----------|----------|
| alien_isolation | 2936| . | 0.02 %| . | . | . | . | 0.83 %| . |
| deadcore | 76| 18.47 %| . | . | . | . | . | 167.69 %| . |
| deus_ex_mankind_div.. | 1410| 0.10 %| 0.15 %| . | . | . | . | 1.70 %| . |
| f1-2015 | 775| 0.37 %| 0.16 %| . | . | . | . | 3.25 %| -0.07 %|
| hitman | 1413| 0.10 %| -0.03 %| 6.45 %| . | . | . | 0.61 %| 0.03 %|
| metro_2033_redux | 2670| . | . | . | . | . | . | 0.13 %| 0.01 %|
| pixmark-piano-0.7.0 | 2| . | 14.29 %| -100.00 %| . | . | . | 78.07 %| -4.76 %|
| reflections_subway | 98| -0.53 %| . | . | . | . | . | 7.64 %| . |
| thea | 172| 0.12 %| -0.81 %| . | . | . | . | 0.65 %| 0.15 %|
| ubershaders | 54| . | . | . | . | . | . | 61.13 %| . |
| ue4_effects_cave | 290| 0.05 %| . | . | . | . | . | 2.62 %| . |
| vp13-creo | 26| -3.38 %| -4.20 %| . | . | . | . | 88.56 %| 2.62 %|
| vp13-sw | 100| -0.36 %| -9.14 %| . | -100.00 %| . | -100.00 %| -17.97 %| 0.39 %|
| vp20-creo | 22| -0.82 %| -3.33 %| . | . | . | . | 81.59 %| 1.51 %|
| vp20-sw | 296| -4.51 %| -0.63 %| . | . | . | . | 58.93 %| 0.20 %|
|------------------------|----------|----------|----------|----------|----------|----------|----------|----------|----------|
| All affected | 189| 3.05 %| -2.87 %| 500.00 %| -100.00 %| . | -100.00 %| 135.61 %| 1.32 %|
|------------------------|----------|----------|----------|----------|----------|----------|----------|----------|----------|
| Total | 57794| 0.01 %| -0.02 %| 0.27 %| -3.13 %| . | -2.89 %| 1.73 %| . |
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer at amd.com> (v1)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13966>
---
src/gallium/drivers/radeonsi/ci/traces-radeonsi.yml | 4 ----
src/gallium/drivers/radeonsi/si_get.c | 2 +-
src/gallium/drivers/radeonsi/si_shader_nir.c | 8 ++++----
3 files changed, 5 insertions(+), 9 deletions(-)
diff --git a/src/gallium/drivers/radeonsi/ci/traces-radeonsi.yml b/src/gallium/drivers/radeonsi/ci/traces-radeonsi.yml
index bd80fe70429..dc025be3a6d 100644
--- a/src/gallium/drivers/radeonsi/ci/traces-radeonsi.yml
+++ b/src/gallium/drivers/radeonsi/ci/traces-radeonsi.yml
@@ -34,10 +34,6 @@ traces:
expectations:
- device: gl-radeonsi-stoney
checksum: 84c499203944cdc59e70450c324bb8df
- - path: gputest/pixmark-piano.trace
- expectations:
- - device: gl-radeonsi-stoney
- checksum: a7317d54d452d19ce630c7f554f2279b
- path: gputest/triangle.trace
expectations:
- device: gl-radeonsi-stoney
diff --git a/src/gallium/drivers/radeonsi/si_get.c b/src/gallium/drivers/radeonsi/si_get.c
index e4a32c35bb9..b4064cd8f0b 100644
--- a/src/gallium/drivers/radeonsi/si_get.c
+++ b/src/gallium/drivers/radeonsi/si_get.c
@@ -1054,7 +1054,7 @@ void si_init_screen_get_functions(struct si_screen *sscreen)
.has_dot_4x8 = sscreen->info.has_accelerated_dot_product,
.has_dot_2x16 = sscreen->info.has_accelerated_dot_product,
.optimize_sample_mask_in = true,
- .max_unroll_iterations = 32,
+ .max_unroll_iterations = 128,
.max_unroll_iterations_aggressive = 128,
.use_interpolated_input_intrinsics = true,
.lower_uniforms_to_ubo = true,
diff --git a/src/gallium/drivers/radeonsi/si_shader_nir.c b/src/gallium/drivers/radeonsi/si_shader_nir.c
index f51909cf079..a3e49d8bec5 100644
--- a/src/gallium/drivers/radeonsi/si_shader_nir.c
+++ b/src/gallium/drivers/radeonsi/si_shader_nir.c
@@ -597,15 +597,15 @@ void si_nir_opts(struct si_screen *sscreen, struct nir_shader *nir, bool first)
{
bool progress;
- NIR_PASS_V(nir, nir_lower_vars_to_ssa);
- NIR_PASS_V(nir, nir_lower_alu_to_scalar, si_alu_to_scalar_filter, sscreen);
- NIR_PASS_V(nir, nir_lower_phis_to_scalar, false);
-
do {
progress = false;
bool lower_alu_to_scalar = false;
bool lower_phis_to_scalar = false;
+ NIR_PASS(progress, nir, nir_lower_vars_to_ssa);
+ NIR_PASS(progress, nir, nir_lower_alu_to_scalar, si_alu_to_scalar_filter, sscreen);
+ NIR_PASS(progress, nir, nir_lower_phis_to_scalar, false);
+
if (first) {
NIR_PASS(progress, nir, nir_split_array_vars, nir_var_function_temp);
NIR_PASS(lower_alu_to_scalar, nir, nir_shrink_vec_array_vars, nir_var_function_temp);
More information about the mesa-commit
mailing list