Mesa (main): intel/fs: Better handle constant sources of FS_OPCODE_PACK_HALF_2x16_SPLIT

GitLab Mirror gitlab-mirror at kemper.freedesktop.org
Thu Apr 7 18:45:52 UTC 2022


Module: Mesa
Branch: main
Commit: b5fa43952aaf7a41385402ad2d21af5a17805615
URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=b5fa43952aaf7a41385402ad2d21af5a17805615

Author: Ian Romanick <ian.d.romanick at intel.com>
Date:   Mon Feb 14 14:07:18 2022 -0800

intel/fs: Better handle constant sources of FS_OPCODE_PACK_HALF_2x16_SPLIT

I noticed that a *LOT* of fragment shaders in Shadow of the Tomb Raider,
for instance, end up with a sequence of NIR like:

    vec1 32 ssa_2 = load_const (0x00000000 = 0.000000)
    ...
    vec1 32 ssa_191 = pack_half_2x16_split ssa_188, ssa_2
    vec1 32 ssa_192 = pack_half_2x16_split ssa_189, ssa_2
    vec1 32 ssa_193 = pack_half_2x16_split ssa_190, ssa_2

This results in an assembly sequence like:

    mov(8)          g28<1>UD        0x00000000UD
    mov(8)          g21<2>HF        g28<8,8,1>F
    shl(8)          g21<1>UD        g21<8,8,1>UD    0x00000010UD
    mov(8)          g21<2>HF        g25<8,8,1>F
    mov(8)          g19<2>HF        g28<8,8,1>F
    shl(8)          g19<1>UD        g19<8,8,1>UD    0x00000010UD
    mov(8)          g19<2>HF        g23<8,8,1>F
    mov(8)          g20<2>HF        g28<8,8,1>F
    shl(8)          g20<1>UD        g20<8,8,1>UD    0x00000010UD
    mov(8)          g20<2>HF        g24<8,8,1>F
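To make the listing concrete, here is a minimal C sketch of one 32-bit channel. `pack_half_2x16_split()` models the NIR opcode's result: the first source goes in the low 16 bits and the second in the high 16 bits, which is why the constant `ssa_2` lands in the upper half. `old_pack_sequence()` replays the three instructions emitted per pack before this commit. All function names are hypothetical and are not Mesa code. Two assumptions: the helper `float_to_half_bits()` rounds to nearest even, and a `<2>`-strided HF destination writes the low halfword of each dword (little-endian layout).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: IEEE-754 binary32 -> binary16 bits, rounding to
 * nearest even. Inf stays Inf; any NaN becomes a quiet NaN. */
static uint16_t float_to_half_bits(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);                 /* reinterpret the float's bits */

    uint32_t sign = (x >> 16) & 0x8000u;
    uint32_t exp = (x >> 23) & 0xffu;
    uint32_t mant = x & 0x7fffffu;

    if (exp == 0xffu)                         /* Inf or NaN */
        return (uint16_t)(sign | 0x7c00u | (mant ? 0x200u : 0u));

    int32_t e = (int32_t)exp - 127 + 15;      /* re-bias the exponent */
    if (e >= 0x1f)                            /* too large: overflow to Inf */
        return (uint16_t)(sign | 0x7c00u);

    if (e <= 0) {                             /* half subnormal or zero */
        if (e < -10)
            return (uint16_t)sign;            /* rounds to signed zero */
        mant |= 0x800000u;                    /* make the implicit 1 explicit */
        uint32_t shift = (uint32_t)(14 - e);
        uint32_t half_mant = mant >> shift;
        uint32_t rem = mant & ((1u << shift) - 1u);
        uint32_t halfway = 1u << (shift - 1u);
        if (rem > halfway || (rem == halfway && (half_mant & 1u)))
            half_mant++;                      /* may round up to the smallest normal */
        return (uint16_t)(sign | half_mant);
    }

    uint32_t half = ((uint32_t)e << 10) | (mant >> 13);
    uint32_t rem = mant & 0x1fffu;
    if (rem > 0x1000u || (rem == 0x1000u && (half & 1u)))
        half++;                               /* a carry may overflow to Inf, which is correct */
    return (uint16_t)(sign | half);
}

/* Reference result of pack_half_2x16_split(x, y): x low, y high. */
static uint32_t pack_half_2x16_split(float x, float y)
{
    return ((uint32_t)float_to_half_bits(y) << 16) | float_to_half_bits(x);
}

/* A <2>-strided HF write replaces the low halfword of a dword channel
 * and leaves the high halfword unchanged (assumed little-endian). */
static uint32_t write_low_half(uint32_t chan, uint16_t h)
{
    return (chan & 0xffff0000u) | h;
}

/* Pre-commit sequence for one channel; 'prior' is whatever dst held. */
static uint32_t old_pack_sequence(uint32_t prior, float x, float y)
{
    uint32_t dst = write_low_half(prior, float_to_half_bits(y)); /* mov dst<2>HF, y  -> 0x....hhhh */
    dst <<= 16;                                                  /* shl dst, dst, 16 -> 0xhhhh0000 */
    return write_low_half(dst, float_to_half_bits(x));           /* mov dst<2>HF, x  -> 0xhhhhllll */
}
```

The shift discards whatever the destination held before, so the sequence is correct for any prior contents. It does, however, read `y` from a register, which is why the listing first materializes the constant with `mov(8) g28<1>UD 0x00000000UD`.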

After this commit, the generated assembly is:

    mov(8)          g21<1>UD        0x00000000UD
    mov(8)          g21<2>HF        g23<8,8,1>F
    mov(8)          g19<1>UD        0x00000000UD
    mov(8)          g19<2>HF        g17<8,8,1>F
    mov(8)          g20<1>UD        0x00000000UD
    mov(8)          g20<2>HF        g18<8,8,1>F
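When `y` is known at compile time, the generator computes `_mesa_float_to_half(y.f) << 16` itself (see the `brw_fs_generator.cpp` hunk in the diff), so a single UD move of that immediate replaces F32TO16 + SHL. The sketch below models this two-instruction form on one channel. It takes half-float bit patterns directly, and its function names are hypothetical. For `y` = 0.0 the half pattern is 0x0000, so the immediate is `0x00000000UD`, as in the listing.

```c
#include <stdbool.h>
#include <stdint.h>

/* A <2>-strided HF write replaces the low halfword of a dword channel
 * (assumed little-endian layout). */
static uint32_t write_low_half(uint32_t chan, uint16_t h)
{
    return (chan & 0xffff0000u) | h;
}

/* Post-commit sequence when y is an immediate; hy_imm is the half-float
 * bit pattern the compiler computed for the constant y. */
static uint32_t new_pack_sequence(uint16_t hx, uint16_t hy_imm)
{
    uint32_t dst = (uint32_t)hy_imm << 16;  /* mov dst<1>UD, imm -> 0xhhhh0000 */
    return write_low_half(dst, hx);         /* mov dst<2>HF, x   -> 0xhhhhllll */
}

/* Exhaustively check the two-instruction form against the packed layout
 * (y in the high half, x in the low half) for one constant y. */
static bool new_sequence_matches(uint16_t hy_imm)
{
    for (uint32_t hx = 0; hx <= 0xffffu; hx++) {
        uint32_t expected = ((uint32_t)hy_imm << 16) | hx;
        if (new_pack_sequence((uint16_t)hx, hy_imm) != expected)
            return false;
    }
    return true;
}
```

Because the UD move overwrites the whole dword, no shift is needed to clear stale bits, and the constant no longer has to live in a register.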

Tiger Lake, Ice Lake, Skylake, and Haswell had similar results. (Ice Lake shown)
total instructions in shared programs: 20119086 -> 20119034 (<.01%)
instructions in affected programs: 9056 -> 9004 (-0.57%)
helped: 8
HURT: 0
helped stats (abs) min: 2 max: 16 x̄: 6.50 x̃: 4
helped stats (rel) min: 0.29% max: 1.75% x̄: 1.00% x̃: 0.98%
95% mean confidence interval for instructions value: -11.01 -1.99
95% mean confidence interval for instructions %-change: -1.56% -0.44%
Instructions are helped.

total cycles in shared programs: 861019414 -> 861021044 (<.01%)
cycles in affected programs: 279862 -> 281492 (0.58%)
helped: 4
HURT: 2
helped stats (abs) min: 6 max: 936 x̄: 239.00 x̃: 7
helped stats (rel) min: 0.03% max: 8.13% x̄: 2.09% x̃: 0.09%
HURT stats (abs)   min: 18 max: 2568 x̄: 1293.00 x̃: 1293
HURT stats (rel)   min: 0.36% max: 1.14% x̄: 0.75% x̃: 0.75%
95% mean confidence interval for cycles value: -972.56 1515.89
95% mean confidence interval for cycles %-change: -4.77% 2.49%
Inconclusive result (value mean confidence interval includes 0).
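The summary lines can be cross-checked with simple arithmetic, assuming x̄ is the per-program mean change. The net change over affected programs is hurt × x̄(hurt) − helped × x̄(helped). For cycles that gives 2 × 1293 − 4 × 239 = 1630 = 281492 − 279862, and for instructions 8 × 6.50 = 52 = 9056 − 9004. The "inconclusive" verdict is the rule that an interval straddling zero cannot establish the sign of the change. The function names below are hypothetical. The printed means are rounded to two decimals, so in general this reconstruction is only approximate.

```c
#include <stdbool.h>

/* Net change in a metric over the affected programs, from the helped/hurt
 * counts and their mean absolute changes. */
static double net_change(int helped, double helped_mean, int hurt, double hurt_mean)
{
    return hurt * hurt_mean - helped * helped_mean;
}

/* A 95% confidence interval that contains 0 cannot establish whether the
 * change helps or hurts. */
static bool interval_includes_zero(double lo, double hi)
{
    return lo <= 0.0 && hi >= 0.0;
}
```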

Broadwell
total instructions in shared programs: 17812327 -> 17812263 (<.01%)
instructions in affected programs: 9867 -> 9803 (-0.65%)
helped: 8
HURT: 0
helped stats (abs) min: 2 max: 28 x̄: 8.00 x̃: 4
helped stats (rel) min: 0.32% max: 1.80% x̄: 1.00% x̃: 0.95%
95% mean confidence interval for instructions value: -15.46 -0.54
95% mean confidence interval for instructions %-change: -1.54% -0.47%
Instructions are helped.

total cycles in shared programs: 904768620 -> 904773291 (<.01%)
cycles in affected programs: 454799 -> 459470 (1.03%)
helped: 4
HURT: 4
helped stats (abs) min: 36 max: 586 x̄: 344.50 x̃: 378
helped stats (rel) min: 0.47% max: 4.04% x̄: 2.01% x̃: 1.77%
HURT stats (abs)   min: 1 max: 5572 x̄: 1512.25 x̃: 238
HURT stats (rel)   min: <.01% max: 2.77% x̄: 1.46% x̃: 1.53%
95% mean confidence interval for cycles value: -1122.40 2290.15
95% mean confidence interval for cycles %-change: -2.26% 1.71%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 18581 -> 18579 (-0.01%)
spills in affected programs: 323 -> 321 (-0.62%)
helped: 1
HURT: 0

total fills in shared programs: 24985 -> 24981 (-0.02%)
fills in affected programs: 1348 -> 1344 (-0.30%)
helped: 1
HURT: 0

Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
Instructions in all programs: 143585431 -> 143513657 (-0.0%)
Instructions helped: 14403

Cycles in all programs: 8439312778 -> 8439371578 (+0.0%)
Cycles helped: 10570
Cycles hurt: 3290

Gained: 146
Lost: 74

All of the lost and gained fossil-db shaders are SIMD32 fragment
shaders.  Of the affected shaders, 14,247 are from Shadow of the Tomb
Raider, 154 are from Batman: Arkham Origins, and the remaining two are
from Octopath Traveler.

Reviewed-by: Matt Turner <mattst88 at gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15089>

---

 src/intel/compiler/brw_fs_copy_propagation.cpp |  5 +++++
 src/intel/compiler/brw_fs_generator.cpp        | 26 ++++++++++++++++----------
 2 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/src/intel/compiler/brw_fs_copy_propagation.cpp b/src/intel/compiler/brw_fs_copy_propagation.cpp
index 85013dfd704..dd7dc16dc4f 100644
--- a/src/intel/compiler/brw_fs_copy_propagation.cpp
+++ b/src/intel/compiler/brw_fs_copy_propagation.cpp
@@ -934,6 +934,11 @@ fs_visitor::try_constant_propagate(fs_inst *inst, acp_entry *entry)
          progress = true;
          break;
 
+      case FS_OPCODE_PACK_HALF_2x16_SPLIT:
+         inst->src[i] = val;
+         progress = true;
+         break;
+
       default:
          break;
       }
diff --git a/src/intel/compiler/brw_fs_generator.cpp b/src/intel/compiler/brw_fs_generator.cpp
index 5aa47a0a8b8..4ca3e4cc9e9 100644
--- a/src/intel/compiler/brw_fs_generator.cpp
+++ b/src/intel/compiler/brw_fs_generator.cpp
@@ -31,6 +31,7 @@
 #include "brw_fs.h"
 #include "brw_cfg.h"
 #include "util/mesa-sha1.h"
+#include "util/half_float.h"
 
 static enum brw_reg_file
 brw_file_from_reg(fs_reg *reg)
@@ -1782,17 +1783,22 @@ fs_generator::generate_pack_half_2x16_split(fs_inst *,
       ? BRW_REGISTER_TYPE_HF : BRW_REGISTER_TYPE_W;
    struct brw_reg dst_w = spread(retype(dst, t), 2);
 
-   /* Give each 32-bit channel of dst the form below, where "." means
-    * unchanged.
-    *   0x....hhhh
-    */
-   brw_F32TO16(p, dst_w, y);
+   if (y.file == IMM) {
+      const uint32_t hhhh0000 = _mesa_float_to_half(y.f) << 16;
+      brw_MOV(p, dst, brw_imm_ud(hhhh0000));
+   } else {
+      /* Give each 32-bit channel of dst the form below, where "." means
+       * unchanged.
+       *   0x....hhhh
+       */
+      brw_F32TO16(p, dst_w, y);
 
-   /* Now the form:
-    *   0xhhhh0000
-    */
-   brw_set_default_swsb(p, tgl_swsb_regdist(1));
-   brw_SHL(p, dst, dst, brw_imm_ud(16u));
+      /* Now the form:
+       *   0xhhhh0000
+       */
+      brw_set_default_swsb(p, tgl_swsb_regdist(1));
+      brw_SHL(p, dst, dst, brw_imm_ud(16u));
+   }
 
    /* And, finally the form of packHalf2x16's output:
     *   0xhhhhllll

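The two hunks work as a pair. `try_constant_propagate()` may now replace a register source of `FS_OPCODE_PACK_HALF_2x16_SPLIT` with an immediate, and the generator then folds an immediate `y` into a single UD move. The toy below sketches that interplay on a deliberately tiny, hypothetical IR. `struct inst`, `propagate_constant()` and `pack_instruction_count()` are invented for illustration and are not Mesa's `fs_inst`/`fs_reg` API. The sketch also assumes the now-unused `mov` of the constant is later removed as dead code, as the "after" listing in the commit message suggests.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified IR; not Mesa's fs_inst / fs_reg. */
enum reg_file { FILE_VGRF, FILE_IMM };

struct reg {
    enum reg_file file;
    int nr;      /* virtual register number when file == FILE_VGRF */
    float f;     /* immediate value when file == FILE_IMM */
};

enum opcode { OP_MOV, OP_PACK_HALF_2x16_SPLIT };

struct inst {
    enum opcode op;
    int dst;            /* destination virtual register */
    struct reg src[2];  /* for PACK: src[0] = x, src[1] = y */
};

/* Toy analogue of constant propagation: if 'def' is "mov dst, imm",
 * replace each source of a PACK 'use' that reads dst with the immediate.
 * Returns true on progress. */
static bool propagate_constant(const struct inst *def, struct inst *use)
{
    if (def->op != OP_MOV || def->src[0].file != FILE_IMM)
        return false;
    if (use->op != OP_PACK_HALF_2x16_SPLIT)
        return false;   /* the toy only handles the opcode this commit adds */

    bool progress = false;
    for (int i = 0; i < 2; i++) {
        if (use->src[i].file == FILE_VGRF && use->src[i].nr == def->dst) {
            use->src[i] = def->src[0];
            progress = true;
        }
    }
    return progress;
}

/* Hardware instructions the toy generator emits for one pack: a UD move
 * of the folded immediate plus the HF move of x when y is an immediate,
 * otherwise F32TO16 + SHL + the HF move of x. */
static int pack_instruction_count(const struct inst *pack)
{
    return pack->src[1].file == FILE_IMM ? 2 : 3;
}
```

With the constant propagated, the per-pack cost drops from three instructions to two, mirroring the before/after listings in the commit message.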


More information about the mesa-commit mailing list