[Mesa-dev] [PATCH 5/6] i965/fs: Allow Boolean conditions in CSEL generation

Ian Romanick idr at freedesktop.org
Mon Jun 25 17:13:36 UTC 2018


From: Ian Romanick <ian.d.romanick at intel.com>

This can increase register pressure.  I think there may be some ways to
mitigate this, but that will take more work.

By percentage of instructions reduced, the most helped shader had a
block of code like:

    cmp.z.f0(8)     g51<1>D     g21<8,8,1>D     1D
    (+f0) sel(8)    g23<1>UD    g4.1<0,1,0>UD   g4<0,1,0>UD
    cmp.z.f0(8)     g24<1>D     g21<8,8,1>D     2D
    (+f0) sel(8)    g25<1>UD    g4.2<0,1,0>UD   g23<8,8,1>UD
    cmp.z.f0(8)     g26<1>D     g21<8,8,1>D     3D
    (+f0) sel(8)    g17<1>UD    g4.3<0,1,0>UD   g25<8,8,1>UD
    cmp.nz.f0(8)    null<1>D    g51<8,8,1>D     0D
    (+f0) sel(8)    g28<1>UD    g4.5<0,1,0>UD   g4.4<0,1,0>UD
    cmp.nz.f0(8)    null<1>D    g24<8,8,1>D     0D
    (+f0) sel(8)    g29<1>UD    g4.6<0,1,0>UD   g28<8,8,1>UD
    cmp.nz.f0(8)    null<1>D    g26<8,8,1>D     0D
    (+f0) sel(8)    g18<1>UD    g4.7<0,1,0>UD   g29<8,8,1>UD
    cmp.nz.f0(8)    null<1>D    g51<8,8,1>D     0D
    (+f0) sel(8)    g31<1>UD    g5.1<0,1,0>UD   g5<0,1,0>UD
    cmp.nz.f0(8)    null<1>D    g24<8,8,1>D     0D
    (+f0) sel(8)    g32<1>UD    g5.2<0,1,0>UD   g31<8,8,1>UD
    cmp.nz.f0(8)    null<1>D    g26<8,8,1>D     0D
    (+f0) sel(8)    g19<1>UD    g5.3<0,1,0>UD   g32<8,8,1>UD

All of the cmp.nz instructions could be eliminated by just moving the
selects.

Interestingly, this shader ends up looking like:

    cmp.z.f0(8)     g51<1>D     g28<8,8,1>D     1D
    (+f0) sel(8)    g32<1>UD    g4.1<0,1,0>UD   g4<0,1,0>UD
    csel.nz(8)      g21<1>F     g4.5<0,1,0>F    g4.4<0,1,0>F    g51<4,4,1>F
    csel.nz(8)      g31<1>F     g5.1<0,1,0>F    g5.0<0,1,0>F    g51<4,4,1>F
    cmp.z.f0(8)     g24<1>D     g28<8,8,1>D     2D
    (+f0) sel(8)    g25<1>UD    g4.2<0,1,0>UD   g32<8,8,1>UD
    csel.nz(8)      g29<1>F     g4.6<0,1,0>F    g21<4,4,1>F     g24<4,4,1>F
    csel.nz(8)      g23<1>F     g5.2<0,1,0>F    g31<4,4,1>F     g24<4,4,1>F
    cmp.z.f0(8)     g26<1>D     g28<8,8,1>D     3D
    (+f0) sel(8)    g17<1>UD    g4.3<0,1,0>UD   g25<8,8,1>UD
    csel.nz(8)      g18<1>F     g4.7<0,1,0>F    g29<4,4,1>F     g26<4,4,1>F
    csel.nz(8)      g19<1>F     g5.3<0,1,0>F    g23<4,4,1>F     g26<4,4,1>F

At this point, we may as well convert the CSELs back to regular SELs.

Broadwell and Skylake had similar results. (Skylake shown)
total instructions in shared programs: 14398211 -> 14395514 (-0.02%)
instructions in affected programs: 312588 -> 309891 (-0.86%)
helped: 1173
HURT: 1
helped stats (abs) min: 1 max: 66 x̄: 4.04 x̃: 3
helped stats (rel) min: 0.13% max: 13.95% x̄: 1.80% x̃: 1.36%
HURT stats (abs)   min: 2046 max: 2046 x̄: 2046.00 x̃: 2046
HURT stats (rel)   min: 33.71% max: 33.71% x̄: 33.71% x̃: 33.71%
95% mean confidence interval for instructions value: -5.73 1.14
95% mean confidence interval for instructions %-change: -1.87% -1.67%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 532959388 -> 532813589 (-0.03%)
cycles in affected programs: 4824963 -> 4679164 (-3.02%)
helped: 963
HURT: 175
helped stats (abs) min: 1 max: 18800 x̄: 175.14 x̃: 85
helped stats (rel) min: 0.02% max: 57.14% x̄: 7.96% x̃: 5.77%
HURT stats (abs)   min: 1 max: 13496 x̄: 130.63 x̃: 14
HURT stats (rel)   min: 0.05% max: 60.02% x̄: 2.82% x̃: 1.20%
95% mean confidence interval for cycles value: -178.35 -77.89
95% mean confidence interval for cycles %-change: -6.83% -5.77%
Cycles are helped.

total spills in shared programs: 8044 -> 8329 (3.54%)
spills in affected programs: 363 -> 648 (78.51%)
helped: 0
HURT: 1

total fills in shared programs: 10950 -> 11272 (2.94%)
fills in affected programs: 512 -> 834 (62.89%)
helped: 0
HURT: 1

LOST:   2
GAINED: 2

No changes on pre-Gen8 platforms because they all lack the CSEL
instruction.

Signed-off-by: Ian Romanick <ian.d.romanick at intel.com>
---

There is still quite a bit of potential future work here.

A lot of the cases that were helped by this were things like

    cmp.z.f0     g51F     g21F     g20F
    ... lots of instructions
    cmp.nz.f0    nullD    g51D     0D
    (+f0) sel    g28UD    g45UD    g44UD

If this pattern is the only use of the result from the first compare, we
could convert this to

    add.z.f0     g51F     -g21F    g20F
    ... lots of instructions
    csel.nz      g28F     g45F     g44F     g51F

I also saw a number of cases like

    and          g51D     g21D     7D
    ...
    cmp.nz.f0    nullD    g51D     0D
    (+f0) sel    g28UD    g45UD    g44UD

This pass does not convert those to use CSEL, but, assuming we can
determine the denorm behavior, it seems like it could.

As alluded in the commit message, we may also want a pass the converts
CSEL instructions back into SEL instructions.  This may help with
register pressure.

 src/intel/compiler/brw_fs.cpp | 56 +++++++++++++++++++++++++++++++++++++++----
 1 file changed, 51 insertions(+), 5 deletions(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index a4086a8dd7f..f683b4fd4d0 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -2903,13 +2903,54 @@ fs_visitor::opt_peephole_csel()
                 scan_inst->predicate != BRW_PREDICATE_NONE ||
                 (scan_inst->src[0].file != VGRF &&
                  scan_inst->src[0].file != ATTR &&
-                 scan_inst->src[0].file != UNIFORM) ||
-                scan_inst->src[0].type != BRW_REGISTER_TYPE_F)
+                 scan_inst->src[0].file != UNIFORM))
                break;
 
             if (scan_inst->opcode == BRW_OPCODE_CMP && !scan_inst->src[1].is_zero())
                break;
 
+            if (scan_inst->opcode == BRW_OPCODE_MOV &&
+                scan_inst->src[0].type != BRW_REGISTER_TYPE_F)
+               break;
+
+            /* If the comparison is among integer values, it can still work if
+             * the value being compared is itself the result of a comparison.
+             * Such a result will either be 0 or 0xffffffff.  The latter is
+             * -NaN, so can only be used with CSEL.Z or CSEL.NZ.
+             */
+            bool cond_is_bool = false;
+            if (scan_inst->src[0].type != BRW_REGISTER_TYPE_F) {
+               assert(scan_inst->opcode == BRW_OPCODE_CMP);
+
+               if (scan_inst->conditional_mod != BRW_CONDITIONAL_Z &&
+                   scan_inst->conditional_mod != BRW_CONDITIONAL_NZ)
+                  break;
+
+               if (scan_inst->src[0].abs || scan_inst->src[0].negate)
+                  break;
+
+               foreach_inst_in_block_reverse_starting_from(fs_inst,
+                                                           scan_more_inst,
+                                                           scan_inst) {
+                  if (regions_overlap(scan_more_inst->dst,
+                                      scan_more_inst->size_written,
+                                      scan_inst->src[0],
+                                      scan_inst->size_read(0))) {
+                     if (scan_more_inst->is_partial_write() ||
+                         scan_more_inst->dst.offset != scan_inst->src[0].offset ||
+                         scan_more_inst->exec_size != scan_inst->exec_size ||
+                         scan_more_inst->opcode != BRW_OPCODE_CMP)
+                        break;
+
+                     cond_is_bool = true;
+                     break;
+                  }
+               }
+
+               if (!cond_is_bool)
+                  break;
+            }
+
             const brw::fs_builder ibld(this, block, inst);
 
             const enum brw_conditional_mod cond =
@@ -2923,7 +2964,8 @@ fs_visitor::opt_peephole_csel()
                csel_inst = ibld.CSEL(inst->dst,
                                      inst->src[0],
                                      inst->src[1],
-                                     scan_inst->src[0],
+                                     retype(scan_inst->src[0],
+                                            BRW_REGISTER_TYPE_F),
                                      cond);
             } else if (cond == BRW_CONDITIONAL_NZ) {
                /* Consider the sequence
@@ -2945,10 +2987,14 @@ fs_visitor::opt_peephole_csel()
                csel_inst = ibld.CSEL(inst->dst,
                                      inst->src[0],
                                      scan_inst->src[0],
-                                     scan_inst->src[0],
+                                     retype(scan_inst->src[0],
+                                            BRW_REGISTER_TYPE_F),
                                      cond);
 
-               csel_inst->src[1].abs = true;
+               /* If the condition is a Boolean, then it is either 0 or ~0.
+                * The abs is unnecessary in this case.
+                */
+               csel_inst->src[1].abs = !cond_is_bool;
             }
 
             if (csel_inst != NULL) {
-- 
2.14.4



More information about the mesa-dev mailing list