[Mesa-dev] [PATCH 03/15] i965/cfg: Rework to make IF & ELSE blocks flow into ENDIF.

Matt Turner mattst88 at gmail.com
Tue Dec 3 18:47:33 PST 2013


On Mon, Dec 2, 2013 at 10:40 AM, Matt Turner <mattst88 at gmail.com> wrote:
> Previously we made the basic block following an ENDIF instruction a
> successor of the basic blocks ending with IF and ELSE. The PRM says that
> IF and ELSE instructions jump *to* the ENDIF, rather than over it.
>
> This should be immaterial to dataflow analysis, except for if, break,
> endif sequences:
>
>    START B1 <-B0 <-B9
> 0x00000100: cmp.g.f0(8)     null            g15<8,8,1>F     g4<0,1,0>F
> 0x00000110: (+f0) if(8) 0 0                 null            0x00000000UD
>    END B1 ->B2 ->B4
>    START B2 <-B1
>    break
> 0x00000120: break(8) 0 0                    null            0D
>    END B2 ->B10
>    START B3
> 0x00000130: endif(8) 2                      null            0x00000002UD
>    END B3 ->B4
>
> The ENDIF block would have no parents, so dataflow analysis would
> generate incorrect results, preventing copy propagation from eliminating
> some instructions.
>
> This patch changes the CFG to make ENDIF start rather than end basic
> blocks, so that it can be the jump target of the IF and ELSE
> instructions.
>
> It helps three programs (including two fs8/fs16 pairs) and hurts a
> single fs8/fs16 pair.
>
> total instructions in shared programs: 1561126 -> 1561066 (-0.00%)
> instructions in affected programs:     983 -> 923 (-6.10%)
>
> More importantly, it allows copy propagation to handle more cases.
> Disabling the register_coalesce() pass before this patch hurts 58
> programs, while afterward it only hurts 11 programs.
> ---

Eric mentioned on IRC that he was concerned about this patch and that
it may be papering over some strange behavior.

The background is that while investigating a regression caused by my
value numbering pass, I found two raw MOVs that should have been copy
propagated. The shader (sam3/101.frag) has a loop with an
if/break/endif sequence. In attempting to reduce the amount of effort
to debug copy propagation, I applied my predicated break patch and the
MOVs were properly copy propagated and removed.

In fact, this patch is an attempt to fix the real problem before
committing the predicated break pass which itself would paper over the
problem. :)

The attached diff is the output of

diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
index 26bac94..ff68519 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
@@ -560,6 +560,9 @@ fs_visitor::opt_copy_propagate()
       progress = opt_copy_propagate_local(mem_ctx, block, in_acp) || progress;
    }

+   cfg.dump(this);
+   dataflow.dump_block_data();
+
    for (int i = 0; i < cfg.num_blocks; i++)
       delete [] out_acp[i];
    ralloc_free(mem_ctx);

during the first call of opt_copy_propagate(), before and after this patch.

The diff shows that B6 in the before case has no parents, and its
livein/liveout sets are empty. If I understand the dataflow algorithm
correctly, this is a problem. liveout is calculated by ANDing livein,
and if the block incorrectly has no parents, livein will be 0 rather
than ~0. Its liveout is then used to compute the livein of other
blocks, so the problem cascades. I think this can be seen looking at
B6's successors B7 and (indirectly through B7) B4 which have empty
livein and empty or wrong liveout.

We could avoid this problem by modifying the dataflow analysis code to
assign only B0 an empty livein set, but *that* feels like papering it
over. :)

Another concern was about ENDIF starting, rather than ending basic
blocks. I think this concern has been alleviated. ENDIF doesn't jump
itself (except to the next instruction; and in one other case that we
don't do) and is rather the jump target of the IF and ELSE
instructions, so it's correct to have it start a basic block.
-------------- next part --------------
--- dataflow-before	2013-12-03 18:16:25.069616345 -0800
+++ dataflow-after	2013-12-03 18:17:32.146610248 -0800
@@ -1,215 +1,211 @@
 START B0
     0: mov vgrf5, u0, (null), (null),  
     1: mov vgrf103, u1, (null), (null),  
     2: mov vgrf104, u2, (null), (null),  
     3: mov vgrf105, u3, (null), (null),  
     4: mov vgrf2, u2, (null), (null),  
     5: mov vgrf4, u4, (null), (null),  
     6: mov vgrf100, u5, (null), (null),  
     7: mov vgrf101, u6, (null), (null),  
     8: mov vgrf102, u7, (null), (null),  
     9: mov vgrf3, u4, (null), (null),  
    10: mov vgrf18, vgrf1, (null), (null),  
    11: mov vgrf109, vgrf99, (null), (null),  
    12: mov vgrf19, vgrf1, (null), (null),  
    13: mov vgrf19+1, vgrf99, (null), (null),  
    14: tex vgrf17, vgrf19, (null), (null),  
    15: mov vgrf20, 65504.000000f, (null), (null),  
    16: sel.cmod vgrf13, vgrf17, 65504.000000f, (null),  
    17: mov vgrf21, 65504.000000f, (null), (null),  
    18: sel.cmod vgrf14, vgrf17+1, 65504.000000f, (null),  
    19: mov vgrf22, 65504.000000f, (null), (null),  
    20: sel.cmod vgrf15, vgrf17+2, 65504.000000f, (null),  
    21: mov vgrf23, 65504.000000f, (null), (null),  
    22: sel.cmod vgrf16, vgrf17+3, 65504.000000f, (null),  
    23: mov vgrf9, u0, (null), (null),  
    24: mov vgrf10, u1, (null), (null),  
    25: mov vgrf11, -u0, (null), (null),  
    26: mov vgrf12, -u1, (null), (null),  
    27: mov vgrf24, vgrf1, (null), (null),  
    28: mov vgrf110, vgrf99, (null), (null),  
    29: mov vgrf111, vgrf1, (null), (null),  
    30: mov vgrf112, vgrf99, (null), (null),  
    31: mov vgrf8, vgrf1, (null), (null),  
    32: mov vgrf106, vgrf99, (null), (null),  
    33: mov vgrf107, vgrf1, (null), (null),  
    34: mov vgrf108, vgrf99, (null), (null),  
    35: mov vgrf7, 1.000000f, (null), (null),  
    36: mov vgrf6, 1d, (null), (null),  
    37: mov vgrf26, 1d, (null), (null),  
    38: mov vgrf25, 1d, (null), (null),  
    39: do (null), (null), (null), (null),  
 END B0 ->B1
 START B1 <-B0 <-B2
    40: mov vgrf27, 25d, (null), (null),  
    41: cmp.cmod.f0.0 hw_reg0, vgrf25, 25d, (null),  
    42: (+f0.0) break (null), (null), (null), (null),  
 END B1 ->B3 ->B2
 START B2 <-B1
    43: add vgrf29, vgrf8, u0, (null),  
    44: mov vgrf28, vgrf29, (null), (null),  
    45: add vgrf30, vgrf106, u1, (null),  
    46: mov vgrf113, vgrf30, (null), (null),  
    47: add vgrf31, vgrf107, -u0, (null),  
    48: mov vgrf114, vgrf31, (null), (null),  
    49: add vgrf32, vgrf108, -u1, (null),  
    50: mov vgrf115, vgrf32, (null), (null),  
    51: mov vgrf8, vgrf29, (null), (null),  
    52: mov vgrf106, vgrf30, (null), (null),  
    53: mov vgrf107, vgrf31, (null), (null),  
    54: mov vgrf108, vgrf32, (null), (null),  
    55: mov vgrf34, vgrf29, (null), (null),  
    56: mov vgrf116, vgrf30, (null), (null),  
    57: mov vgrf35, vgrf29, (null), (null),  
    58: mov vgrf35+1, vgrf30, (null), (null),  
    59: tex vgrf33, vgrf35, (null), (null),  
    60: mov vgrf37, vgrf31, (null), (null),  
    61: mov vgrf117, vgrf32, (null), (null),  
    62: mov vgrf38, vgrf31, (null), (null),  
    63: mov vgrf38+1, vgrf32, (null), (null),  
    64: tex vgrf36, vgrf38, (null), (null),  
    65: mul vgrf40, vgrf7, vgrf7, (null),  
    66: mul vgrf41, vgrf40, u4, (null),  
    67: mov vgrf42, 1.442695f, (null), (null),  
    68: mul vgrf43, vgrf41, 1.442695f, (null),  
    69: exp2 vgrf39, vgrf43, (null), (null),  
    70: add vgrf44, vgrf33, vgrf36, (null),  
    71: mov vgrf45, 65504.000000f, (null), (null),  
    72: sel.cmod vgrf46, vgrf44, 65504.000000f, (null),  
    73: mad vgrf13, vgrf13, vgrf46, vgrf39,  
    74: add vgrf47, vgrf33+1, vgrf36+1, (null),  
    75: mov vgrf48, 65504.000000f, (null), (null),  
    76: sel.cmod vgrf49, vgrf47, 65504.000000f, (null),  
    77: mad vgrf14, vgrf14, vgrf49, vgrf39,  
    78: add vgrf50, vgrf33+2, vgrf36+2, (null),  
    79: mov vgrf51, 65504.000000f, (null), (null),  
    80: sel.cmod vgrf52, vgrf50, 65504.000000f, (null),  
    81: mad vgrf15, vgrf15, vgrf52, vgrf39,  
    82: add vgrf53, vgrf33+3, vgrf36+3, (null),  
    83: mov vgrf54, 65504.000000f, (null), (null),  
    84: sel.cmod vgrf55, vgrf53, 65504.000000f, (null),  
    85: mad vgrf16, vgrf16, vgrf55, vgrf39,  
    86: mov vgrf56, 1.000000f, (null), (null),  
    87: add vgrf7, vgrf7, 1.000000f, (null),  
    88: mov vgrf57, 1d, (null), (null),  
    89: add vgrf6, vgrf6, 1d, (null),  
    90: mov vgrf58, 1d, (null), (null),  
    91: add vgrf25, vgrf25, 1d, (null),  
    92: while (null), (null), (null), (null),  
 END B2 ->B1
 START B3 <-B1
    93: mov vgrf6, 25d, (null), (null),  
    94: do (null), (null), (null), (null),  
 END B3 ->B4
-START B4 <-B3 <-B7
+START B4 <-B3 <-B6
    95: mov vgrf59, vgrf6, (null), (null),  
-   96: cmp.cmod.f0.0 hw_reg0, vgrf59, vgrf2, (null),  
+   96: cmp.cmod.f0.0 hw_reg0, vgrf59, u2, (null),  
    97: (+f0.0) if (null), (null), (null), (null),  
-END B4 ->B5 ->B7
+END B4 ->B5 ->B6
 START B5 <-B4
    98: break (null), (null), (null), (null),  
-END B5 ->B8
-START B6
+END B5 ->B7
+START B6 <-B4
    99: endif (null), (null), (null), (null),  
-END B6 ->B7
-START B7 <-B6 <-B4
-  100: add vgrf61, vgrf8, vgrf9, (null),  
+  100: add vgrf61, vgrf8, u0, (null),  
   101: mov vgrf60, vgrf61, (null), (null),  
-  102: add vgrf62, vgrf106, vgrf10, (null),  
+  102: add vgrf62, vgrf106, u1, (null),  
   103: mov vgrf118, vgrf62, (null), (null),  
-  104: add vgrf63, vgrf107, vgrf11, (null),  
+  104: add vgrf63, vgrf107, -u0, (null),  
   105: mov vgrf119, vgrf63, (null), (null),  
-  106: add vgrf64, vgrf108, vgrf12, (null),  
+  106: add vgrf64, vgrf108, -u1, (null),  
   107: mov vgrf120, vgrf64, (null), (null),  
   108: mov vgrf8, vgrf61, (null), (null),  
   109: mov vgrf106, vgrf62, (null), (null),  
   110: mov vgrf107, vgrf63, (null), (null),  
   111: mov vgrf108, vgrf64, (null), (null),  
   112: mov vgrf66, vgrf61, (null), (null),  
   113: mov vgrf121, vgrf62, (null), (null),  
   114: mov vgrf67, 0.000000f, (null), (null),  
   115: mov vgrf68, 0.000000f, (null), (null),  
   116: mov vgrf68+1, vgrf61, (null), (null),  
   117: mov vgrf68+2, vgrf62, (null), (null),  
   118: txl vgrf65, vgrf68, (null), (null),  
   119: mov vgrf70, vgrf63, (null), (null),  
   120: mov vgrf122, vgrf64, (null), (null),  
   121: mov vgrf71, 0.000000f, (null), (null),  
   122: mov vgrf72, 0.000000f, (null), (null),  
   123: mov vgrf72+1, vgrf63, (null), (null),  
   124: mov vgrf72+2, vgrf64, (null), (null),  
   125: txl vgrf69, vgrf72, (null), (null),  
   126: mul vgrf74, vgrf7, vgrf7, (null),  
-  127: mul vgrf75, vgrf74, vgrf3, (null),  
+  127: mul vgrf75, vgrf74, u4, (null),  
   128: mov vgrf76, 1.442695f, (null), (null),  
   129: mul vgrf77, vgrf75, 1.442695f, (null),  
   130: exp2 vgrf73, vgrf77, (null), (null),  
   131: add vgrf78, vgrf65, vgrf69, (null),  
   132: mov vgrf79, 65504.000000f, (null), (null),  
   133: sel.cmod vgrf80, vgrf78, 65504.000000f, (null),  
   134: mad vgrf13, vgrf13, vgrf80, vgrf73,  
   135: add vgrf81, vgrf65+1, vgrf69+1, (null),  
   136: mov vgrf82, 65504.000000f, (null), (null),  
   137: sel.cmod vgrf83, vgrf81, 65504.000000f, (null),  
   138: mad vgrf14, vgrf14, vgrf83, vgrf73,  
   139: add vgrf84, vgrf65+2, vgrf69+2, (null),  
   140: mov vgrf85, 65504.000000f, (null), (null),  
   141: sel.cmod vgrf86, vgrf84, 65504.000000f, (null),  
   142: mad vgrf15, vgrf15, vgrf86, vgrf73,  
   143: add vgrf87, vgrf65+3, vgrf69+3, (null),  
   144: mov vgrf88, 65504.000000f, (null), (null),  
   145: sel.cmod vgrf89, vgrf87, 65504.000000f, (null),  
   146: mad vgrf16, vgrf16, vgrf89, vgrf73,  
   147: mov vgrf90, 1.000000f, (null), (null),  
   148: add vgrf7, vgrf7, 1.000000f, (null),  
   149: mov vgrf91, 1d, (null), (null),  
   150: add vgrf6, vgrf6, 1d, (null),  
   151: while (null), (null), (null), (null),  
-END B7 ->B4
-START B8 <-B5
-  152: mul vgrf92, vgrf13, vgrf100, (null),  
-  153: mul vgrf93, vgrf14, vgrf100, (null),  
-  154: mul vgrf94, vgrf15, vgrf100, (null),  
-  155: mul vgrf95, vgrf16, vgrf100, (null),  
+END B6 ->B4
+START B7 <-B5
+  152: mul vgrf92, vgrf13, u5, (null),  
+  153: mul vgrf93, vgrf14, u5, (null),  
+  154: mul vgrf94, vgrf15, u5, (null),  
+  155: mul vgrf95, vgrf16, u5, (null),  
   156: mov vgrf13, vgrf92, (null), (null),  
   157: mov vgrf14, vgrf93, (null), (null),  
   158: mov vgrf15, vgrf94, (null), (null),  
   159: mov vgrf16, vgrf95, (null), (null),  
   160: mov vgrf0, vgrf92, (null), (null),  
   161: mov vgrf96, vgrf93, (null), (null),  
   162: mov vgrf97, vgrf94, (null), (null),  
   163: mov vgrf98, vgrf95, (null), (null),  
   164: placeholder_halt (null), (null), (null), (null),  
   165: mov.sat m1, vgrf92, (null), (null),  
   166: mov.sat m2, vgrf93, (null), (null),  
   167: mov.sat m3, vgrf94, (null), (null),  
   168: mov.sat m4, vgrf95, (null), (null),  
   169: fb_write (null), (null), (null), (null),  
-END B8
+END B7
 Block 0 [0, 39] (parents ):
        livein = 0x000000000000000000000000, liveout = 0xffffffff0000000300000000,
        copy   = 0xffffffff0000000300000000, kill    = 0xffffffff0a92000370924100
 Block 1 [40, 42] (parents 0 2 ):
        livein = 0xad6dbfff0000000300000000, liveout = 0xad6dbfff0000000700000000,
        copy   = 0x000000000000000400000000, kill    = 0x000000000000000400000000
 Block 2 [43, 92] (parents 1 ):
        livein = 0xad6dbfff0000000700000000, liveout = 0xad6dbfff07ffffff00000000,
        copy   = 0x0000000007fffff800000000, kill    = 0x529240000ffffff870924100
 Block 3 [93, 94] (parents 1 ):
        livein = 0xad6dbfff0000000700000000, liveout = 0xad6dbfff0800000700000000,
        copy   = 0x000000000800000000000000, kill    = 0x000040000800000000000000
-Block 4 [95, 97] (parents 3 7 ):
-       livein = 0x000000000000000000000000, liveout = 0x000000000000000000000000,
+Block 4 [95, 97] (parents 3 6 ):
+       livein = 0xad6dbfff0000000700000000, liveout = 0xad6dbfff0000000700000000,
        copy   = 0x000000000000000000000000, kill    = 0x000000000000000000000000
 Block 5 [98, 98] (parents 4 ):
-       livein = 0x000000000000000000000000, liveout = 0x000000000000000000000000,
+       livein = 0xad6dbfff0000000700000000, liveout = 0xad6dbfff0000000700000000,
        copy   = 0x000000000000000000000000, kill    = 0x000000000000000000000000
-Block 6 [99, 99] (parents ):
-       livein = 0x000000000000000000000000, liveout = 0x000000000000000000000000,
-       copy   = 0x000000000000000000000000, kill    = 0x000000000000000000000000
-Block 7 [100, 151] (parents 6 4 ):
-       livein = 0x000000000000000000000000, liveout = 0x00000000f0000000007fffff,
+Block 6 [99, 151] (parents 4 ):
+       livein = 0xad6dbfff0000000700000000, liveout = 0xad6dbffff0000007007fffff,
        copy   = 0x00000000f0000000007fffff, kill    = 0x52124000fa92000070ffffff
-Block 8 [152, 170] (parents 5 ):
-       livein = 0x000000000000000000000000, liveout = 0x00000000000000007f800000,
+Block 7 [152, 170] (parents 5 ):
+       livein = 0xad6dbfff0000000700000000, liveout = 0xad6dbfff000000077f800000,
        copy   = 0x00000000000000007f800000, kill    = 0x00000000000000007f800000
+


More information about the mesa-dev mailing list