[Mesa-dev] [PATCH 3/4] intel/cfg: Always add both successors to a break

Connor Abbott cwabbott0 at gmail.com
Thu Oct 5 00:29:43 UTC 2017


This won't completely solve the problem. For example, what if you
hoist the assignment to color2 outside the loop?

vec4 color2;
while (1) {
   vec4 color = texture();
   color2 = color * 2;
   if (...) {
      break;
   }
}
gl_FragColor = color2;


Now the definition still dominates the use, even with the modified
control-flow graph, and you have the same problem. The real problem is
that the assignment to color2 is really a conditional assignment: if
we're going channel-by-channel, it's not, but if you consider the
*whole* register at the same time, it is. To really fix the problem,
you need to model exactly what the machine actually does: you need to
insert "fake" edges like these, that model the jumps that the machine
can take, and you need to make every assignment a conditional
assignment (i.e. it doesn't kill the register). It's probably not as
bad with Curro's patch on top, though. Also, once you do this you can
make register allocation more accurate by generating interferences
from the liveness information directly instead of from the intervals.

One thing I've thought about is, in addition to maintaining this
"whole-vector" view of things, is to maintain a "per-channel" liveness
that doesn't use the extra edges, partial definitions etc. and then
use the "per-channel view" to calculate interference when the channels
always line up.


On Wed, Oct 4, 2017 at 7:58 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
> Shader-db results on Sky Lake:
>
>     total instructions in shared programs: 12955125 -> 12953698 (-0.01%)
>     instructions in affected programs: 55956 -> 54529 (-2.55%)
>     helped: 6
>     HURT: 38
>
> All of the hurt programs were hurt by exactly one instruction because
> this patch affects copy propagation.  Most of the helped instructions
> came from a single orbital explorer shader that was helped by 14.26%
>
> Cc: mesa-stable at lists.freedesktop.org
> ---
>  src/intel/compiler/brw_cfg.cpp | 37 +++++++++++++++++++++++++++++++++++--
>  1 file changed, 35 insertions(+), 2 deletions(-)
>
> diff --git a/src/intel/compiler/brw_cfg.cpp b/src/intel/compiler/brw_cfg.cpp
> index fad12ee..d8bf725 100644
> --- a/src/intel/compiler/brw_cfg.cpp
> +++ b/src/intel/compiler/brw_cfg.cpp
> @@ -289,9 +289,42 @@ cfg_t::cfg_t(exec_list *instructions)
>           assert(cur_while != NULL);
>          cur->add_successor(mem_ctx, cur_while);
>
> +         /* We also add the next block as a successor of the break.  If the
> +          * break is predicated, we need to do this because the break may not
> +          * be taken.  If the break is not predicated, we add it anyway so
> +          * that our live intervals computations will operate as if the break
> +          * may or may not be taken.  Consider the following example:
> +          *
> +          *    vec4 color2;
> +          *    while (1) {
> +          *       vec4 color = texture();
> +          *       if (...) {
> +          *          color2 = color * 2;
> +          *          break;
> +          *       }
> +          *    }
> +          *    gl_FragColor = color2;
> +          *
> +          * In this case, the definition of color2 dominates the use because
> +          * the loop only has the one exit.  This means that the live range
> +          * interval for color2 goes from the statement in the if to it's use
> +          * below the loop.  Now suppose that the texture operation has a
> +          * header register that gets assigned one of the registers used for
> +          * color2.  If the loop condition is non-uniform and some of the
> +          * threads will take the break and others will continue.  In this
> +          * case, the next pass through the loop, the WE_all setup of the
> +          * header register will stomp the disabled channels of color2 and
> +          * corrupt the value.
> +          *
> +          * This same problem can occur if you have a mix of 64, 32, and
> +          * 16-bit registers because the channels do not line up or if you
> +          * have a SIMD16 program and the first half of one value overlaps the
> +          * second half of the other.  To solve it, we simply treat the break
> +          * as if it may also continue on because some of the threads may
> +          * continue on.
> +          */
>          next = new_block();
> -        if (inst->predicate)
> -           cur->add_successor(mem_ctx, next);
> +        cur->add_successor(mem_ctx, next);
>
>          set_next_block(&cur, next, ip);
>          break;
> --
> 2.5.0.400.gff86faf
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


More information about the mesa-dev mailing list