[Mesa-dev] [PATCH 3/4] intel/cfg: Always add both successors to a break

Jason Ekstrand jason at jlekstrand.net
Thu Oct 5 00:35:54 UTC 2017


On Wed, Oct 4, 2017 at 5:29 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:

> This won't completely solve the problem. For example, what if you
> hoist the assignment to color2 outside the loop?
>
> vec4 color2;
> while (1) {
>    vec4 color = texture();
>    color2 = color * 2;
>    if (...) {
>       break;
>    }
> }
> gl_FragColor = color2;
>
>
> Now the definition still dominates the use, even with the modified
> control-flow graph, and you have the same problem


Curro had me convinced that some detail of the liveness analysis pass saved
us here but now I can't remember what. :-(


> The real problem is
> that the assignment to color2 is really a conditional assignment: if
> we're going channel-by-channel, it's not, but if you consider the
> *whole* register at the same time, it is. To really fix the problem,
> you need to model exactly what the machine actually does: you need to
> insert "fake" edges like these, that model the jumps that the machine
> can take, and you need to make every assignment a conditional
> assignment (i.e. it doesn't kill the register). It's probably not as
> bad with Curro's patch on top, though. Also, once you do this you can
> make register allocation more accurate by generating interferences
> from the liveness information directly instead of from the intervals.
>
> One thing I've thought about is, in addition to maintaining this
> "whole-vector" view of things, is to maintain a "per-channel" liveness
> that doesn't use the extra edges, partial definitions etc. and then
> use the "per-channel view" to calculate interference when the channels
> always line up.
>

Yes, we've considered that and it's a good idea.  However, I'm trying to
fix bugs right now, not write the world's best liveness analysis pass. :-)


> On Wed, Oct 4, 2017 at 7:58 PM, Jason Ekstrand <jason at jlekstrand.net>
> wrote:
> > Shader-db results on Sky Lake:
> >
> >     total instructions in shared programs: 12955125 -> 12953698 (-0.01%)
> >     instructions in affected programs: 55956 -> 54529 (-2.55%)
> >     helped: 6
> >     HURT: 38
> >
> > All of the hurt programs were hurt by exactly one instruction because
> > this patch affects copy propagation.  Most of the helped instructions
> > came from a single orbital explorer shader that was helped by 14.26%
> >
> > Cc: mesa-stable at lists.freedesktop.org
> > ---
> >  src/intel/compiler/brw_cfg.cpp | 37 ++++++++++++++++++++++++++++++
> +++++--
> >  1 file changed, 35 insertions(+), 2 deletions(-)
> >
> > diff --git a/src/intel/compiler/brw_cfg.cpp
> b/src/intel/compiler/brw_cfg.cpp
> > index fad12ee..d8bf725 100644
> > --- a/src/intel/compiler/brw_cfg.cpp
> > +++ b/src/intel/compiler/brw_cfg.cpp
> > @@ -289,9 +289,42 @@ cfg_t::cfg_t(exec_list *instructions)
> >           assert(cur_while != NULL);
> >          cur->add_successor(mem_ctx, cur_while);
> >
> > +         /* We also add the next block as a successor of the break.  If
> the
> > +          * break is predicated, we need to do this because the break
> may not
> > +          * be taken.  If the break is not predicated, we add it anyway
> so
> > +          * that our live intervals computations will operate as if the
> break
> > +          * may or may not be taken.  Consider the following example:
> > +          *
> > +          *    vec4 color2;
> > +          *    while (1) {
> > +          *       vec4 color = texture();
> > +          *       if (...) {
> > +          *          color2 = color * 2;
> > +          *          break;
> > +          *       }
> > +          *    }
> > +          *    gl_FragColor = color2;
> > +          *
> > +          * In this case, the definition of color2 dominates the use
> because
> > +          * the loop only has the one exit.  This means that the live
> range
> > +          * interval for color2 goes from the statement in the if to
> it's use
> > +          * below the loop.  Now suppose that the texture operation has
> a
> > +          * header register that gets assigned one of the registers
> used for
> > +          * color2.  If the loop condition is non-uniform and some of
> the
> > +          * threads will take the break and others will continue.  In
> this
> > +          * case, the next pass through the loop, the WE_all setup of
> the
> > +          * header register will stomp the disabled channels of color2
> and
> > +          * corrupt the value.
> > +          *
> > +          * This same problem can occur if you have a mix of 64, 32, and
> > +          * 16-bit registers because the channels do not line up or if
> you
> > +          * have a SIMD16 program and the first half of one value
> overlaps the
> > +          * second half of the other.  To solve it, we simply treat the
> break
> > +          * as if it may also continue on because some of the threads
> may
> > +          * continue on.
> > +          */
> >          next = new_block();
> > -        if (inst->predicate)
> > -           cur->add_successor(mem_ctx, next);
> > +        cur->add_successor(mem_ctx, next);
> >
> >          set_next_block(&cur, next, ip);
> >          break;
> > --
> > 2.5.0.400.gff86faf
> >
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev at lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20171004/fa77dab4/attachment-0001.html>


More information about the mesa-dev mailing list