[Mesa-stable] [Mesa-dev] [PATCH 3/4] intel/cfg: Always add both successors to a break

Jason Ekstrand jason at jlekstrand.net
Tue Oct 10 20:20:23 UTC 2017


On Tue, Oct 10, 2017 at 12:52 PM, Francisco Jerez <currojerez at riseup.net>
wrote:

> Jason Ekstrand <jason at jlekstrand.net> writes:
>
> > On Wed, Oct 4, 2017 at 5:29 PM, Connor Abbott <cwabbott0 at gmail.com>
> wrote:
> >
> >> This won't completely solve the problem. For example, what if you
> >> hoist the assignment to color2 outside the loop?
> >>
> >> vec4 color2;
> >> while (1) {
> >>    vec4 color = texture();
> >>    color2 = color * 2;
> >>    if (...) {
> >>       break;
> >>    }
> >> }
> >> gl_FragColor = color2;
> >>
> >>
> >> Now the definition still dominates the use, even with the modified
> >> control-flow graph, and you have the same problem
> >
> >
> > Curro had me convinced that some detail of the liveness analysis pass
> saved
> > us here but now I can't remember what. :-(
> >
>
> I think it's a shame that my name didn't come up on this discussion
> until it was time to deflect Connor's blame away from you.  Not that I
> generally care a lot about other people putting their names on the
> result of my research while I'm out on vacation, but at least face the
> fact that the patch you put your name on is incomplete, on your own.
>

I'm sorry.  I did not intend to deflect blame there.  I was simply
recalling some discussions we had in which I became convinced (wrongly, as
it turns out) that this was sufficient.  I couldn't quite remember the
details at the time of my reply.  Later, I thought about it more and
realized what I was missing and sent the second patch.  I knew I was
pulling something called WIP off your Jenkins branch and that you had not
in any way shape or form signed off on it as being correct which is why I
didn't put your name on the patch I sent out.


> As it turns out there's a minimal solution for the problem Connor
> brought up that doesn't involve ad-hoc post-dataflow-analysis fix-ups,
> nor disabling dataflow analysis for non-writemask-all assignments, but
> why on earth would I get involved in this discussion now?
>

If you've got a minimum solution, I'd love to hear about it!  I'm just
trying to solve the problem because it's blocking other stuff.  And, to be
honest, my attempts aren't going so well ATM.

--Jason


> >
> >> The real problem is
> >> that the assignment to color2 is really a conditional assignment: if
> >> we're going channel-by-channel, it's not, but if you consider the
> >> *whole* register at the same time, it is. To really fix the problem,
> >> you need to model exactly what the machine actually does: you need to
> >> insert "fake" edges like these, that model the jumps that the machine
> >> can take, and you need to make every assignment a conditional
> >> assignment (i.e. it doesn't kill the register). It's probably not as
> >> bad with Curro's patch on top, though. Also, once you do this you can
> >> make register allocation more accurate by generating interferences
> >> from the liveness information directly instead of from the intervals.
> >>
> >> One thing I've thought about is, in addition to maintaining this
> >> "whole-vector" view of things, is to maintain a "per-channel" liveness
> >> that doesn't use the extra edges, partial definitions etc. and then
> >> use the "per-channel view" to calculate interference when the channels
> >> always line up.
> >>
> >
> > Yes, we've considered that and it's a good idea.  However, I'm trying to
> > fix bugs right now, not write the world's best liveness analysis pass.
> :-)
> >
> >
> >> On Wed, Oct 4, 2017 at 7:58 PM, Jason Ekstrand <jason at jlekstrand.net>
> >> wrote:
> >> > Shader-db results on Sky Lake:
> >> >
> >> >     total instructions in shared programs: 12955125 -> 12953698
> (-0.01%)
> >> >     instructions in affected programs: 55956 -> 54529 (-2.55%)
> >> >     helped: 6
> >> >     HURT: 38
> >> >
> >> > All of the hurt programs were hurt by exactly one instruction because
> >> > this patch affects copy propagation.  Most of the helped instructions
> >> > came from a single orbital explorer shader that was helped by 14.26%
> >> >
> >> > Cc: mesa-stable at lists.freedesktop.org
> >> > ---
> >> >  src/intel/compiler/brw_cfg.cpp | 37 ++++++++++++++++++++++++++++++
> >> +++++--
> >> >  1 file changed, 35 insertions(+), 2 deletions(-)
> >> >
> >> > diff --git a/src/intel/compiler/brw_cfg.cpp
> >> b/src/intel/compiler/brw_cfg.cpp
> >> > index fad12ee..d8bf725 100644
> >> > --- a/src/intel/compiler/brw_cfg.cpp
> >> > +++ b/src/intel/compiler/brw_cfg.cpp
> >> > @@ -289,9 +289,42 @@ cfg_t::cfg_t(exec_list *instructions)
> >> >           assert(cur_while != NULL);
> >> >          cur->add_successor(mem_ctx, cur_while);
> >> >
> >> > +         /* We also add the next block as a successor of the break.
> If
> >> the
> >> > +          * break is predicated, we need to do this because the break
> >> may not
> >> > +          * be taken.  If the break is not predicated, we add it
> anyway
> >> so
> >> > +          * that our live intervals computations will operate as if
> the
> >> break
> >> > +          * may or may not be taken.  Consider the following example:
> >> > +          *
> >> > +          *    vec4 color2;
> >> > +          *    while (1) {
> >> > +          *       vec4 color = texture();
> >> > +          *       if (...) {
> >> > +          *          color2 = color * 2;
> >> > +          *          break;
> >> > +          *       }
> >> > +          *    }
> >> > +          *    gl_FragColor = color2;
> >> > +          *
> >> > +          * In this case, the definition of color2 dominates the use
> >> because
> >> > +          * the loop only has the one exit.  This means that the live
> >> range
> >> > +          * interval for color2 goes from the statement in the if to
> >> it's use
> >> > +          * below the loop.  Now suppose that the texture operation
> has
> >> a
> >> > +          * header register that gets assigned one of the registers
> >> used for
> >> > +          * color2.  If the loop condition is non-uniform and some of
> >> the
> >> > +          * threads will take the break and others will continue.  In
> >> this
> >> > +          * case, the next pass through the loop, the WE_all setup of
> >> the
> >> > +          * header register will stomp the disabled channels of
> color2
> >> and
> >> > +          * corrupt the value.
> >> > +          *
> >> > +          * This same problem can occur if you have a mix of 64, 32,
> and
> >> > +          * 16-bit registers because the channels do not line up or
> if
> >> you
> >> > +          * have a SIMD16 program and the first half of one value
> >> overlaps the
> >> > +          * second half of the other.  To solve it, we simply treat
> the
> >> break
> >> > +          * as if it may also continue on because some of the threads
> >> may
> >> > +          * continue on.
> >> > +          */
> >> >          next = new_block();
> >> > -        if (inst->predicate)
> >> > -           cur->add_successor(mem_ctx, next);
> >> > +        cur->add_successor(mem_ctx, next);
> >> >
> >> >          set_next_block(&cur, next, ip);
> >> >          break;
> >> > --
> >> > 2.5.0.400.gff86faf
> >> >
> >> > _______________________________________________
> >> > mesa-dev mailing list
> >> > mesa-dev at lists.freedesktop.org
> >> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> >>
> > _______________________________________________
> > mesa-stable mailing list
> > mesa-stable at lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-stable
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-stable/attachments/20171010/ccc2e0a0/attachment.html>


More information about the mesa-stable mailing list