[Mesa-dev] [PATCH 3/4] intel/cfg: Always add both successors to a break
Connor Abbott
cwabbott0 at gmail.com
Thu Oct 5 00:37:22 UTC 2017
On Wed, Oct 4, 2017 at 8:35 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
> On Wed, Oct 4, 2017 at 5:29 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:
>>
>> This won't completely solve the problem. For example, what if you
>> hoist the assignment to color2 outside the loop?
>>
>> vec4 color2;
>> while (1) {
>> vec4 color = texture();
>> color2 = color * 2;
>> if (...) {
>> break;
>> }
>> }
>> gl_FragColor = color2;
>>
>>
>> Now the definition still dominates the use, even with the modified
>> control-flow graph, and you have the same problem
>
>
> Curro had me convinced that some detail of the liveness analysis pass saved
> us here but now I can't remember what. :-(
>
>>
>> The real problem is
>> that the assignment to color2 is really a conditional assignment: if
>> we're going channel-by-channel, it's not, but if you consider the
>> *whole* register at the same time, it is. To really fix the problem,
>> you need to model exactly what the machine actually does: you need to
>> insert "fake" edges like these, that model the jumps that the machine
>> can take, and you need to make every assignment a conditional
>> assignment (i.e. it doesn't kill the register). It's probably not as
>> bad with Curro's patch on top, though. Also, once you do this you can
>> make register allocation more accurate by generating interferences
>> from the liveness information directly instead of from the intervals.
>>
>> One thing I've thought about is, in addition to maintaining this
>> "whole-vector" view of things, is to maintain a "per-channel" liveness
>> that doesn't use the extra edges, partial definitions etc. and then
>> use the "per-channel view" to calculate interference when the channels
>> always line up.
>
>
> Yes, we've considered that and it's a good idea. However, I'm trying to fix
> bugs right now, not write the world's best liveness analysis pass. :-)
That's fair, although just implementing the first bit shouldn't be too hard.
>
>>
>> On Wed, Oct 4, 2017 at 7:58 PM, Jason Ekstrand <jason at jlekstrand.net>
>> wrote:
>> > Shader-db results on Sky Lake:
>> >
>> > total instructions in shared programs: 12955125 -> 12953698 (-0.01%)
>> > instructions in affected programs: 55956 -> 54529 (-2.55%)
>> > helped: 6
>> > HURT: 38
>> >
>> > All of the hurt programs were hurt by exactly one instruction because
>> > this patch affects copy propagation. Most of the helped instructions
>> > came from a single orbital explorer shader that was helped by 14.26%
>> >
>> > Cc: mesa-stable at lists.freedesktop.org
>> > ---
>> > src/intel/compiler/brw_cfg.cpp | 37
>> > +++++++++++++++++++++++++++++++++++--
>> > 1 file changed, 35 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/src/intel/compiler/brw_cfg.cpp
>> > b/src/intel/compiler/brw_cfg.cpp
>> > index fad12ee..d8bf725 100644
>> > --- a/src/intel/compiler/brw_cfg.cpp
>> > +++ b/src/intel/compiler/brw_cfg.cpp
>> > @@ -289,9 +289,42 @@ cfg_t::cfg_t(exec_list *instructions)
>> > assert(cur_while != NULL);
>> > cur->add_successor(mem_ctx, cur_while);
>> >
>> > + /* We also add the next block as a successor of the break. If
>> > the
>> > + * break is predicated, we need to do this because the break
>> > may not
>> > + * be taken. If the break is not predicated, we add it anyway
>> > so
>> > + * that our live intervals computations will operate as if the
>> > break
>> > + * may or may not be taken. Consider the following example:
>> > + *
>> > + * vec4 color2;
>> > + * while (1) {
>> > + * vec4 color = texture();
>> > + * if (...) {
>> > + * color2 = color * 2;
>> > + * break;
>> > + * }
>> > + * }
>> > + * gl_FragColor = color2;
>> > + *
>> > + * In this case, the definition of color2 dominates the use
>> > because
>> > + * the loop only has the one exit. This means that the live
>> > range
>> > + * interval for color2 goes from the statement in the if to
>> > it's use
>> > + * below the loop. Now suppose that the texture operation has
>> > a
>> > + * header register that gets assigned one of the registers
>> > used for
>> > + * color2. If the loop condition is non-uniform and some of
>> > the
>> > + * threads will take the break and others will continue. In
>> > this
>> > + * case, the next pass through the loop, the WE_all setup of
>> > the
>> > + * header register will stomp the disabled channels of color2
>> > and
>> > + * corrupt the value.
>> > + *
>> > + * This same problem can occur if you have a mix of 64, 32,
>> > and
>> > + * 16-bit registers because the channels do not line up or if
>> > you
>> > + * have a SIMD16 program and the first half of one value
>> > overlaps the
>> > + * second half of the other. To solve it, we simply treat the
>> > break
>> > + * as if it may also continue on because some of the threads
>> > may
>> > + * continue on.
>> > + */
>> > next = new_block();
>> > - if (inst->predicate)
>> > - cur->add_successor(mem_ctx, next);
>> > + cur->add_successor(mem_ctx, next);
>> >
>> > set_next_block(&cur, next, ip);
>> > break;
>> > --
>> > 2.5.0.400.gff86faf
>> >
>> > _______________________________________________
>> > mesa-dev mailing list
>> > mesa-dev at lists.freedesktop.org
>> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
>
More information about the mesa-dev
mailing list