[Mesa-dev] [PATCH 3/4] intel/cfg: Always add both successors to a break
Jason Ekstrand
jason at jlekstrand.net
Thu Oct 5 03:23:48 UTC 2017
New patch on the list.
On Wed, Oct 4, 2017 at 7:42 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
> On Wed, Oct 4, 2017 at 5:35 PM, Jason Ekstrand <jason at jlekstrand.net>
> wrote:
>
>> On Wed, Oct 4, 2017 at 5:29 PM, Connor Abbott <cwabbott0 at gmail.com>
>> wrote:
>>
>>> This won't completely solve the problem. For example, what if you
>>> hoist the assignment to color2 outside the loop?
>>>
>>> vec4 color2;
>>> while (1) {
>>> vec4 color = texture();
>>> color2 = color * 2;
>>> if (...) {
>>> break;
>>> }
>>> }
>>> gl_FragColor = color2;
>>>
>>>
>>> Now the definition still dominates the use, even with the modified
>>> control-flow graph, and you have the same problem
>>
>>
>> Curro had me convinced that some detail of the liveness analysis pass
>> saved us here but now I can't remember what. :-(
>>
>>
>>> The real problem is
>>> that the assignment to color2 is really a conditional assignment: if
>>> we're going channel-by-channel, it's not, but if you consider the
>>> *whole* register at the same time, it is. To really fix the problem,
>>> you need to model exactly what the machine actually does: you need to
>>> insert "fake" edges like these, that model the jumps that the machine
>>> can take, and you need to make every assignment a conditional
>>> assignment (i.e. it doesn't kill the register). It's probably not as
>>> bad with Curro's patch on top, though. Also, once you do this you can
>>> make register allocation more accurate by generating interferences
>>> from the liveness information directly instead of from the intervals.
>>>
>>> One thing I've thought about is, in addition to maintaining this
>>> "whole-vector" view of things, is to maintain a "per-channel" liveness
>>> that doesn't use the extra edges, partial definitions etc. and then
>>> use the "per-channel view" to calculate interference when the channels
>>> always line up.
>>>
>>
>> Yes, we've considered that and it's a good idea. However, I'm trying to
>> fix bugs right now, not write the world's best liveness analysis pass. :-)
>>
>
> You're correct, as usual... I've inspected the result of liveness anlaysis
> and we do indeed get it wrong. I'll come up with something less bogus.
>
>
>> On Wed, Oct 4, 2017 at 7:58 PM, Jason Ekstrand <jason at jlekstrand.net>
>>> wrote:
>>> > Shader-db results on Sky Lake:
>>> >
>>> > total instructions in shared programs: 12955125 -> 12953698
>>> (-0.01%)
>>> > instructions in affected programs: 55956 -> 54529 (-2.55%)
>>> > helped: 6
>>> > HURT: 38
>>> >
>>> > All of the hurt programs were hurt by exactly one instruction because
>>> > this patch affects copy propagation. Most of the helped instructions
>>> > came from a single orbital explorer shader that was helped by 14.26%
>>> >
>>> > Cc: mesa-stable at lists.freedesktop.org
>>> > ---
>>> > src/intel/compiler/brw_cfg.cpp | 37 ++++++++++++++++++++++++++++++
>>> +++++--
>>> > 1 file changed, 35 insertions(+), 2 deletions(-)
>>> >
>>> > diff --git a/src/intel/compiler/brw_cfg.cpp
>>> b/src/intel/compiler/brw_cfg.cpp
>>> > index fad12ee..d8bf725 100644
>>> > --- a/src/intel/compiler/brw_cfg.cpp
>>> > +++ b/src/intel/compiler/brw_cfg.cpp
>>> > @@ -289,9 +289,42 @@ cfg_t::cfg_t(exec_list *instructions)
>>> > assert(cur_while != NULL);
>>> > cur->add_successor(mem_ctx, cur_while);
>>> >
>>> > + /* We also add the next block as a successor of the break.
>>> If the
>>> > + * break is predicated, we need to do this because the break
>>> may not
>>> > + * be taken. If the break is not predicated, we add it
>>> anyway so
>>> > + * that our live intervals computations will operate as if
>>> the break
>>> > + * may or may not be taken. Consider the following example:
>>> > + *
>>> > + * vec4 color2;
>>> > + * while (1) {
>>> > + * vec4 color = texture();
>>> > + * if (...) {
>>> > + * color2 = color * 2;
>>> > + * break;
>>> > + * }
>>> > + * }
>>> > + * gl_FragColor = color2;
>>> > + *
>>> > + * In this case, the definition of color2 dominates the use
>>> because
>>> > + * the loop only has the one exit. This means that the live
>>> range
>>> > + * interval for color2 goes from the statement in the if to
>>> it's use
>>> > + * below the loop. Now suppose that the texture operation
>>> has a
>>> > + * header register that gets assigned one of the registers
>>> used for
>>> > + * color2. If the loop condition is non-uniform and some of
>>> the
>>> > + * threads will take the break and others will continue. In
>>> this
>>> > + * case, the next pass through the loop, the WE_all setup of
>>> the
>>> > + * header register will stomp the disabled channels of
>>> color2 and
>>> > + * corrupt the value.
>>> > + *
>>> > + * This same problem can occur if you have a mix of 64, 32,
>>> and
>>> > + * 16-bit registers because the channels do not line up or
>>> if you
>>> > + * have a SIMD16 program and the first half of one value
>>> overlaps the
>>> > + * second half of the other. To solve it, we simply treat
>>> the break
>>> > + * as if it may also continue on because some of the threads
>>> may
>>> > + * continue on.
>>> > + */
>>> > next = new_block();
>>> > - if (inst->predicate)
>>> > - cur->add_successor(mem_ctx, next);
>>> > + cur->add_successor(mem_ctx, next);
>>> >
>>> > set_next_block(&cur, next, ip);
>>> > break;
>>> > --
>>> > 2.5.0.400.gff86faf
>>> >
>>> > _______________________________________________
>>> > mesa-dev mailing list
>>> > mesa-dev at lists.freedesktop.org
>>> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20171004/387f4063/attachment.html>
More information about the mesa-dev
mailing list