[Mesa-dev] [PATCHv2] ra: Disable round-robin strategy for optimistically colorable nodes.

Tue Feb 17 10:24:40 PST 2015

On Tue, Feb 17, 2015 at 04:41:41PM +0200, Francisco Jerez wrote:
> Tom Stellard <tom at stellard.net> writes:
> 
> > On Tue, Feb 17, 2015 at 03:23:05PM +0200, Francisco Jerez wrote:
> >> The round-robin allocation strategy is expected to decrease the number
> >> of false dependencies created by the register allocator and give the
> >> post-RA scheduling pass more freedom to move instructions around.  On
> >> the other hand it has the disadvantage of increasing fragmentation and
> >> decreasing the number of equally-colored nearby nodes, which increases
> >> the likelihood of failure in the presence of optimistically colorable
> >> nodes.
> >> 
> >> This patch disables the round-robin strategy for optimistically
> >> colorable nodes.  These typically arise in situations of high register
> >> pressure or for registers with large live intervals.  In both cases the
> >> task of the instruction scheduler shouldn't be constrained excessively
> >> by the dense packing of those nodes, and a spill (or on Intel hardware
> >> a fall-back to SIMD8 mode) is invariably worse than slightly less
> >> optimal scheduling.
> >> 
> >
> Hi Tom,
> 
> > I'm trying to figure out how this will affect r300g, and it seems like
> > from your description that it will be an improvement, because r300g
> > doesn't have a post-ra scheduler and it also can't spill registers.
> >
> > What do you think?
> >
> 
> It looks like it won't affect r300g: apparently i965 is the only caller of
> ra_set_allocate_round_robin() in the tree right now, so it should be the
> only affected back-end.  You could consider enabling it to reduce the
> number of false dependencies introduced by the register allocator; after
> this patch it shouldn't increase the likelihood of register allocation
> failure anymore.  It might however increase register usage, possibly
> limiting the number of threads your hardware can run in parallel.  Whether
> that matters depends on whether register usage is a limiting factor
> for your hardware.  I guess that if you don't have a post-RA
> scheduling pass the benefit you could get from it is rather
> limited.  It's probably safe to assume that you don't need it, but it
> might be worth looking into.
> 

Ok, thanks for the explanation.  I probably won't have time to
investigate, but it's good to know this patch is a no-op for
r300g, so I don't need to worry about regressions.

-Tom

> > -Tom
> >
> >
> >> Shader-db results on the i965 driver:
> >> 
> >> total instructions in shared programs: 5488539 -> 5488489 (-0.00%)
> >> instructions in affected programs:     1121 -> 1071 (-4.46%)
> >> helped:                                1
> >> HURT:                                  0
> >> GAINED:                                49
> >> LOST:                                  5
> >> 
> >> v2: Re-enable round-robin already for the lowest one of the nodes
> >>     pushed optimistically onto the stack (Connor).
> >> ---
> >>  src/util/register_allocate.c | 23 ++++++++++++++++++++++-
> >>  1 file changed, 22 insertions(+), 1 deletion(-)
> >> 
> >> diff --git a/src/util/register_allocate.c b/src/util/register_allocate.c
> >> index af7a20c..b1ed273 100644
> >> --- a/src/util/register_allocate.c
> >> +++ b/src/util/register_allocate.c
> >> @@ -168,6 +168,12 @@ struct ra_graph {
> >>  
> >>     unsigned int *stack;
> >>     unsigned int stack_count;
> >> +
> >> +   /**
> >> +    * Tracks the start of the set of optimistically-colored registers in the
> >> +    * stack.
> >> +    */
> >> +   unsigned int stack_optimistic_start;
> >>  };
> >>  
> >>  /**
> >> @@ -454,6 +460,7 @@ static void
> >>  ra_simplify(struct ra_graph *g)
> >>  {
> >>     bool progress = true;
> >> +   unsigned int stack_optimistic_start = ~0;
> >>     int i;
> >>  
> >>     while (progress) {
> >> @@ -483,12 +490,16 @@ ra_simplify(struct ra_graph *g)
> >>  
> >>        if (!progress && best_optimistic_node != ~0U) {
> >>  	 decrement_q(g, best_optimistic_node);
> >> +         stack_optimistic_start =
> >> +            MIN2(stack_optimistic_start, g->stack_count);
> >>  	 g->stack[g->stack_count] = best_optimistic_node;
> >>  	 g->stack_count++;
> >>  	 g->nodes[best_optimistic_node].in_stack = true;
> >>  	 progress = true;
> >>        }
> >>     }
> >> +
> >> +   g->stack_optimistic_start = stack_optimistic_start;
> >>  }
> >>  
> >>  /**
> >> @@ -542,7 +553,17 @@ ra_select(struct ra_graph *g)
> >>        g->nodes[n].reg = r;
> >>        g->stack_count--;
> >>  
> >> -      if (g->regs->round_robin)
> >> +      /* Rotate the starting point except for any nodes above the lowest
> >> +       * optimistically colorable node.  The likelihood that we will succeed
> >> +       * at allocating optimistically colorable nodes is highly dependent on
> >> +       * the way that the previous nodes popped off the stack are laid out.
> >> +       * The round-robin strategy increases the fragmentation of the register
> >> +       * file and decreases the number of nearby nodes assigned to the same
> >> +       * color, which increases the likelihood of spilling with respect to the
> >> +       * dense packing strategy.
> >> +       */
> >> +      if (g->regs->round_robin &&
> >> +          g->stack_count <= g->stack_optimistic_start + 1)
> >>           start_search_reg = r + 1;
> >>     }
> >>  
> >> -- 
> >> 2.1.3
> >> 
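For anyone who wants to poke at the gating condition in the last hunk outside of Mesa, here is a minimal standalone sketch.  It is not code from the patch: rotate_as_posted() copies the posted comparison, while rotate_no_wrap(), count_rotations() and NO_OPTIMISTIC are hypothetical names made up for illustration.  The second predicate exists only because ra_simplify() leaves stack_optimistic_start at ~0 when nothing was pushed optimistically, and in unsigned arithmetic ~0U + 1 wraps around to 0.  Whether that case matters in practice is my assumption and was not discussed in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Sentinel the patch leaves in stack_optimistic_start when no node was
 * pushed optimistically (hypothetical name, the patch just uses ~0). */
#define NO_OPTIMISTIC (~0U)

typedef bool (*rotate_fn)(bool round_robin, unsigned stack_count,
                          unsigned optimistic_start);

/* The comparison as written in the v2 hunk.  stack_count is the value
 * after the pop, i.e. the number of nodes still left on the stack. */
bool rotate_as_posted(bool round_robin, unsigned stack_count,
                      unsigned optimistic_start)
{
   return round_robin && stack_count <= optimistic_start + 1;
}

/* Hypothetical rearrangement: subtract on the left instead of adding on
 * the right, so the ~0U sentinel cannot wrap around to zero. */
bool rotate_no_wrap(bool round_robin, unsigned stack_count,
                    unsigned optimistic_start)
{
   return round_robin &&
          (stack_count == 0 || stack_count - 1 <= optimistic_start);
}

/* Pop a stack of n nodes from the top and count after how many pops the
 * given predicate would rotate start_search_reg. */
unsigned count_rotations(rotate_fn pred, unsigned n,
                         unsigned optimistic_start)
{
   unsigned rotations = 0;

   /* The first optimistic push must sit inside the stack, if it exists. */
   assert(optimistic_start == NO_OPTIMISTIC || optimistic_start < n);

   for (unsigned stack_count = n; stack_count > 0;) {
      stack_count--;                      /* mirrors g->stack_count-- */
      if (pred(true, stack_count, optimistic_start))
         rotations++;
   }
   return rotations;
}
```

With a six-entry stack whose first optimistic push landed at index 3, both predicates agree: the rotation is skipped only after popping index 5 and is back on once index 4 has been popped, so the search for the lowest optimistic node (index 3) already starts from a rotated position, as the v2 note describes.  With the ~0 sentinel the two differ: the posted form rotates only once the stack is empty, while the rearranged form rotates after every pop.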