[Mesa-dev] [PATCH 3/7] i965/fs: Before reg alloc, schedule instructions to reduce live ranges.

Kenneth Graunke kenneth at whitecape.org
Sun Dec 9 14:23:45 PST 2012


On 12/07/2012 02:58 PM, Eric Anholt wrote:
> This came from an idea by Ben Segovia.  16-wide pixel shaders are very
> important for latency hiding on i965, so we want to try really hard to
> get them.  If scheduling an instruction makes some set of instructions
> available, those are probably the ones that make the instruction's
> result dead.  By choosing those first, we'll have a tendency to reduce
> the amount of live data as opposed to creating more.
>
> Previously, we were sometimes getting this behavior out of the
> scheduler, which was what produced the scheduler's original performance
> wins on lightsmark.  Unfortunately, that was mostly an accident of the
> lame instruction latency information that I had, which made it
> impossible to fix the actual scheduling for performance.  Now that we've
> fixed the scheduling for setup for register allocation, we can safely
> update the latency parameters for the final schedule.
>
> In shader-db, we lose 37 16-wide shaders, but gain 90 new ones.  4
> shaders that were spilling change how many registers spill, for a
> reduction of 70/3899 instructions.
> ---
>   .../dri/i965/brw_fs_schedule_instructions.cpp      |   49 +++++++++++++++++---
>   1 file changed, 43 insertions(+), 6 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp b/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
> index d48ad1e..3941eac 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
> @@ -496,13 +496,50 @@ instruction_scheduler::schedule_instructions(fs_inst *next_block_header)
>         schedule_node *chosen = NULL;
>         int chosen_time = 0;
>
> -      foreach_list(node, &instructions) {
> -	 schedule_node *n = (schedule_node *)node;
> +      if (post_reg_alloc) {
> +         /* Of the instructions closest ready to execute or the closest to
> +          * being ready, choose the oldest one.
> +          */
> +         foreach_list(node, &instructions) {
> +            schedule_node *n = (schedule_node *)node;
> +
> +            if (!chosen || n->unblocked_time < chosen_time) {
> +               chosen = n;
> +               chosen_time = n->unblocked_time;
> +            }
> +         }
> +      } else {
> +         /* Before register allocation, we don't care about the latencies of
> +          * instructions.  All we care about is reducing live intervals of
> +          * variables so that we can avoid register spilling, or get 16-wide
> +          * shaders which naturally do a better job of hiding instruction
> +          * latency.
> +          *
> +          * To do so, schedule our instructions in a roughly LIFO/depth-first
> +          * order: when new instructions become available as a result of
> +          * scheduling something, choose those first so that our result
> +          * hopefully is consumed quickly.
> +          *
> +          * The exception is messages that generate more than one result
> +          * register (AKA texturing).  In those cases, the LIFO search would
> +          * normally tend to choose them quickly (because scheduling the
> +          * previous message not only unblocked the children using its result,
> +          * but also the MRF setup for the next sampler message, which in turn
> +          * unblocks the next sampler message).
> +          */
> +         for (schedule_node *node = (schedule_node *)instructions.get_tail();
> +              node != instructions.get_head()->prev;
> +              node = (schedule_node *)node->prev) {
> +            schedule_node *n = (schedule_node *)node;
> +
> +            if (!chosen || chosen->inst->regs_written() > 1) {
> +               chosen = n;
> +               if (chosen->inst->regs_written() <= 1)
> +                  break;
> +            }

I don't think the if condition is necessary here.  Just doing

for (...) {
	chosen = (schedule_node *) node;
	if (chosen->inst->regs_written() <= 1)
		break;
}

should accomplish the same thing.

> +         }
>
> -	 if (!chosen || n->unblocked_time < chosen_time) {
> -	    chosen = n;
> -	    chosen_time = n->unblocked_time;
> -	 }
> +         chosen_time = chosen->unblocked_time;

It seems plausible that there could be no nodes to schedule...which 
means chosen would be NULL here.  Perhaps just move chosen_time = 
chosen->unblocked_time into the if...break above.

>         }
>
>         /* Schedule this instruction. */


More information about the mesa-dev mailing list