[Mesa-dev] [PATCH] i965/fs: Reduce the interference between payload regs and virtual GRFs.

Tue Oct 16 18:05:55 PDT 2012

Kenneth Graunke <kenneth at whitecape.org> writes:

> On 10/15/2012 04:06 PM, Eric Anholt wrote:
>> Improves performance of the Lightsmark penumbra shadows scene by 15.7% +/-
>> 1.0% (n=15), by eliminating register spilling. (tested by smashing the list of
>> scenes to have all other scenes have 0 duration -- includes additional
>> rendering of scene description text that normally doesn't appear in that
>> scene)
>> ---
>>   src/mesa/drivers/dri/i965/brw_fs.h                |    2 +
>>   src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp |  164 ++++++++++++++++++---
>>   2 files changed, 147 insertions(+), 19 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
>> index a71783c..ad717c9 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs.h
>> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
>> @@ -235,6 +235,8 @@ public:
>>      void assign_urb_setup();
>>      bool assign_regs();
>>      void assign_regs_trivial();
>> +   void setup_payload_interference(struct ra_graph *g, int payload_reg_count,
>> +                                   int first_payload_node);
>>      int choose_spill_reg(struct ra_graph *g);
>>      void spill_reg(int spill_reg);
>>      void split_virtual_grfs();
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
>> index 7b778d6..bd9789f 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
>> @@ -163,27 +163,154 @@ brw_alloc_reg_set(struct brw_context *brw, int reg_width, int base_reg_count)
>>   /**
>>    * Sets up interference between thread payload registers and the virtual GRFs
>>    * to be allocated for program temporaries.
>> + *
>> + * We want to be able to reallocate the payload for our virtual GRFs, notably
>> + * because the setup coefficients for a full set of 16 FS inputs takes up 8 of
>> + * our 128 registers.
>> + *
>> + * The layout of the payload registers is:
>> + *
>> + * 0..nr_payload_regs-1: fixed function setup (including bary coordinates).
>> + * nr_payload_regs..nr_payload_regs+curb_read_length-1: uniform data
>> + * nr_payload_regs+curb_read_length..first_non_payload_grf-1: setup coefficients.
>> + *
>> + * And we have payload_node_count nodes covering these registers in order
>> + * (note that in 16-wide, a node is two registers).
>>    */
>> -static void
>> -brw_setup_payload_interference(struct ra_graph *g,
>> -                               int payload_reg_count,
>> -                               int first_payload_node,
>> -                               int reg_node_count)
>> +void
>> +fs_visitor::setup_payload_interference(struct ra_graph *g,
>> +                                       int payload_node_count,
>> +                                       int first_payload_node)
>>   {
>> -   for (int i = 0; i < payload_reg_count; i++) {
>> -      /* Mark each payload reg node as being allocated to its physical register.
>> +   int reg_width = c->dispatch_width / 8;
>> +   int loop_depth = 0;
>> +   int loop_end_ip = 0;
>> +
>> +   int payload_last_use_ip[payload_node_count];
>> +   memset(payload_last_use_ip, 0, sizeof(payload_last_use_ip));
>> +   int ip = 0;
>> +   foreach_list(node, &this->instructions) {
>> +      fs_inst *inst = (fs_inst *)node;
>> +
>> +      switch (inst->opcode) {
>> +      case BRW_OPCODE_DO:
>> +         loop_depth++;
>> +
>> +         /* Since payload regs are deffed only at the start of the shader
>> +          * execution, any uses of the payload within a loop mean the live
>> +          * interval extends to the end of the outermost loop.  Find the ip of
>> +          * the end now.
>> +          */
>> +         if (loop_depth == 1) {
>> +            int scan_depth = loop_depth;
>> +            int scan_ip = ip;
>> +            for (fs_inst *scan_inst = (fs_inst *)inst->next;
>> +                 scan_depth > 0;
>> +                 scan_inst = (fs_inst *)scan_inst->next) {
>> +               switch (scan_inst->opcode) {
>> +               case BRW_OPCODE_DO:
>> +                  scan_depth++;
>> +                  break;
>> +               case BRW_OPCODE_WHILE:
>> +                  scan_depth--;
>> +                  break;
>> +               default:
>> +                  break;
>> +               }
>> +               scan_ip++;
>> +            }
>> +            loop_end_ip = scan_ip;
>> +         }
>> +         break;
>> +      case BRW_OPCODE_WHILE:
>> +         loop_depth--;
>> +         break;
>> +      default:
>> +         break;
>> +      }
>
> Wow, it's unfortunate that you have to do this.  Essentially, for each 
> instruction in a loop, you walk through all the instructions, to the end 
> of the loop.  That's big O(fail). :(

Huh?  This is "for the top-level loop instruction, count to the end of
that loop".

I mean, we could keep ip in the instructions, but then you get to update
it all over when you inst->remove().
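
To make the cost argument concrete, here is a minimal standalone sketch of the same scan, not the Mesa code. It assumes a flat vector of opcodes instead of the exec_list, and the names Op, ScanResult and find_outermost_loop_ends are made up for illustration. The forward scan only starts at a DO with loop_depth == 1, so each instruction is visited at most once by a scan and the whole pass stays linear.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the opcodes that matter here; not Mesa's enum.
enum class Op { DO, WHILE, OTHER };

struct ScanResult {
    // For each instruction: ip of the WHILE that closes the outermost
    // enclosing loop, or -1 if the instruction is not inside a loop.
    std::vector<int> loop_end_ip;
    // Total number of instructions visited by the forward scans.
    std::size_t scan_steps;
};

// Assumes DO/WHILE are properly nested; an unterminated loop just scans
// to the end of the program.
ScanResult find_outermost_loop_ends(const std::vector<Op> &insts)
{
    ScanResult r{std::vector<int>(insts.size(), -1), 0};
    int loop_depth = 0;
    int loop_end_ip = -1;

    for (std::size_t ip = 0; ip < insts.size(); ip++) {
        if (insts[ip] == Op::DO) {
            loop_depth++;
            // Only a top-level DO triggers the scan; nested DOs reuse
            // the loop_end_ip already found for the outermost loop.
            if (loop_depth == 1) {
                int scan_depth = 1;
                std::size_t scan_ip = ip;
                while (scan_depth > 0 && scan_ip + 1 < insts.size()) {
                    scan_ip++;
                    r.scan_steps++;
                    if (insts[scan_ip] == Op::DO)
                        scan_depth++;
                    else if (insts[scan_ip] == Op::WHILE)
                        scan_depth--;
                }
                loop_end_ip = static_cast<int>(scan_ip);
            }
        }

        // Record before handling WHILE so the closing WHILE itself
        // still counts as part of its loop.
        if (loop_depth > 0)
            r.loop_end_ip[ip] = loop_end_ip;

        if (insts[ip] == Op::WHILE)
            loop_depth--;
    }
    return r;
}
```

For a program with an outer loop containing one nested loop, followed by a second top-level loop, scan_steps comes out no larger than the instruction count, since the nested DO never starts a scan of its own.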