[Mesa-dev] [PATCH] i965/fs: Reduce the interference between payload regs and virtual GRFs.
Eric Anholt
eric at anholt.net
Tue Oct 16 18:05:55 PDT 2012
Kenneth Graunke <kenneth at whitecape.org> writes:
> On 10/15/2012 04:06 PM, Eric Anholt wrote:
>> Improves performance of the Lightsmark penumbra shadows scene by 15.7% +/-
>> 1.0% (n=15) by eliminating register spilling. (Tested by editing the list of
>> scenes so that every other scene has 0 duration; this includes additional
>> rendering of scene description text that normally doesn't appear in that
>> scene.)
>> ---
>> src/mesa/drivers/dri/i965/brw_fs.h | 2 +
>> src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 164 ++++++++++++++++++---
>> 2 files changed, 147 insertions(+), 19 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
>> index a71783c..ad717c9 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs.h
>> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
>> @@ -235,6 +235,8 @@ public:
>>     void assign_urb_setup();
>>     bool assign_regs();
>>     void assign_regs_trivial();
>> +   void setup_payload_interference(struct ra_graph *g, int payload_reg_count,
>> +                                   int first_payload_node);
>>     int choose_spill_reg(struct ra_graph *g);
>>     void spill_reg(int spill_reg);
>>     void split_virtual_grfs();
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
>> index 7b778d6..bd9789f 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
>> @@ -163,27 +163,154 @@ brw_alloc_reg_set(struct brw_context *brw, int reg_width, int base_reg_count)
>>  /**
>>   * Sets up interference between thread payload registers and the virtual GRFs
>>   * to be allocated for program temporaries.
>> + *
>> + * We want to be able to reallocate the payload for our virtual GRFs, notably
>> + * because the setup coefficients for a full set of 16 FS inputs take up 8 of
>> + * our 128 registers.
>> + *
>> + * The layout of the payload registers is:
>> + *
>> + * 0..nr_payload_regs-1: fixed function setup (including barycentric coordinates).
>> + * nr_payload_regs..nr_payload_regs+curb_read_length-1: uniform data
>> + * nr_payload_regs+curb_read_length..first_non_payload_grf-1: setup coefficients.
>> + *
>> + * And we have payload_node_count nodes covering these registers in order
>> + * (note that in 16-wide, a node is two registers).
>>   */
>> -static void
>> -brw_setup_payload_interference(struct ra_graph *g,
>> -                               int payload_reg_count,
>> -                               int first_payload_node,
>> -                               int reg_node_count)
>> +void
>> +fs_visitor::setup_payload_interference(struct ra_graph *g,
>> +                                       int payload_node_count,
>> +                                       int first_payload_node)
>>  {
>> -   for (int i = 0; i < payload_reg_count; i++) {
>> -      /* Mark each payload reg node as being allocated to its physical register.
>> +   int reg_width = c->dispatch_width / 8;
>> +   int loop_depth = 0;
>> +   int loop_end_ip = 0;
>> +
>> +   int payload_last_use_ip[payload_node_count];
>> +   memset(payload_last_use_ip, 0, sizeof(payload_last_use_ip));
>> +   int ip = 0;
>> +   foreach_list(node, &this->instructions) {
>> +      fs_inst *inst = (fs_inst *)node;
>> +
>> +      switch (inst->opcode) {
>> +      case BRW_OPCODE_DO:
>> +         loop_depth++;
>> +
>> +         /* Since payload regs are written only at the start of shader
>> +          * execution, any use of the payload within a loop means its live
>> +          * interval extends to the end of the outermost loop. Find the ip of
>> +          * that loop's end now.
>> +          */
>> +         if (loop_depth == 1) {
>> +            int scan_depth = loop_depth;
>> +            int scan_ip = ip;
>> +            for (fs_inst *scan_inst = (fs_inst *)inst->next;
>> +                 scan_depth > 0;
>> +                 scan_inst = (fs_inst *)scan_inst->next) {
>> +               switch (scan_inst->opcode) {
>> +               case BRW_OPCODE_DO:
>> +                  scan_depth++;
>> +                  break;
>> +               case BRW_OPCODE_WHILE:
>> +                  scan_depth--;
>> +                  break;
>> +               default:
>> +                  break;
>> +               }
>> +               scan_ip++;
>> +            }
>> +            loop_end_ip = scan_ip;
>> +         }
>> +         break;
>> +      case BRW_OPCODE_WHILE:
>> +         loop_depth--;
>> +         break;
>> +      default:
>> +         break;
>> +      }
>
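The payload layout described in the comment above can be modeled in a few lines. This is a minimal sketch, not driver code: `PayloadLayout`, `PayloadRegion`, `classify_payload_reg`, `payload_node_for_reg` and `count_payload_nodes` are hypothetical names, and rounding the node count up when the register total is odd is an assumption the patch does not spell out. `reg_width` follows the patch (`c->dispatch_width / 8`, so 2 in 16-wide).

```cpp
#include <cassert>

// Hypothetical model of the payload layout described in the patch comment.
// Field names mirror that comment; in the driver these values come from the
// compile state, not from a struct like this.
struct PayloadLayout {
   int nr_payload_regs;       // fixed-function setup, incl. barycentric coords
   int curb_read_length;      // uniform (CURB) data
   int first_non_payload_grf; // first GRF past the setup coefficients
};

enum class PayloadRegion {
   FixedFunctionSetup,
   UniformData,
   SetupCoefficients,
   NotPayload,
};

// Classifies a hardware register number using the three ranges in the comment.
PayloadRegion classify_payload_reg(const PayloadLayout &l, int reg)
{
   if (reg < 0 || reg >= l.first_non_payload_grf)
      return PayloadRegion::NotPayload;
   if (reg < l.nr_payload_regs)
      return PayloadRegion::FixedFunctionSetup;
   if (reg < l.nr_payload_regs + l.curb_read_length)
      return PayloadRegion::UniformData;
   return PayloadRegion::SetupCoefficients;
}

// Nodes cover the payload registers in order; with reg_width == 2
// (16-wide dispatch) each node spans two registers.
int payload_node_for_reg(int reg, int reg_width)
{
   assert(reg_width == 1 || reg_width == 2);
   return reg / reg_width;
}

// Assumption: round up, so an odd trailing register still gets a node.
int count_payload_nodes(const PayloadLayout &l, int reg_width)
{
   assert(reg_width == 1 || reg_width == 2);
   return (l.first_non_payload_grf + reg_width - 1) / reg_width;
}
```

For example, with 2 fixed-function registers, 4 uniform registers and `first_non_payload_grf == 10`, registers 6..9 hold setup coefficients and a 16-wide compile needs 5 payload nodes.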
> Wow, it's unfortunate that you have to do this. Essentially, for each
> instruction in a loop, you walk through all the instructions to the end
> of the loop. That's big O(fail). :(
Huh? This is "for the top-level loop instruction, count to the end of
that loop".
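To illustrate why the scan is linear rather than quadratic, here is a simplified model of the patch's loop handling over a flat instruction array instead of Mesa's `fs_inst` list. `Op`, `Inst` and `payload_last_use` are hypothetical, and treating a payload read inside a loop as a use at the outermost loop's `WHILE` is an interpretation of the patch comment, since the quoted hunk stops before that step. The forward scan runs only when `loop_depth` becomes 1, so nested `DO`s trigger no scan and each instruction is visited by a scan at most once; `scan_steps` counts those visits.

```cpp
#include <cassert>
#include <vector>

// Simplified model (assumption): each instruction is a loop DO, a loop WHILE,
// or something else that may read one payload node (-1 means none).
enum class Op { DO, WHILE, OTHER };

struct Inst {
   Op op;
   int payload_node_read; // -1 if no payload node is read
};

// Returns the last-use ip of each payload node. A read inside a loop counts
// as a use at the WHILE closing the outermost enclosing loop. Assumes every
// DO has a matching WHILE.
std::vector<int> payload_last_use(const std::vector<Inst> &insts,
                                  int node_count, int *scan_steps)
{
   std::vector<int> last_use(node_count, 0);
   int loop_depth = 0;
   int loop_end_ip = 0;
   *scan_steps = 0;

   for (int ip = 0; ip < (int)insts.size(); ip++) {
      switch (insts[ip].op) {
      case Op::DO:
         loop_depth++;
         if (loop_depth == 1) {
            // Only a top-level DO scans forward to its matching WHILE.
            int scan_depth = 1;
            int scan_ip = ip;
            while (scan_depth > 0) {
               assert(scan_ip + 1 < (int)insts.size() && "unmatched DO");
               scan_ip++;
               (*scan_steps)++;
               if (insts[scan_ip].op == Op::DO)
                  scan_depth++;
               else if (insts[scan_ip].op == Op::WHILE)
                  scan_depth--;
            }
            loop_end_ip = scan_ip;
         }
         break;
      case Op::WHILE:
         loop_depth--;
         break;
      case Op::OTHER:
         break;
      }

      int node = insts[ip].payload_node_read;
      if (node >= 0) {
         int use_ip = loop_depth > 0 ? loop_end_ip : ip;
         if (use_ip > last_use[node])
            last_use[node] = use_ip;
      }
   }
   return last_use;
}
```

In this model, a 12-instruction program containing a doubly nested loop followed by a simple loop triggers scans over 7 instructions in total; the inner `DO` adds none.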
I mean, we could store the ip in each instruction, but then you'd have to
update it everywhere whenever you call inst->remove().
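The trade-off described above can be sketched with a hypothetical instruction type that caches its own ip (`CachedIpInst` and `remove_and_renumber` are illustrative names, not Mesa API). Unlinking one node from a `std::list` is O(1), but every cached ip after it becomes stale and must be decremented, so each removal costs O(n) in the worst case; recomputing ips during a walk, as the patch does, needs no bookkeeping at removal time.

```cpp
#include <cassert>
#include <list>

// Hypothetical instruction that caches its position in the program.
struct CachedIpInst {
   int ip;
};

// Removes one instruction and fixes up the cached ips that follow it.
// Returns how many cached ips had to be rewritten.
int remove_and_renumber(std::list<CachedIpInst> &insts,
                        std::list<CachedIpInst>::iterator victim)
{
   assert(victim != insts.end());
   // std::list::erase is O(1) and returns the iterator after the erased node.
   std::list<CachedIpInst>::iterator it = insts.erase(victim);
   int updated = 0;
   // Every later instruction moved up one slot, so its cached ip is stale.
   for (; it != insts.end(); ++it) {
      it->ip--;
      updated++;
   }
   return updated;
}
```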