[Mesa-dev] [PATCH 4/6] i965/fs: Reduce the interference between payload regs and virtual GRFs.
Kenneth Graunke
kenneth at whitecape.org
Fri Oct 12 13:41:15 PDT 2012
On 10/02/2012 07:52 PM, Eric Anholt wrote:
> Improves performance of the Lightsmark penumbra shadows scene by 15.7% +/-
> 1.0% (n=15), by eliminating register spilling. (tested by smashing the list of
> scenes to have all other scenes have 0 duration -- includes additional
> rendering of scene description text that normally doesn't appear in that
> scene)
> ---
> src/mesa/drivers/dri/i965/brw_fs.h | 2 +
> src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 86 +++++++++++++++++----
> 2 files changed, 74 insertions(+), 14 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
> index 34747d3..56c5a27 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.h
> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
> @@ -235,6 +235,8 @@ public:
> void assign_urb_setup();
> bool assign_regs();
> void assign_regs_trivial();
> + void setup_payload_interference(struct ra_graph *g, int payload_reg_count,
> + int first_payload_node);
> int choose_spill_reg(struct ra_graph *g);
> void spill_reg(int spill_reg);
> void split_virtual_grfs();
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> index 7b778d6..0510977 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> @@ -164,12 +164,78 @@ brw_alloc_reg_set(struct brw_context *brw, int reg_width, int base_reg_count)
> * Sets up interference between thread payload registers and the virtual GRFs
> * to be allocated for program temporaries.
> */
> -static void
> -brw_setup_payload_interference(struct ra_graph *g,
> - int payload_reg_count,
> - int first_payload_node,
> - int reg_node_count)
> +void
> +fs_visitor::setup_payload_interference(struct ra_graph *g,
> + int payload_reg_count,
> + int first_payload_node)
> {
> + int reg_width = c->dispatch_width / 8;
> + int last_loop_end = 0;
> + int first_loop_start = 0;
> +
> + /* We don't track live intervals for payload regs in our live interval
> + * analysis. Do a really cheesy version in this function: payload regs are
> + * live from the start of the program (always true) until either their last
> + * use, or the end of looping.
> + */
This begs the question: why don't we? My guess is because that works on
VGRFs, and these are real ones.
> + int ip = 0;
> + foreach_list(node, &this->instructions) {
> + fs_inst *inst = (fs_inst *)node;
> +
> + if (inst->opcode == BRW_OPCODE_DO && first_loop_start == 0)
This is OK...I forgot that DO actually appears in the FS IR, even though
it doesn't exist in the actual assembly on Gen6+.
> + first_loop_start = ip;
> + else if (inst->opcode == BRW_OPCODE_WHILE)
> + last_loop_end = ip;
> +
> + ip++;
> + }
> +
> + int payload_use_ip[payload_reg_count];
> + memset(payload_use_ip, 0, sizeof(payload_use_ip));
Maybe call this payload_last_use_ip since it's the /last/ use of the
payload IP?
> + ip = 0;
> + foreach_list(node, &this->instructions) {
> + fs_inst *inst = (fs_inst *)node;
> +
> + /* Note that UNIFORM args have been turned into FIXED_HW_REG by
> + * assign_curbe_setup(), and interpolation uses fixed hardware regs from
> + * the start (see interp_reg()).
> + */
> + for (int i = 0; i < 3; i++) {
> + if (inst->src[i].file == FIXED_HW_REG &&
> + inst->src[i].fixed_hw_reg.file == BRW_GENERAL_REGISTER_FILE) {
> + int reg_nr = inst->src[i].fixed_hw_reg.nr / reg_width;
I'm a little confused by the division by reg_width...the payload regs
don't scale with dispatch size...certainly uniforms don't...
But I suspect that's me being confused, not a bug.
> + if (reg_nr < payload_reg_count) {
> + if (ip < first_loop_start)
> + payload_use_ip[reg_nr] = ip;
> + else
> + payload_use_ip[reg_nr] = MAX2(ip, last_loop_end);
> + }
> + }
> + }
> + ip++;
> + }
> +
> + /* g0/g1 are implied used by the FB_WRITE messages, but not present as regs
> + * in the various instructions. Similarly, other weird payload bits up
> + * until the first push constant might be used and I don't want to think
> + * about them right now.
> + */
> + for (unsigned int i = 0; i < c->nr_payload_regs; i++) {
> + payload_use_ip[i / reg_width] = ip;
> + }
Being conservative here is probably wise. That said, this appears to
never reuse any of the GRFs used for varyings, which kind of sucks. So
we should revisit it in a later patch. g0 and g1 are definitely
magical, but I'm not sure the rest are...
> + for (int i = 0; i < payload_reg_count; i++) {
> + /* Mark the payload reg as interfering with any virtual grf that is live
> + * between the start of the program and our last use of the payload reg.
> + */
> + for (int j = 0; j < this->virtual_grf_count; j++) {
> + if (this->virtual_grf_def[j] <= payload_use_ip[i] ||
> + this->virtual_grf_use[j] <= payload_use_ip[i]) {
> + ra_add_node_interference(g, first_payload_node + i, j);
> + }
> + }
> + }
> +
> for (int i = 0; i < payload_reg_count; i++) {
> /* Mark each payload reg node as being allocated to its physical register.
> *
> @@ -177,13 +243,6 @@ brw_setup_payload_interference(struct ra_graph *g,
> * would just be silly.
> */
> ra_set_node_reg(g, first_payload_node + i, i);
> -
> - /* For now, just mark each payload node as interfering with every other
> - * node to be allocated.
> - */
> - for (int j = 0; j < reg_node_count; j++) {
> - ra_add_node_interference(g, first_payload_node + i, j);
> - }
> }
> }
>
> @@ -240,8 +299,7 @@ fs_visitor::assign_regs()
> }
> }
>
> - brw_setup_payload_interference(g, payload_reg_count, first_payload_node,
> - this->virtual_grf_count);
> + setup_payload_interference(g, payload_reg_count, first_payload_node);
>
> if (!ra_allocate_no_spills(g)) {
> /* Failed to allocate registers. Spill a reg, and the caller will
>
More information about the mesa-dev
mailing list