[Mesa-dev] [PATCH v2 5/6] i965/vec4: Adjust spilling cost for consecutive instructions

Iago Toral Quiroga itoral at igalia.com
Tue Jul 28 03:24:50 PDT 2015


Previous patches made it so that we do not need to unspill the same vgrf
with every instruction as long as these instructions come right after
the register was spilled or unspilled. This means that actually spilling
the register is now cheaper in these scenarios, so adjust the spilling
cost calculation accordingly.
---
 .../drivers/dri/i965/brw_vec4_reg_allocate.cpp     | 61 +++++++++++++++++++---
 1 file changed, 55 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
index 9a69fac..652a68c 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
@@ -269,6 +269,21 @@ vec4_visitor::evaluate_spill_costs(float *spill_costs, bool *no_spill)
 {
    float loop_scale = 1.0;
 
+   /* If a spilled register is written (or unspilled) and then immediately read,
+    * we will avoid to emit scratch reads for that register for as long as
+    * consecutive instructions keep reading that register. This makes spilling
+    * the register cheaper, so we need to account for this here. Since any
+    * instruction can only have at most 3 src operands we can only have up to
+    * 3 cached registers (i.e. registers that we do not need to unspill if they
+    * are read in a consecutive instruction) at any given time.
+    *
+    * We keep two lists, cached[0] has the registers that are cached before
+    * the current instruction, and cached[1] has the registers that are
+    * cached after the current instruction. We copy cached[1] into cached[0]
+    * after processing each instruction.
+    */
+   int cached[2][3] = { -1, -1, -1,   -1, -1, -1 };
+
    for (unsigned i = 0; i < this->alloc.count; i++) {
       spill_costs[i] = 0.0;
       no_spill[i] = alloc.sizes[i] != 1;
@@ -279,18 +294,52 @@ vec4_visitor::evaluate_spill_costs(float *spill_costs, bool *no_spill)
     * loops run 10 times.
     */
    foreach_block_and_inst(block, vec4_instruction, inst, cfg) {
+      memset(cached[1], -1, sizeof(cached[1]));
+
       for (unsigned int i = 0; i < 3; i++) {
-	 if (inst->src[i].file == GRF) {
-	    spill_costs[inst->src[i].reg] += loop_scale;
-	    assert(!inst->src[i].reladdr);
-	 }
+         if (inst->src[i].file == GRF) {
+            /* Don't increase spilling cost if src[i] isn't going
+             * to be unspilled
+             */
+            int slot;
+            for (slot = 0; slot < 3; slot++) {
+               if (inst->src[i].reg == cached[0][slot])
+                  break;
+            }
+            if (slot == 3) {
+               /* Not cached, we will unspill src[i] */
+               spill_costs[inst->src[i].reg] += loop_scale;
+               assert(!inst->src[i].reladdr);
+            } else {
+               /* It is cached. Since it is read in this instruction it
+                * stays cached for the next.
+                */
+               cached[1][slot] = inst->src[i].reg;
+            }
+         }
       }
 
       if (inst->dst.file == GRF) {
-	 spill_costs[inst->dst.reg] += loop_scale;
-	 assert(!inst->dst.reladdr);
+         int slot;
+         /* Mark this reg as cached because if the next instruction reads it
+          * we won't unspill it.
+          */
+         for (slot = 0; slot < 3; slot++) {
+            if (cached[1][slot] == -1) {
+               cached[1][slot] = inst->dst.reg;
+               break;
+            }
+         }
+         assert(slot < 3);
+         spill_costs[inst->dst.reg] += loop_scale;
+         assert(!inst->dst.reladdr);
       }
 
+      /* Cached registers after this instruction are the cached registers
+       * for the next instruction
+       */
+      memcpy(cached[0], cached[1], sizeof(cached[0]));
+
       switch (inst->opcode) {
 
       case BRW_OPCODE_DO:
-- 
1.9.1



More information about the mesa-dev mailing list