[Mesa-dev] [PATCH v2 5/6] i965/vec4: Adjust spilling cost for consecutive instructions
Iago Toral
itoral at igalia.com
Thu Jul 30 23:35:00 PDT 2015
On Thu, 2015-07-30 at 16:33 +0300, Francisco Jerez wrote:
> Francisco Jerez <currojerez at riseup.net> writes:
>
> > Iago Toral Quiroga <itoral at igalia.com> writes:
> >
> >> Previous patches made it so that we do not need to unspill the same vgrf
> >> with every instruction as long as these instructions come right after
> >> the register was spilled or unspilled. This means that actually spilling
> >> the register is now cheaper in these scenarios, so adjust the spilling
> >> cost calculation accordingly.
> >> ---
> >> .../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 61 +++++++++++++++++++---
> >> 1 file changed, 55 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
> >> index 9a69fac..652a68c 100644
> >> --- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
> >> +++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
> >> @@ -269,6 +269,21 @@ vec4_visitor::evaluate_spill_costs(float *spill_costs, bool *no_spill)
> >> {
> >> float loop_scale = 1.0;
> >>
> >> + /* If a spilled register is written (or unspilled) and then immediately read,
> >> + * we will avoid to emit scratch reads for that register for as long as
> >> + * consecutive instructions keep reading that register. This makes spilling
> >> + * the register cheaper, so we need to account for this here. Since any
> >> + * instruction can only have at most 3 src operands we can only have up to
> >> + * 3 cached registers (i.e. registers that we do not need to unspill if they
> >> + * are read in a consecutive instruction) at any given time.
> >> + *
> >> + * We keep two lists, cached[0] has the registers that are cached before
> >> + * the current instruction, and cached[1] has the registers that are
> >> + * cached after the current instruction. We copy cached[1] into cached[0]
> >> + * after processing each instruction.
> >> + */
> >> + int cached[2][3] = { -1, -1, -1, -1, -1, -1 };
> >> +
> >
> > Wouldn't it be easier to drop the array and just compare the sources of
> > each instruction with the previous one?
> >
> >> for (unsigned i = 0; i < this->alloc.count; i++) {
> >> spill_costs[i] = 0.0;
> >> no_spill[i] = alloc.sizes[i] != 1;
> >> @@ -279,18 +294,52 @@ vec4_visitor::evaluate_spill_costs(float *spill_costs, bool *no_spill)
> >> * loops run 10 times.
> >> */
> >> foreach_block_and_inst(block, vec4_instruction, inst, cfg) {
> >> + memset(cached[1], -1, sizeof(cached[1]));
> >> +
> >> for (unsigned int i = 0; i < 3; i++) {
> >> - if (inst->src[i].file == GRF) {
> >> - spill_costs[inst->src[i].reg] += loop_scale;
> >> - assert(!inst->src[i].reladdr);
> >> - }
> >> + if (inst->src[i].file == GRF) {
> >> + /* Don't increase spilling cost if src[i] isn't going
> >> + * to be unspilled
> >> + */
> >> + int slot;
> >> + for (slot = 0; slot < 3; slot++) {
> >> + if (inst->src[i].reg == cached[0][slot])
> >
> > I guess this should take the reg_offset and writemask of the cached
> > register into account. I suggest you use backend_reg::in_range() to
> > check if there is overlap.
> >
>
> It also doesn't look like this handles the case in which the same
> register is unspilled several times for the same instruction correctly,
> as you've implemented it in PATCH 1.
True, I'll fix that and have a go at your other suggestions as well.
Iago
> >> + break;
> >> + }
> >> + if (slot == 3) {
> >> + /* Not cached, we will unspill src[i] */
> >> + spill_costs[inst->src[i].reg] += loop_scale;
> >> + assert(!inst->src[i].reladdr);
> >> + } else {
> >> + /* It is cached. Since it is read in this instruction it
> >> + * stays cached for the next.
> >> + */
> >> + cached[1][slot] = inst->src[i].reg;
> >> + }
> >> + }
> >> }
> >>
> >> if (inst->dst.file == GRF) {
> >> - spill_costs[inst->dst.reg] += loop_scale;
> >> - assert(!inst->dst.reladdr);
> >> + int slot;
> >> + /* Mark this reg as cached because if the next instruction reads it
> >> + * we won't unspill it.
> >> + */
> >> + for (slot = 0; slot < 3; slot++) {
> >> + if (cached[1][slot] == -1) {
> >> + cached[1][slot] = inst->dst.reg;
> >> + break;
> >> + }
> >> + }
> >> + assert(slot < 3);
> >> + spill_costs[inst->dst.reg] += loop_scale;
> >> + assert(!inst->dst.reladdr);
> >> }
> >>
> >> + /* Cached registers after this instruction are the cached registers
> >> + * for the next instruction
> >> + */
> >> + memcpy(cached[0], cached[1], sizeof(cached[0]));
> >> +
> >> switch (inst->opcode) {
> >>
> >> case BRW_OPCODE_DO:
> >> --
> >> 1.9.1
More information about the mesa-dev
mailing list