[Mesa-dev] [PATCH v2 5/6] i965/vec4: Adjust spilling cost for consecutive instructions
Francisco Jerez
currojerez at riseup.net
Thu Jul 30 06:39:59 PDT 2015
Francisco Jerez <currojerez at riseup.net> writes:
> Iago Toral Quiroga <itoral at igalia.com> writes:
>
>> Previous patches made it so that we do not need to unspill the same vgrf
>> with every instruction as long as these instructions come right after
>> the register was spilled or unspilled. This means that actually spilling
>> the register is now cheaper in these scenarios, so adjust the spilling
>> cost calculation accordingly.
>> ---
>> .../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 61 +++++++++++++++++++---
>> 1 file changed, 55 insertions(+), 6 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
>> index 9a69fac..652a68c 100644
>> --- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
>> @@ -269,6 +269,21 @@ vec4_visitor::evaluate_spill_costs(float *spill_costs, bool *no_spill)
>> {
>> float loop_scale = 1.0;
>>
>> + /* If a spilled register is written (or unspilled) and then immediately read,
>> + * we will avoid to emit scratch reads for that register for as long as
>> + * consecutive instructions keep reading that register. This makes spilling
>> + * the register cheaper, so we need to account for this here. Since any
>> + * instruction can only have at most 3 src operands we can only have up to
>> + * 3 cached registers (i.e. registers that we do not need to unspill if they
>> + * are read in a consecutive instruction) at any given time.
>> + *
>> + * We keep two lists, cached[0] has the registers that are cached before
>> + * the current instruction, and cached[1] has the registers that are
>> + * cached after the current instruction. We copy cached[1] into cached[0]
>> + * after processing each instruction.
>> + */
>> + int cached[2][3] = { -1, -1, -1, -1, -1, -1 };
>> +
>
> Wouldn't it be easier to drop the array and just compare the sources of
> each instruction with the previous one?
>
>> for (unsigned i = 0; i < this->alloc.count; i++) {
>> spill_costs[i] = 0.0;
>> no_spill[i] = alloc.sizes[i] != 1;
>> @@ -279,18 +294,52 @@ vec4_visitor::evaluate_spill_costs(float *spill_costs, bool *no_spill)
>> * loops run 10 times.
>> */
>> foreach_block_and_inst(block, vec4_instruction, inst, cfg) {
>> + memset(cached[1], -1, sizeof(cached[1]));
>> +
>> for (unsigned int i = 0; i < 3; i++) {
>> - if (inst->src[i].file == GRF) {
>> - spill_costs[inst->src[i].reg] += loop_scale;
>> - assert(!inst->src[i].reladdr);
>> - }
>> + if (inst->src[i].file == GRF) {
>> + /* Don't increase spilling cost if src[i] isn't going
>> + * to be unspilled
>> + */
>> + int slot;
>> + for (slot = 0; slot < 3; slot++) {
>> + if (inst->src[i].reg == cached[0][slot])
>
> I guess this should take the reg_offset and writemask of the cached
> register into account. I suggest you use backend_reg::in_range() to
> check if there is overlap.
>
Er, it's of course not necessary to check reg_offset because it's
guaranteed to be zero, but checking writemask is AFAICT -- Using
backend_reg::in_range() may improve readability either way.
>> + break;
>> + }
>> + if (slot == 3) {
>> + /* Not cached, we will unspill src[i] */
>> + spill_costs[inst->src[i].reg] += loop_scale;
>> + assert(!inst->src[i].reladdr);
>> + } else {
>> + /* It is cached. Since it is read in this instruction it
>> + * stays cached for the next.
>> + */
>> + cached[1][slot] = inst->src[i].reg;
>> + }
>> + }
>> }
>>
>> if (inst->dst.file == GRF) {
>> - spill_costs[inst->dst.reg] += loop_scale;
>> - assert(!inst->dst.reladdr);
>> + int slot;
>> + /* Mark this reg as cached because if the next instruction reads it
>> + * we won't unspill it.
>> + */
>> + for (slot = 0; slot < 3; slot++) {
>> + if (cached[1][slot] == -1) {
>> + cached[1][slot] = inst->dst.reg;
>> + break;
>> + }
>> + }
>> + assert(slot < 3);
>> + spill_costs[inst->dst.reg] += loop_scale;
>> + assert(!inst->dst.reladdr);
>> }
>>
>> + /* Cached registers after this instruction are the cached registers
>> + * for the next instruction
>> + */
>> + memcpy(cached[0], cached[1], sizeof(cached[0]));
>> +
>> switch (inst->opcode) {
>>
>> case BRW_OPCODE_DO:
>> --
>> 1.9.1
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20150730/f86aa776/attachment-0001.sig>
More information about the mesa-dev
mailing list