[Mesa-dev] [RFC PATCH 1/5] i965: Add writes_accumulator flag

Juha-Pekka Heikkilä juhapekka.heikkila at gmail.com
Mon Mar 24 07:45:36 PDT 2014


I think I would need some hints here, I don't know how to test my
changes now properly.

On Sat, Mar 22, 2014 at 8:26 AM, Matt Turner <mattst88 at gmail.com> wrote:
> On Fri, Mar 21, 2014 at 11:17 PM, Matt Turner <mattst88 at gmail.com> wrote:
>> On Fri, Mar 21, 2014 at 7:33 AM, Juha-Pekka Heikkila
>> <juhapekka.heikkila at gmail.com> wrote:
>>> Our hardware has an "accumulator" register, which can be used to store
>>> intermediate results across multiple instructions.  Many instructions
>>> can implicitly write a value to the accumulator in addition to their
>>> normal destination register.  This is enabled by the "AccWrEn" flag.
>>>
>>> This patch introduces a new flag, inst->writes_accumulator, which
>>> allows us to express the AccWrEn notion in the IR.  It also creates a
>>> n ALU2_ACC macro to easily define emitters for instructions that
>>> implicitly write the accumulator.
>>>
>>> Previously, we only supported implicit accumulator writes from the
>>> ADDC, SUBB, and MACH instructions.  We always enabled them on those
>>> instructions, and left them disabled for other instructions.
>>>
>>> To take advantage of the MAC (multiply-accumulate) instruction, we
>>> need to be able to set AccWrEn on other types of instructions.
>>>
>>> Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila at gmail.com>
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_fs.cpp             | 34 +++++++++++-------------
>>>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp   |  7 +----
>>>  src/mesa/drivers/dri/i965/brw_shader.cpp         |  3 +++
>>>  src/mesa/drivers/dri/i965/brw_shader.h           |  1 +
>>>  src/mesa/drivers/dri/i965/brw_vec4.cpp           | 15 +----------
>>>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  7 +----
>>>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp   | 21 +++++++++++----
>>>  7 files changed, 39 insertions(+), 49 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> index 27cf0f6..107d9e6 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> @@ -64,6 +64,8 @@ fs_inst::init()
>>>
>>>     /* This will be the case for almost all instructions. */
>>>     this->regs_written = 1;
>>> +
>>> +   this->writes_accumulator = false;
>>>  }
>>>
>>>  fs_inst::fs_inst()
>>> @@ -151,6 +153,15 @@ fs_inst::fs_inst(enum opcode opcode, fs_reg dst,
>>>        return new(mem_ctx) fs_inst(BRW_OPCODE_##op, dst, src0, src1);    \
>>>     }
>>>
>>> +#define ALU2_ACC(op)                                                    \
>>> +   fs_inst *                                                            \
>>> +   fs_visitor::op(fs_reg dst, fs_reg src0, fs_reg src1)                 \
>>> +   {                                                                    \
>>> +      fs_inst *inst = new(mem_ctx) fs_inst(BRW_OPCODE_##op, dst, src0, src1);\
>>> +      inst->writes_accumulator = true;                                  \
>>> +      return inst;                                                      \
>>> +   }
>>> +
>>>  #define ALU3(op)                                                        \
>>>     fs_inst *                                                            \
>>>     fs_visitor::op(fs_reg dst, fs_reg src0, fs_reg src1, fs_reg src2)    \
>>> @@ -166,7 +177,7 @@ ALU1(RNDE)
>>>  ALU1(RNDZ)
>>>  ALU2(ADD)
>>>  ALU2(MUL)
>>> -ALU2(MACH)
>>> +ALU2_ACC(MACH)
>>>  ALU2(AND)
>>>  ALU2(OR)
>>>  ALU2(XOR)
>>> @@ -182,8 +193,8 @@ ALU1(FBH)
>>>  ALU1(FBL)
>>>  ALU1(CBIT)
>>>  ALU3(MAD)
>>> -ALU2(ADDC)
>>> -ALU2(SUBB)
>>> +ALU2_ACC(ADDC)
>>> +ALU2_ACC(SUBB)
>>>  ALU2(SEL)
>>>
>>>  /** Gen4 predicated IF. */
>>> @@ -2103,21 +2114,8 @@ fs_visitor::dead_code_eliminate()
>>>           }
>>>
>>>           if (dead) {
>>> -            /* Don't dead code eliminate instructions that write to the
>>> -             * accumulator as a side-effect. Instead just set the destination
>>> -             * to the null register to free it.
>>> -             */
>>> -            switch (inst->opcode) {
>>> -            case BRW_OPCODE_ADDC:
>>> -            case BRW_OPCODE_SUBB:
>>> -            case BRW_OPCODE_MACH:
>>> -               inst->dst = fs_reg(retype(brw_null_reg(), inst->dst.type));
>>> -               break;
>>> -            default:
>>> -               inst->remove();
>>> -               progress = true;
>>> -               break;
>>> -            }
>>> +            inst->remove();
>>> +            progress = true;
>>
>> I think you meant something like
>>
>> if (inst->writes_accumulator) {
>>    inst->dst = fs_reg(retype(brw_null_reg(), inst->dst.type));
>> } else {
>>    inst->remove();
>> }
>> progress = true;
>>
>> rather than removing instructions that implicitly wrote the
>> accumulator when their destination was dead.

I did take the entire switch/case block out because it seemed of very
little use and possibly more harmful than useful. I was trying to see
when the switch case I took out was used for ADDC/SUBB/MACH but it
seemed to execute close to never on running piglit glsl tests.

Reason that part being skipped is when ADDC/SUBB opcodes are used the
destination register is set to dst_null_ud() before arriving here,
similar is almost always true for MACH. I did go checking and found
null register is not grf but considered arf which is internally
enumerated to HW_REG, this causes above switch case to be skipped with
'if's. For example in vec4_visitor::dead_code_eliminate() there is
just outside the switch case:

...
   if (inst->dst.file == GRF && !inst->has_side_effects()) {
...



For MACH there however can be found instance in
vec4_visitor::visit(ir_expression *ir)  (and similar in fs_visitor)
where it says
...
   case ir_binop_imul_high: {
      struct brw_reg acc = retype(brw_acc_reg(), result_dst.type);

      emit(MUL(acc, op[0], op[1]));
      emit(MACH(result_dst, op[0], op[1]));
      break;
   }
...
Above piece of code to my understanding should go wrong in current
version dead code elimination where MACH is forced to have destination
in null register? After using writes_accumulator flag one can get
similar difficulties using many other opcodes. This was prevented when
I had has_side_effects() returning true when writes_accumulator flag
was set -- above "if" dst.file == GRF is true but then
inst->has_side_effects() prevent from changing the destination.


>
> Oh, I see. You added writes_accumulator to has_side_effects(). I don't
> think we want to do that. It weakens dead code elimination and
> instruction scheduling. Instruction scheduling in particular, because
> we treat instructions with side effects as full barriers, preventing
> even independent instructions from being scheduled across the barrier.
>
> I don't see anything in the scheduler that handles the accumulator,
> but it shouldn't be very difficult to do it properly. Handling it like
> we do the flag register (look for last_conditional_mod) would probably
> be good.

I removed my changes from has_side_effects() and did try several
variations to use writes_accumulator flag in calculate_deps() but the
trouble is I never seem to get any regression. Even when I completely
removed all the checks for the accumulator flag from calculate_deps()
I don't get any regression so I figure I am not getting hit by
scheduling. Here is where I would need some idea how to check my
change does not break something, currently I don't know what to look
for.

/Juha-Pekka


More information about the mesa-dev mailing list