[Mesa-dev] [RFC PATCH 1/5] i965: Add writes_accumulator flag
Juha-Pekka Heikkilä
juhapekka.heikkila at gmail.com
Mon Mar 24 07:45:36 PDT 2014
I think I would need some hints here, I don't know how to test my
changes now properly.
On Sat, Mar 22, 2014 at 8:26 AM, Matt Turner <mattst88 at gmail.com> wrote:
> On Fri, Mar 21, 2014 at 11:17 PM, Matt Turner <mattst88 at gmail.com> wrote:
>> On Fri, Mar 21, 2014 at 7:33 AM, Juha-Pekka Heikkila
>> <juhapekka.heikkila at gmail.com> wrote:
>>> Our hardware has an "accumulator" register, which can be used to store
>>> intermediate results across multiple instructions. Many instructions
>>> can implicitly write a value to the accumulator in addition to their
>>> normal destination register. This is enabled by the "AccWrEn" flag.
>>>
>>> This patch introduces a new flag, inst->writes_accumulator, which
>>> allows us to express the AccWrEn notion in the IR. It also creates a
>>> n ALU2_ACC macro to easily define emitters for instructions that
>>> implicitly write the accumulator.
>>>
>>> Previously, we only supported implicit accumulator writes from the
>>> ADDC, SUBB, and MACH instructions. We always enabled them on those
>>> instructions, and left them disabled for other instructions.
>>>
>>> To take advantage of the MAC (multiply-accumulate) instruction, we
>>> need to be able to set AccWrEn on other types of instructions.
>>>
>>> Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila at gmail.com>
>>> ---
>>> src/mesa/drivers/dri/i965/brw_fs.cpp | 34 +++++++++++-------------
>>> src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 7 +----
>>> src/mesa/drivers/dri/i965/brw_shader.cpp | 3 +++
>>> src/mesa/drivers/dri/i965/brw_shader.h | 1 +
>>> src/mesa/drivers/dri/i965/brw_vec4.cpp | 15 +----------
>>> src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 7 +----
>>> src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 21 +++++++++++----
>>> 7 files changed, 39 insertions(+), 49 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> index 27cf0f6..107d9e6 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> @@ -64,6 +64,8 @@ fs_inst::init()
>>>
>>> /* This will be the case for almost all instructions. */
>>> this->regs_written = 1;
>>> +
>>> + this->writes_accumulator = false;
>>> }
>>>
>>> fs_inst::fs_inst()
>>> @@ -151,6 +153,15 @@ fs_inst::fs_inst(enum opcode opcode, fs_reg dst,
>>> return new(mem_ctx) fs_inst(BRW_OPCODE_##op, dst, src0, src1); \
>>> }
>>>
>>> +#define ALU2_ACC(op) \
>>> + fs_inst * \
>>> + fs_visitor::op(fs_reg dst, fs_reg src0, fs_reg src1) \
>>> + { \
>>> + fs_inst *inst = new(mem_ctx) fs_inst(BRW_OPCODE_##op, dst, src0, src1);\
>>> + inst->writes_accumulator = true; \
>>> + return inst; \
>>> + }
>>> +
>>> #define ALU3(op) \
>>> fs_inst * \
>>> fs_visitor::op(fs_reg dst, fs_reg src0, fs_reg src1, fs_reg src2) \
>>> @@ -166,7 +177,7 @@ ALU1(RNDE)
>>> ALU1(RNDZ)
>>> ALU2(ADD)
>>> ALU2(MUL)
>>> -ALU2(MACH)
>>> +ALU2_ACC(MACH)
>>> ALU2(AND)
>>> ALU2(OR)
>>> ALU2(XOR)
>>> @@ -182,8 +193,8 @@ ALU1(FBH)
>>> ALU1(FBL)
>>> ALU1(CBIT)
>>> ALU3(MAD)
>>> -ALU2(ADDC)
>>> -ALU2(SUBB)
>>> +ALU2_ACC(ADDC)
>>> +ALU2_ACC(SUBB)
>>> ALU2(SEL)
>>>
>>> /** Gen4 predicated IF. */
>>> @@ -2103,21 +2114,8 @@ fs_visitor::dead_code_eliminate()
>>> }
>>>
>>> if (dead) {
>>> - /* Don't dead code eliminate instructions that write to the
>>> - * accumulator as a side-effect. Instead just set the destination
>>> - * to the null register to free it.
>>> - */
>>> - switch (inst->opcode) {
>>> - case BRW_OPCODE_ADDC:
>>> - case BRW_OPCODE_SUBB:
>>> - case BRW_OPCODE_MACH:
>>> - inst->dst = fs_reg(retype(brw_null_reg(), inst->dst.type));
>>> - break;
>>> - default:
>>> - inst->remove();
>>> - progress = true;
>>> - break;
>>> - }
>>> + inst->remove();
>>> + progress = true;
>>
>> I think you meant something like
>>
>> if (inst->writes_accumulator) {
>> inst->dst = fs_reg(retype(brw_null_reg(), inst->dst.type));
>> } else {
>> inst->remove();
>> }
>> progress = true;
>>
>> rather than removing instructions that implicitly wrote the
>> accumulator when their destination was dead.
I did take the entire switch/case block out because it seemed of very
little use and possibly more harmful than useful. I was trying to see
when the switch case I took out was used for ADDC/SUBB/MACH but it
seemed to execute close to never on running piglit glsl tests.
Reason that part being skipped is when ADDC/SUBB opcodes are used the
destination register is set to dst_null_ud() before arriving here,
similar is almost always true for MACH. I did go checking and found
null register is not grf but considered arf which is internally
enumerated to HW_REG, this causes above switch case to be skipped with
'if's. For example in vec4_visitor::dead_code_eliminate() there is
just outside the switch case:
...
if (inst->dst.file == GRF && !inst->has_side_effects()) {
...
For MACH there however can be found instance in
vec4_visitor::visit(ir_expression *ir) (and similar in fs_visitor)
where it says
...
case ir_binop_imul_high: {
struct brw_reg acc = retype(brw_acc_reg(), result_dst.type);
emit(MUL(acc, op[0], op[1]));
emit(MACH(result_dst, op[0], op[1]));
break;
}
...
Above piece of code to my understanding should go wrong in current
version dead code elimination where MACH is forced to have destination
in null register? After using writes_accumulator flag one can get
similar difficulties using many other opcodes. This was prevented when
I had has_side_effects() returning true when writes_accumulator flag
was set -- above "if" dst.file == GRF is true but then
inst->has_side_effects() prevent from changing the destination.
>
> Oh, I see. You added writes_accumulator to has_side_effects(). I don't
> think we want to do that. It weakens dead code elimination and
> instruction scheduling. Instruction scheduling in particular, because
> we treat instructions with side effects as full barriers, preventing
> even independent instructions from being scheduled across the barrier.
>
> I don't see anything in the scheduler that handles the accumulator,
> but it shouldn't be very difficult to do it properly. Handling it like
> we do the flag register (look for last_conditional_mod) would probably
> be good.
I removed my changes from has_side_effects() and did try several
variations to use writes_accumulator flag in calculate_deps() but the
trouble is I never seem to get any regression. Even when I completely
removed all the checks for the accumulator flag from calculate_deps()
I don't get any regression so I figure I am not getting hit by
scheduling. Here is where I would need some idea how to check my
change does not break something, currently I don't know what to look
for.
/Juha-Pekka
More information about the mesa-dev
mailing list