[Mesa-dev] [PATCH 1/2] i965: Add accumulator flag support and MAC opcode to use it

Kenneth Graunke kenneth at whitecape.org
Fri Mar 14 02:27:24 PDT 2014


On 03/10/2014 07:59 AM, Juha-Pekka Heikkila wrote:
> This change MACH, ADDC and SUBB opcodes to use added accumulator
> flag and add new opcode MAC which use accumulator flag.
> 
> Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila at gmail.com>

I'm going to be a bit pedantic here (sorry!).  Normally, commit messages
are of the form:

    prefix: <short one-line description of what you did>

    <longer explanation of why you did that, what benefit it provides,
     why it's necessary for future things...or something like that>

Here, your long description doesn't really explain why adding this
flag is a good idea.

This may seem a bit unnecessary, but it's actually very common for
people to use 'git blame' and 'git log' to find the origin of some
code - why does it exist, what problem was the author trying to solve,
etc.  It can often help people understand the code's purpose, and help
answer questions like "do we still need this code today?".  Good commit
messages have really helped me familiarize myself with the code.

Also, this patch really does two things:
1. Add the flag and hook it up for the existing instructions.
2. Implement the new MAC opcode.

I would split those into two patches (even though they're small).

For the new patches, I might word the commit messages to be something like:

i965/vec4: Add a vec4_instruction::writes_accumulator flag.

Our hardware has an "accumulator" register, which can be used to store
intermediate results across multiple instructions.  Many instructions
can implicitly write a value to the accumulator in addition to their
normal destination register.  This is enabled by the "AccWrEn" flag.

This patch introduces a new flag, inst->writes_accumulator, which
allows us to express the AccWrEn notion in the Vec4 IR.  It also
creates an ALU2_ACC macro to easily define emitters for instructions
that implicitly write the accumulator.

Previously, we only supported implicit accumulator writes from the
ADDC, SUBB, and MACH instructions.  We always enabled them on those
instructions, and left them disabled for other instructions.

To take advantage of the MAC (multiply-accumulate) instruction, we
need to be able to set AccWrEn on other types of instructions.

Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila at gmail.com>

i965/vec4: Add support for the MAC instruction.

This allows us to generate the MAC (multiply-accumulate) instruction,
which can be used to implement some expressions in fewer instructions
than doing a series of MUL and ADDs.

Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila at gmail.com>

A couple more tiny comments below...

> ---
>  src/mesa/drivers/dri/i965/brw_eu.h               |  1 +
>  src/mesa/drivers/dri/i965/brw_vec4.cpp           | 10 ++--------
>  src/mesa/drivers/dri/i965/brw_vec4.h             |  1 +
>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 12 ++++++------
>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp   | 20 ++++++++++++++++----
>  5 files changed, 26 insertions(+), 18 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_eu.h b/src/mesa/drivers/dri/i965/brw_eu.h
> index 5df6bb7..f10ad50 100644
> --- a/src/mesa/drivers/dri/i965/brw_eu.h
> +++ b/src/mesa/drivers/dri/i965/brw_eu.h
> @@ -183,6 +183,7 @@ ALU1(FBL)
>  ALU1(CBIT)
>  ALU2(ADDC)
>  ALU2(SUBB)
> +ALU2(MAC)
>  
>  ROUND(RNDZ)
>  ROUND(RNDE)
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> index c30afae..6b1fde7 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> @@ -337,16 +337,10 @@ vec4_visitor::dead_code_eliminate()
>               * accumulator as a side-effect. Instead just set the destination
>               * to the null register to free it.
>               */
> -            switch (inst->opcode) {
> -            case BRW_OPCODE_ADDC:
> -            case BRW_OPCODE_SUBB:
> -            case BRW_OPCODE_MACH:
> +            if (inst->write_accumulator)
>                 inst->dst = dst_reg(retype(brw_null_reg(), inst->dst.type));
> -               break;
> -            default:
> +            else
>                 inst->remove();
> -               break;
> -            }
>              progress = true;
>           }
>        }
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
> index d97c103..a1e973f 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.h
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.h
> @@ -242,6 +242,7 @@ public:
>     bool saturate;
>     bool force_writemask_all;
>     bool no_dd_clear, no_dd_check;
> +   bool write_accumulator; /**< intsruction write result in accumulator */

Could we call this "writes_accumulator"?  I think
"inst->writes_accumulator" sounds a bit nicer when read aloud than
"inst->write_accumulator".  (This is my fault, I suggested the name you
used initially...sorry!)

Also, the comment doesn't /quite/ capture the meaning...an instruction like:

add(8) acc0 g3<4,4,1>F 4.0F { Align16 }

also writes its result to the accumulator...using an explicit destination.

I might say:
bool writes_accumulator; /**< instruction implicitly writes accumulator */

>  
>     int conditional_mod; /**< BRW_CONDITIONAL_* */
>  
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
> index a74514f..3e85ab1 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
> @@ -971,9 +971,7 @@ vec4_generator::generate_vec4_instruction(vec4_instruction *instruction,
>        brw_MUL(p, dst, src[0], src[1]);
>        break;
>     case BRW_OPCODE_MACH:
> -      brw_set_acc_write_control(p, 1);
>        brw_MACH(p, dst, src[0], src[1]);
> -      brw_set_acc_write_control(p, 0);
>        break;
>  
>     case BRW_OPCODE_MAD:
> @@ -1077,16 +1075,17 @@ vec4_generator::generate_vec4_instruction(vec4_instruction *instruction,
>        break;
>     case BRW_OPCODE_ADDC:
>        assert(brw->gen >= 7);
> -      brw_set_acc_write_control(p, 1);
>        brw_ADDC(p, dst, src[0], src[1]);
> -      brw_set_acc_write_control(p, 0);
>        break;
>     case BRW_OPCODE_SUBB:
>        assert(brw->gen >= 7);
> -      brw_set_acc_write_control(p, 1);
>        brw_SUBB(p, dst, src[0], src[1]);
> -      brw_set_acc_write_control(p, 0);
>        break;
> +   case BRW_OPCODE_MAC:
> +      assert(brw->gen >= 4);

This assertion isn't necessary, as it's always true---the i965 driver
only applies to Gen4+.  (Gen2-3 are handled by the i915 driver, which is
now separate.)

> +      brw_MAC(p, dst, src[0], src[1]);
> +      break;
> +
>  
>     case BRW_OPCODE_BFE:
>        assert(brw->gen >= 7);
> @@ -1317,6 +1316,7 @@ vec4_generator::generate_code(exec_list *instructions)
>        brw_set_predicate_inverse(p, inst->predicate_inverse);
>        brw_set_saturate(p, inst->saturate);
>        brw_set_mask_control(p, inst->force_writemask_all);
> +      brw_set_acc_write_control(p, inst->write_accumulator);
>  
>        unsigned pre_emit_nr_insn = p->nr_insn;
>  
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> index 88ee929..dc58457 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> @@ -42,6 +42,7 @@ vec4_instruction::vec4_instruction(vec4_visitor *v,
>     this->force_writemask_all = false;
>     this->no_dd_clear = false;
>     this->no_dd_check = false;
> +   this->write_accumulator = false;
>     this->conditional_mod = BRW_CONDITIONAL_NONE;
>     this->sampler = 0;
>     this->texture_offset = 0;
> @@ -121,7 +122,17 @@ vec4_visitor::emit(enum opcode opcode)
>     vec4_visitor::op(dst_reg dst, src_reg src0, src_reg src1)		\
>     {									\
>        return new(mem_ctx) vec4_instruction(this, BRW_OPCODE_##op, dst,	\
> -					   src0, src1);			\
> +                                          src0, src1);                 \
> +   }
> +
> +#define ALU2_ACC(op)							\
> +   vec4_instruction *							\
> +   vec4_visitor::op(dst_reg dst, src_reg src0, src_reg src1)		\
> +   {									\
> +      vec4_instruction *rVar = new(mem_ctx) vec4_instruction(this,     \
> +         BRW_OPCODE_##op, dst, src0, src1);                            \
> +      rVar->write_accumulator = true;                                  \
> +      return rVar;                                                     \

We typically don't use camelCaseVariables in the driver.  I would
probably just call this "inst" or "ir".

Other than those minor nits, the code looks good to me.

>     }
>  
>  #define ALU3(op)							\
> @@ -143,7 +154,7 @@ ALU1(F32TO16)
>  ALU1(F16TO32)
>  ALU2(ADD)
>  ALU2(MUL)
> -ALU2(MACH)
> +ALU2_ACC(MACH)
>  ALU2(AND)
>  ALU2(OR)
>  ALU2(XOR)
> @@ -162,8 +173,9 @@ ALU1(FBH)
>  ALU1(FBL)
>  ALU1(CBIT)
>  ALU3(MAD)
> -ALU2(ADDC)
> -ALU2(SUBB)
> +ALU2_ACC(ADDC)
> +ALU2_ACC(SUBB)
> +ALU2_ACC(MAC)
>  
>  /** Gen4 predicated IF. */
>  vec4_instruction *
> 


-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: OpenPGP digital signature
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20140314/49d173bd/attachment.pgp>


More information about the mesa-dev mailing list