[Mesa-dev] [Mesa-stable] [PATCH 2/4] i965/vec4: Handle ir_triop_lrp on Gen4-5 as well.

Wed Feb 26 11:26:35 PST 2014

On 02/26/2014 08:35 AM, Ian Romanick wrote:
> On 02/25/2014 05:06 PM, Kenneth Graunke wrote:
>> On 02/25/2014 09:38 AM, Eric Anholt wrote:
>>> Matt Turner <mattst88 at gmail.com> writes:
>>>
>>>> On Mon, Feb 24, 2014 at 10:15 AM, Eric Anholt <eric at anholt.net> wrote:
>>>>> I think we would do better by emitting
>>>>> ADD(y_minus_x, y, negate(x))
>>>>> MAC(dst, x, y_minus_x, a)
>>>>
>>>> MAC only takes two arguments, so
>>>>   - if you meant MAD, there's no MAD on platforms that don't have LRP
>>>>   - if you meant MAC(dst, ...) I don't see a way of doing it in only two
>>>> instructions, but we could do
>>>>
>>>> MOV(acc, x)
>>>> ADD(y_minus_x, y, negate(x))
>>>> MAC(dst, y_minus_x, a)
>>>
>>> Oops, yeah, I was still thinking in terms of MAD.  This should still be
>>> better I think, while being an obvious translation of the LRP
>>> instruction:
>>>
>>> ADD one_minus_a, negate(a), 1.0f
>>> MUL null, y, a
>>> MAC dst, x, one_minus_a
>>>
>>> (multiplying y * a first to slightly reduce the stall pressure from
>>> one_minus_a)
>>
>> Nice.  I agree this is better, but it's harder than you think.  We would
>> have to:
>>
>> 1. Create a MAC() emitter.
>> 2. Add BRW_OPCODE_MAC to vec4_generator.
>> 3. Add a new "enable accumulator writes" flag to vec4_instruction
>>     and make vec4_generator respect that.  (The MUL needs this.)
>> 4. Fix up dead code elimination and other things to know about implicit
>> accumulator writes.
> 
> Can you write a slightly expanded description of what needs doing? Don't
> take more than 10 minutes.  This is exactly the sort of task that I'd
> like to take with me to Finland. :)

Part 1: Adding arbitrary accumulator write support.
---------------------------------------------------

i965 hardware has an "accumulator" register, which can be used to store
intermediate results across multiple instructions.  It is higher
precision than ordinary registers.

Many assembly instructions support the "AccWrEn" flag to write a value
to the accumulator in addition to their destination.  (This may be a
different value.  For example, addc stores the addition result in dst,
but the carry result in the accumulator.)
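
To make the "different value" point concrete, here is a toy C++ model of
a single 32-bit channel.  It is only a sketch: ChannelState and
model_addc are made-up names for illustration, not anything in Mesa or
the hardware documentation, and it ignores the accumulator's extra
precision.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of one channel: the destination register plus the accumulator.
// These names are illustrative only and are not Mesa or hardware definitions.
struct ChannelState {
   uint32_t dst;
   uint32_t acc;
};

// Model of addc with AccWrEn set: dst receives the low 32 bits of the sum,
// while the accumulator receives the carry-out (0 or 1).
inline ChannelState model_addc(uint32_t a, uint32_t b)
{
   uint64_t wide = uint64_t(a) + uint64_t(b);
   ChannelState s;
   s.dst = uint32_t(wide);
   s.acc = uint32_t(wide >> 32);
   return s;
}
```

For example, model_addc(0xFFFFFFFF, 1) leaves 0 in dst and 1 in acc, so
an optimizer that only looks at dst would miss the live carry.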

Some instructions read from the accumulator implicitly, while others can
use it as an explicit source register.  (See the ISA reference for
restrictions on various instructions.)

Currently, the i965 Vec4 backend uses the accumulator only for a few
instructions (ADDC, SUBB, MACH) where it's absolutely necessary.  We
would like to support it more generally.

1. Create a new flag in vec4_instruction to represent AccWrEn:
   bool write_accumulator;

2. Update the instruction creators for ADDC, SUBB, and MACH to set it.

brw_vec4_visitor.cpp defines a number of visitor methods that create
instructions: MUL(), ADD(), and so on.  Since the majority of these are
identical (other than the opcode), they're implemented via macros: ALU1,
ALU2, and ALU3.

- Create a new ALU2_ACC macro that is identical to ALU2, but which sets
the "write_accumulator" flag after allocating the new instruction.

- Change ADDC/SUBB/MACH to be implemented with ALU2_ACC() instead of ALU2().
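
A rough sketch of what the two macros might look like follows.  The
types here (opcode, reg, instruction) are simplified stand-ins I made up
so the example compiles on its own; the real macros define vec4_visitor
methods and allocate vec4_instruction objects, so treat this as the
shape of the change rather than the actual patch.

```cpp
#include <cassert>

// Simplified stand-ins; the real Mesa types carry far more state.
enum opcode { OPCODE_ADD, OPCODE_MUL, OPCODE_ADDC, OPCODE_SUBB, OPCODE_MACH };

struct reg { int nr; };

struct instruction {
   opcode op;
   reg dst, src0, src1;
   bool write_accumulator;
};

// Plain two-source emitter: the flag stays false.
#define ALU2(name)                                                  \
   inline instruction name(reg dst, reg src0, reg src1)             \
   {                                                                \
      return instruction{OPCODE_##name, dst, src0, src1, false};    \
   }

// Same as ALU2, but marks the instruction as writing the accumulator.
#define ALU2_ACC(name)                                              \
   inline instruction name(reg dst, reg src0, reg src1)             \
   {                                                                \
      instruction inst{OPCODE_##name, dst, src0, src1, false};      \
      inst.write_accumulator = true;                                \
      return inst;                                                  \
   }

ALU2(ADD)
ALU2(MUL)
ALU2_ACC(ADDC)
ALU2_ACC(SUBB)
ALU2_ACC(MACH)
```

With this, ADD(...) and MUL(...) produce instructions with the flag
clear, while ADDC/SUBB/MACH always come back with it set, so callers
cannot forget it.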

3. Update the dead code elimination pass to consider the new flag.

Normally, if nothing uses an instruction's destination register, we can
eliminate that instruction.  However, instructions that implicitly write
to the accumulator produce additional data which may still be used.  So,
dead code elimination instead simply changes the destination register to
the null register to free up that register.

In vec4_visitor::dead_code_eliminate(), replace the switch statement on
opcode with:

if (inst->write_accumulator)
   inst->dst = dst_reg(retype(brw_null_reg(), inst->dst.type));
else
   inst->remove();

(Since we set the flag on ADDC/SUBB/MACH, this should be equivalent, but
will also handle any new instructions that implicitly write to the
accumulator.)
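
As a standalone illustration of that rule, here is a toy version of the
pass.  The instruction struct, the dst_is_used field (standing in for
whatever liveness information the real pass computes), and NULL_REG are
all assumptions made for the sketch; only the if/else in the middle
mirrors the snippet above.

```cpp
#include <cassert>
#include <list>

const int NULL_REG = -1;    // hypothetical marker for the null register

struct instruction {
   int dst;                 // virtual register number, or NULL_REG
   bool dst_is_used;        // hypothetical: does anything read dst later?
   bool write_accumulator;
};

// Returns true if any instruction was removed or rewritten.
inline bool dead_code_eliminate(std::list<instruction> &insts)
{
   bool progress = false;
   for (auto it = insts.begin(); it != insts.end(); ) {
      // Nothing to do if the destination is live or already null.
      if (it->dst == NULL_REG || it->dst_is_used) {
         ++it;
         continue;
      }
      if (it->write_accumulator) {
         // The accumulator result may still be needed: keep the
         // instruction but stop occupying a destination register.
         it->dst = NULL_REG;
         ++it;
      } else {
         it = insts.erase(it);
      }
      progress = true;
   }
   return progress;
}
```

Running it twice on the same list reports no progress the second time,
since instructions already writing to the null register are skipped.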

4. Make the flag actually affect assembly output.

The vec4_generator class is what translates this IR (list of
vec4_instructions) to the actual assembly code.  At a lower level, it
uses the brw_eu_emit.c infrastructure to emit that code.

The brw_eu_emit.c code allows you to set the "default state" for
subsequent instructions.

In vec4_generator::generate_code(), find this block of code:

      brw_set_conditionalmod(p, inst->conditional_mod);
      brw_set_predicate_control(p, inst->predicate);
      brw_set_predicate_inverse(p, inst->predicate_inverse);
      brw_set_saturate(p, inst->saturate);
      brw_set_mask_control(p, inst->force_writemask_all);

This sets up the default state according to the flags.  Note how the
next call is generate_vec4_instruction(), which generates assembly
instruction(s) from the IR.  You'll want to add:

      brw_set_acc_write_control(p, inst->write_accumulator);

With this in place, the brw_set_acc_write_control calls in
generate_vec4_instruction's MACH, ADDC, SUBB cases are unnecessary.
Remove them.
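
If the "default state" idea is unfamiliar, this small model may help.
It is not the brw_eu_emit.c API: the assembler struct and its methods
are invented for the example, and only two flags are modeled.  The
point is that every emitted instruction picks up whatever defaults were
set just before it, so setting the accumulator flag once per IR
instruction is enough.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified IR and hardware instruction records.
struct ir_instruction { bool saturate; bool write_accumulator; };
struct hw_instruction { bool saturate; bool acc_wr_en; };

// Invented stand-in for the emitter: it stamps the current defaults
// onto each instruction it emits.
struct assembler {
   hw_instruction defaults{false, false};
   std::vector<hw_instruction> out;

   void set_saturate(bool v) { defaults.saturate = v; }
   void set_acc_write_control(bool v) { defaults.acc_wr_en = v; }
   void emit() { out.push_back(defaults); }
};

// Mirrors the structure of the generator loop: set defaults from the
// IR flags, then emit the instruction.
inline void generate_code(assembler &p, const std::vector<ir_instruction> &ir)
{
   for (const ir_instruction &inst : ir) {
      p.set_saturate(inst.saturate);
      p.set_acc_write_control(inst.write_accumulator);
      p.emit();
   }
}
```

Because the defaults are reset for every IR instruction, a flag set for
one instruction does not leak into the next.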

You should now regression test your code using Piglit.

Part 2: Using MAC to optimize LRP and MAD
-----------------------------------------

The Gen4-5 architectures do not support three-source instructions, such
as MAD (multiply-add) or LRP (linear interpolate).  However, they do
support the MAC (multiply-accumulate) instruction, which can be used to
implement those more efficiently than a series of MULs and ADDs.

1. Add a new MAC emitter.  In brw_vec4_visitor.cpp, add ALU2_ACC(MAC)
(assuming you called your new macro ALU2_ACC in the previous project).

2. Update vec4_visitor::emit_lrp() to use MAC.

The assembly code you want to generate is in a comment.  You can drop
the x_times_one_minus_a temporary register.
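
To check the arithmetic, here is a scalar C++ model of the three
instruction sequence Eric suggested (ADD, MUL with accumulator write,
MAC).  The alu struct and its method names are invented for this
sketch, it works on one float rather than a vec4, and it assumes
LRP(x, y, a) means x * (1 - a) + y * a.

```cpp
#include <cassert>

// Invented scalar model of the ALU: one accumulator, three operations.
struct alu {
   float acc = 0.0f;

   // Ordinary ADD: the result goes to a register (the return value).
   float ADD(float a, float b) const { return a + b; }

   // MUL with a null destination and accumulator write enabled:
   // the product is kept only in the accumulator.
   void MUL_acc(float a, float b) { acc = a * b; }

   // MAC: destination receives accumulator + a * b.
   float MAC(float a, float b) const { return acc + a * b; }
};

// lrp(x, y, a) = x * (1 - a) + y * a, without an x_times_one_minus_a temp.
inline float lrp_via_mac(float x, float y, float a)
{
   alu p;
   float one_minus_a = p.ADD(-a, 1.0f);   // ADD one_minus_a, negate(a), 1.0f
   p.MUL_acc(y, a);                       // MUL null, y, a
   return p.MAC(x, one_minus_a);          // MAC dst, x, one_minus_a
}
```

With x = 2, y = 10, a = 0.25 this gives 10 * 0.25 + 2 * 0.75 = 4, and
a = 0 or a = 1 return x or y respectively.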

You're done!  Regression test your code with Piglit on a Gen4-5 system
(Ironlake, Eaglelake/Cantiga, or Broadwater/Crestline).

I believe a similar project could be undertaken for the FS backend.

--Ken
