[Mesa-dev] [PATCH 05/21] i965/fs: Implement SIMD16 integer multiplies on Gen 7.

Tue Dec 23 08:38:03 PST 2014

On 12/22/14, Ben Widawsky <benjamin.widawsky at intel.com> wrote:
> From: Matt Turner <mattst88 at gmail.com>
>
> In order to do a full 32x32 integer multiply using the MUL/MACH macro, the
> operation must be split into 2 SIMD8 operations. This is required even if
> you
> don't care about the high bits. My interpretation of the requirement is that
> the
> accumulator simply doesn't have enough bits to perform the 32x32 multiply.
>
> v2 by Ben:
> Added real commit message
> Squashed in some changes from Matt's simd16 branch
> Change the way sechalf if used.
>    According to my reading of the docs, the sechalf operation should only
> be
>    used for the mov instruction to obtain low bits. Any other usage is for
> the
>    high results from the MUL/MACH macro

I was going to tell you that I strongly suspected that this was wrong
and that we needed better tests to catch this case, but if the tests
still pass after this commit, then we *definitely* need better tests
because this has to be wrong since you're setting force_sechalf on
both MOVs! :)

> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86822
> Signed-off-by: Ben Widawsky <ben at bwidawsk.net>
> ---
>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 28
> +++++++++++++++++++---------
>  1 file changed, 19 insertions(+), 9 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> index 5b83c40..0d8cacb 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> @@ -774,15 +774,25 @@ fs_visitor::visit(ir_expression *ir)
>              else
>                 emit(MUL(this->result, op[0], op[1]));
>           } else {
> -            if (brw->gen >= 7)
> -               no16("SIMD16 explicit accumulator operands unsupported\n");
> -
> -            struct brw_reg acc = retype(brw_acc_reg(dispatch_width),
> -                                        this->result.type);
> -
> -            emit(MUL(acc, op[0], op[1]));
> -            emit(MACH(reg_null_d, op[0], op[1]));
> -            emit(MOV(this->result, fs_reg(acc)));
> +            unsigned width = brw->gen == 7 ? 8 : dispatch_width;
> +            fs_reg acc = fs_reg(retype(brw_acc_reg(width),
> this->result.type));
> +            fs_reg null = fs_reg(retype(brw_null_vec(width),
> this->result.type));
> +
> +            if (brw->gen == 7 && dispatch_width == 16) {
> +               emit(MUL(acc, half(op[0], 0), half(op[1], 0)));
> +               emit(MACH(null, half(op[0], 0), half(op[1], 0)));
> +               fs_inst *mov = emit(MOV(half(this->result, 0), acc));
> +               mov->force_sechalf = true;

This would calculate the first 8 channels' results but then copy them
to this->result using the second half's channel enables. Imagine that
the first eight channels are enabled, but the last eight are disabled
-- the MOV won't copy any data.

> +
> +               emit(MUL(acc, half(op[0], 1), half(op[1], 1)));
> +               emit(MACH(null, half(op[0], 1), half(op[1], 1)));
> +               mov = emit(MOV(half(this->result, 1), acc));
> +               mov->force_sechalf = true;

And here, we're calculating the last eight channels using the first
half's channel enable bits, so when the MOV executes it might be
reading data from disabled channels.