[Mesa-dev] [PATCH 05/21] i965/fs: Implement SIMD16 integer multiplies on Gen 7.
Matt Turner
mattst88 at gmail.com
Tue Dec 23 08:38:03 PST 2014
On 12/22/14, Ben Widawsky <benjamin.widawsky at intel.com> wrote:
> From: Matt Turner <mattst88 at gmail.com>
>
> In order to do a full 32x32 integer multiply using the MUL/MACH macro, the
> operation must be split into 2 SIMD8 operations. This is required even if
> you
> don't care about the high bits. My interpretation of the requirement is that
> the
> accumulator simply doesn't have enough bits to perform the 32x32 multiply.
>
> v2 by Ben:
> Added real commit message
> Squashed in some changes from Matt's simd16 branch
> Change the way sechalf if used.
> According to my reading of the docs, the sechalf operation should only
> be
> used for the mov instruction to obtain low bits. Any other usage is for
> the
> high results from the MUL/MACH macro
I was going to tell you that I strongly suspected that this was wrong
and that we needed better tests to catch this case, but if the tests
still pass after this commit, then we *definitely* need better tests
because this has to be wrong since you're setting force_sechalf on
both MOVs! :)
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86822
> Signed-off-by: Ben Widawsky <ben at bwidawsk.net>
> ---
> src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 28
> +++++++++++++++++++---------
> 1 file changed, 19 insertions(+), 9 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> index 5b83c40..0d8cacb 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> @@ -774,15 +774,25 @@ fs_visitor::visit(ir_expression *ir)
> else
> emit(MUL(this->result, op[0], op[1]));
> } else {
> - if (brw->gen >= 7)
> - no16("SIMD16 explicit accumulator operands unsupported\n");
> -
> - struct brw_reg acc = retype(brw_acc_reg(dispatch_width),
> - this->result.type);
> -
> - emit(MUL(acc, op[0], op[1]));
> - emit(MACH(reg_null_d, op[0], op[1]));
> - emit(MOV(this->result, fs_reg(acc)));
> + unsigned width = brw->gen == 7 ? 8 : dispatch_width;
> + fs_reg acc = fs_reg(retype(brw_acc_reg(width),
> this->result.type));
> + fs_reg null = fs_reg(retype(brw_null_vec(width),
> this->result.type));
> +
> + if (brw->gen == 7 && dispatch_width == 16) {
> + emit(MUL(acc, half(op[0], 0), half(op[1], 0)));
> + emit(MACH(null, half(op[0], 0), half(op[1], 0)));
> + fs_inst *mov = emit(MOV(half(this->result, 0), acc));
> + mov->force_sechalf = true;
This would calculate the first 8 channels' results but then copy them
to this->result using the second half's channel enables. Imagine that
the first eight channels are enabled, but the last eight are disabled
-- the MOV won't copy any data.
> +
> + emit(MUL(acc, half(op[0], 1), half(op[1], 1)));
> + emit(MACH(null, half(op[0], 1), half(op[1], 1)));
> + mov = emit(MOV(half(this->result, 1), acc));
> + mov->force_sechalf = true;
And here, we're calculating the last eight channels using the first
half's channel enable bits, so when the MOV executes it might be
reading data from disabled channels.
More information about the mesa-dev
mailing list