[Mesa-dev] [PATCH 07/21] i965/fs: Implement SIMD16 64-bit integer multiplies on Gen 8.
Matt Turner
mattst88 at gmail.com
Tue Dec 23 08:51:33 PST 2014
On 12/22/14, Ben Widawsky <benjamin.widawsky at intel.com> wrote:
> This patch uses the new QWORD type introduced on Gen8. This allows us to
> perform
> the operation without requiring the additional MACH.
>
> Similar to Gen7, it seems we must demote SIMD16 to 2 SIMD8s. On the bright
> side,
> we get the results in 3 instructions, and no MACH. MACH is undesirable
> because
> it requires the accumulator write flag, which can hinder optimization
> passes.
>
> Signed-off-by: Ben Widawsky <ben at bwidawsk.net>
> ---
> src/mesa/drivers/dri/i965/brw_defines.h | 2 +
> src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 12 +++++
> src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 73
> ++++++++++++++------------
> src/mesa/drivers/dri/i965/brw_shader.cpp | 2 +
> 4 files changed, 56 insertions(+), 33 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h
> b/src/mesa/drivers/dri/i965/brw_defines.h
> index 28e398d..102ba4a 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> @@ -1100,6 +1100,8 @@ enum opcode {
> * and number of SO primitives needed.
> */
> GS_OPCODE_FF_SYNC_SET_PRIMITIVES,
> +
> + SHADER_OPCODE_MOV64,
> };
>
> enum brw_urb_write_flags {
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> index c652d65..3a15837 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> @@ -1619,6 +1619,18 @@ fs_generator::generate_code(const cfg_t *cfg, int
> dispatch_width)
> case BRW_OPCODE_MUL:
> brw_MUL(p, dst, src[0], src[1]);
> break;
> + case SHADER_OPCODE_MOV64:
> + /* This opcode is used to mov the result of a native SIMD16 (2
> SIMD8s)
> + * mul into the dst register.
> + */
> + assert(brw->gen >= 8);
> + src[0].subnr = 4;
> + src[0].type = dst.type;
> + src[0] = stride(src[0], 8, 4, 2);
> + assert(dst.type == BRW_REGISTER_TYPE_UD ||
> + dst.type == BRW_REGISTER_TYPE_D);
> + brw_MOV(p, dst, src[0]);
> + break;
> case BRW_OPCODE_AVG:
> brw_AVG(p, dst, src[0], src[1]);
> break;
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> index de03618..98f1b0d 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> @@ -798,46 +798,53 @@ fs_visitor::visit(ir_expression *ir)
> emit(MUL(this->result, op[0], op[1]));
> }
> break;
> - case ir_binop_imul_high: {
> - if (brw->gen >= 8)
> - no16("SIMD16 explicit accumulator operands unsupported\n");
> + case ir_binop_imul_high:
> + if (brw->gen >= 8) {
> + /* Gen8 is able to do the full 32x32 multiply into a QWORD.
> + * The docs say you cannot use direct addressing for a destination
> of
> + * more than 2 registers, which is the case in SIMD16. Therefore,
> like
> + * Gen7, the operation must be downgraded to two SIMD8 muls.
> + *
> + * Oddly, the results can be gathered with 1 mov operation, even
> though
> + * the docs suggest that shouldn't work.
> + */
Does the simulator not complain about this? I think it definitely
should. The docs are clear that you can't read from >2 consecutive
registers.
I suspect that again the piglit tests are insufficient. I think all of
the piglit tests are calculating the same result in each channel.
More information about the mesa-dev
mailing list