[Mesa-dev] [PATCH 2/2] i965/fs: Lower arithmetic instructions with register regions of unsupported width.
Connor Abbott
cwabbott0 at gmail.com
Wed Aug 5 11:14:26 PDT 2015
FWIW, both patches are:
Reviewed-by: Connor Abbott <connor.w.abbott at intel.com>
I'm working on FP64 support (I've been using no16 up till now) so this
is obviously very useful to me.
On Wed, Aug 5, 2015 at 10:38 AM, Francisco Jerez <currojerez at riseup.net> wrote:
> This extends the SIMD lowering pass to enforce the hardware limitation
> that no directly-addressed source may read more than 2 physical GRFs.
> One can easily go over this limit when doing 64-bit arithmetic
> (e.g. FP64 or extended-precision integer MULs) or SIMD32, so it's nice
> to be able to just emit an instruction of the intended execution size
> from the visitor and let the lowering pass deal with this restriction
> transparently.
>
> Some hardware arithmetic instructions are not handled here, including
> all instructions that use the accumulator implicitly (which the SIMD
> lowering pass deliberately doesn't handle), instructions with
> non-per-channel sources (e.g. LINE or PLANE) and SEND-like
> instructions, which need special handling most likely as virtual
> opcodes.
> ---
> src/mesa/drivers/dri/i965/brw_fs.cpp | 62 ++++++++++++++++++++++++++++++++++++
> 1 file changed, 62 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index f9773bd..fa5ed4f 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -4130,6 +4130,68 @@ get_lowered_simd_width(const struct brw_device_info *devinfo,
> const fs_inst *inst)
> {
> switch (inst->opcode) {
> + case BRW_OPCODE_MOV:
> + case BRW_OPCODE_SEL:
> + case BRW_OPCODE_NOT:
> + case BRW_OPCODE_AND:
> + case BRW_OPCODE_OR:
> + case BRW_OPCODE_XOR:
> + case BRW_OPCODE_SHR:
> + case BRW_OPCODE_SHL:
> + case BRW_OPCODE_ASR:
> + case BRW_OPCODE_CMP:
> + case BRW_OPCODE_CMPN:
> + case BRW_OPCODE_CSEL:
> + case BRW_OPCODE_F32TO16:
> + case BRW_OPCODE_F16TO32:
> + case BRW_OPCODE_BFREV:
> + case BRW_OPCODE_BFE:
> + case BRW_OPCODE_BFI1:
> + case BRW_OPCODE_BFI2:
> + case BRW_OPCODE_ADD:
> + case BRW_OPCODE_MUL:
> + case BRW_OPCODE_AVG:
> + case BRW_OPCODE_FRC:
> + case BRW_OPCODE_RNDU:
> + case BRW_OPCODE_RNDD:
> + case BRW_OPCODE_RNDE:
> + case BRW_OPCODE_RNDZ:
> + case BRW_OPCODE_LZD:
> + case BRW_OPCODE_FBH:
> + case BRW_OPCODE_FBL:
> + case BRW_OPCODE_CBIT:
> + case BRW_OPCODE_SAD2:
> + case BRW_OPCODE_MAD:
> + case BRW_OPCODE_LRP:
> + case SHADER_OPCODE_RCP:
> + case SHADER_OPCODE_RSQ:
> + case SHADER_OPCODE_SQRT:
> + case SHADER_OPCODE_EXP2:
> + case SHADER_OPCODE_LOG2:
> + case SHADER_OPCODE_POW:
> + case SHADER_OPCODE_INT_QUOTIENT:
> + case SHADER_OPCODE_INT_REMAINDER:
> + case SHADER_OPCODE_SIN:
> + case SHADER_OPCODE_COS: {
> + /* According to the PRMs:
> + * "A. In Direct Addressing mode, a source cannot span more than 2
> + * adjacent GRF registers.
> + * B. A destination cannot span more than 2 adjacent GRF registers."
> + *
> + * Look for the source or destination with the largest register region,
> + * which is the one that is going to limit the overall execution size of
> + * the instruction due to this rule.
> + */
> + unsigned reg_count = inst->regs_written;
> +
> + for (unsigned i = 0; i < inst->sources; i++)
> + reg_count = MAX2(reg_count, (unsigned)inst->regs_read(i));
> +
> + /* Calculate the maximum execution size of the instruction based on the
> + * factor by which it goes over the hardware limit of 2 GRFs.
> + */
> + return inst->exec_size / DIV_ROUND_UP(reg_count, 2);
> + }
> case SHADER_OPCODE_MULH:
> /* MULH is lowered to the MUL/MACH sequence using the accumulator, which
> * is 8-wide on Gen7+.
> --
> 2.4.6
>
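For anyone following along, here is a minimal standalone sketch of the width
calculation in the hunk above. It is not the Mesa code: toy_inst, regs_for()
and lowered_simd_width() are hypothetical names made up for illustration, the
GRF size is taken as 32 bytes, regions are assumed to be tightly packed, and
the clamp to at least one register is my own addition to keep the toy version
from dividing by zero.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for fs_inst, holding only the fields the
// calculation needs. Register counts are in units of whole GRFs.
struct toy_inst {
   unsigned exec_size;               // requested SIMD width
   unsigned regs_written;            // GRFs spanned by the destination
   std::vector<unsigned> regs_read;  // GRFs spanned by each source
};

static unsigned div_round_up(unsigned a, unsigned b)
{
   return (a + b - 1) / b;
}

// GRFs spanned by a tightly packed region (assumption: 32-byte GRFs,
// stride of one element, region starting at a GRF boundary).
static unsigned regs_for(unsigned channels, unsigned bytes_per_channel)
{
   return div_round_up(channels * bytes_per_channel, 32);
}

// Same arithmetic as the patch: find the widest region, then divide the
// execution size by the factor by which that region exceeds 2 GRFs.
static unsigned lowered_simd_width(const toy_inst &inst)
{
   assert(inst.exec_size > 0);

   unsigned reg_count = inst.regs_written;
   for (unsigned n : inst.regs_read)
      reg_count = std::max(reg_count, n);

   // Clamp added for this sketch only, not part of the patch.
   reg_count = std::max(reg_count, 1u);

   return inst.exec_size / div_round_up(reg_count, 2);
}
```

Under those assumptions a SIMD16 instruction on 64-bit operands spans
regs_for(16, 8) = 4 GRFs per operand and is split into SIMD8 halves, while a
SIMD16 instruction on 32-bit operands spans 2 GRFs and is left alone.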