[Mesa-dev] [PATCH 3/9] nir: Add a new ALU nir_op_imad24_ir3

Mon Feb 25 19:19:07 UTC 2019

On 2/13/19 1:29 PM, Eduardo Lima Mitev wrote:
> The ir3 compiler has an integer multiply-add instruction (MAD_S24)
> that is used for different offset calculations in the backend.
> Since we intend to move some of these calculations to NIR, we need
> a new ALU op that can directly represent it.
> ---
>  src/compiler/nir/nir_opcodes.py | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
> index d32005846a6..abbb3627a33 100644
> --- a/src/compiler/nir/nir_opcodes.py
> +++ b/src/compiler/nir/nir_opcodes.py
> @@ -892,3 +892,19 @@ dst.w = src3.x;
>  """)
>  
>  
> +# Freedreno-specific opcode that maps directly to ir3_MAD_S24.
> +# It is emitted by the ir3_nir_lower_io_offsets pass when computing
> +# byte offsets for image stores and atomics.
> +#
> +# The nir_algebraic expression below is: take the low 23 bits of the
> +# two factors as unsigned and multiply them. If either of the
> +# two was negative, negate the product. Then add src2 to it.
> +# @FIXME: I suspect there is a simpler expression for this.
> +triop("imad24_ir3", tint, """
> +unsigned f0 = ((unsigned) src0) & 0x7fffff;
> +unsigned f1 = ((unsigned) src1) & 0x7fffff;
> +dst = f0 * f1;

How about (((int)src0 << 8) >> 8) * (((int)src1 << 8) >> 8) + src2?  The
trick is making sure the implementation matches what the hardware does
in all cases.  My expression will produce different results than yours
for cases like 0xf01fffff * 2: mine gives 0x3ffffe, yours gives
-0x3ffffe.  Which one is "correct" depends entirely on what real
hardware would produce.  If I had to guess, I would say the hardware
produces 0x3ffffe, since it likely just ignores the upper 8 bits of the
sources.

> +if (src0 * src1 < 0)
> +   dst = -dst;
> +dst += src2;
> +""")
>