[Mesa-dev] [PATCH 07/12] nir: Allow [iu]mul_high on non-32-bit types
Jason Ekstrand
jason at jlekstrand.net
Mon Oct 8 21:14:58 UTC 2018
On Mon, Oct 8, 2018 at 3:46 PM Ian Romanick <idr at freedesktop.org> wrote:
> On 10/05/2018 09:10 PM, Jason Ekstrand wrote:
> > ---
> > src/compiler/nir/nir_constant_expressions.py | 1 +
> > src/compiler/nir/nir_opcodes.py | 43 ++++++++++++++++++--
> > 2 files changed, 40 insertions(+), 4 deletions(-)
> >
> > diff --git a/src/compiler/nir/nir_constant_expressions.py
> b/src/compiler/nir/nir_constant_expressions.py
> > index 118af9f7818..afc0739e8b2 100644
> > --- a/src/compiler/nir/nir_constant_expressions.py
> > +++ b/src/compiler/nir/nir_constant_expressions.py
> > @@ -79,6 +79,7 @@ template = """\
> > #include <math.h>
> > #include "util/rounding.h" /* for _mesa_roundeven */
> > #include "util/half_float.h"
> > +#include "util/bigmath.h"
> > #include "nir_constant_expressions.h"
> >
> > /**
> > diff --git a/src/compiler/nir/nir_opcodes.py
> b/src/compiler/nir/nir_opcodes.py
> > index 4ef4ecc6f22..209f0c5509b 100644
> > --- a/src/compiler/nir/nir_opcodes.py
> > +++ b/src/compiler/nir/nir_opcodes.py
> > @@ -443,12 +443,47 @@ binop("isub", tint, "", "src0 - src1")
> > binop("fmul", tfloat, commutative + associative, "src0 * src1")
> > # low 32-bits of signed/unsigned integer multiply
> > binop("imul", tint, commutative + associative, "src0 * src1")
> > +
> > # high 32-bits of signed integer multiply
> > -binop("imul_high", tint32, commutative,
> > - "(int32_t)(((int64_t) src0 * (int64_t) src1) >> 32)")
> > +binop("imul_high", tint, commutative, """
>
> This will enable imul_high for all integer types (ditto for umul_high
> below). A later patch adds lowering for 64-bit integer type. Will the
> backend do the right thing for [iu]mul_high of 16- or 8-bit types?
>
That's a good question. Looks like lower_integer_multiplication in the
back-end will do nothing whatsoever, and we'll emit an illegal opcode which
will probably hang the GPU. For 8 and 16, it's easy enough to lower to a
couple of conversions, a N*2-bit multiply, and a shift. It's also not
obvious where the cut-off point for the optimization is. Certainly, it's
better in 64-bits than doing the division algorithm in the shader and I
think it's better for 32 but maybe not in 8 and 16? I'm not sure. I'm
pretty sure my 32-bit benchmark gave positive results (about 40-50% faster)
but it was very noisy.
I don't think anything allows 8 and 16-bit arithmetic right now. Still,
should probably fix it...
--Jason
> > +if (bit_size == 64) {
> > + /* We need to do a full 128-bit x 128-bit multiply in order for the
> sign
> > + * extension to work properly. The casts are kind-of annoying but
> needed
> > + * to prevent compiler warnings.
> > + */
> > + uint32_t src0_u32[4] = {
> > + src0,
> > + (int64_t)src0 >> 32,
> > + (int64_t)src0 >> 63,
> > + (int64_t)src0 >> 63,
> > + };
> > + uint32_t src1_u32[4] = {
> > + src1,
> > + (int64_t)src1 >> 32,
> > + (int64_t)src1 >> 63,
> > + (int64_t)src1 >> 63,
> > + };
> > + uint32_t prod_u32[4];
> > + ubm_mul_u32arr(prod_u32, src0_u32, src1_u32);
> > + dst = (uint64_t)prod_u32[2] | ((uint64_t)prod_u32[3] << 32);
> > +} else {
> > + dst = ((int64_t)src0 * (int64_t)src1) >> bit_size;
> > +}
> > +""")
> > +
> > # high 32-bits of unsigned integer multiply
> > -binop("umul_high", tuint32, commutative,
> > - "(uint32_t)(((uint64_t) src0 * (uint64_t) src1) >> 32)")
> > +binop("umul_high", tuint, commutative, """
> > +if (bit_size == 64) {
> > + /* The casts are kind-of annoying but needed to prevent compiler
> warnings. */
> > + uint32_t src0_u32[2] = { src0, (uint64_t)src0 >> 32 };
> > + uint32_t src1_u32[2] = { src1, (uint64_t)src1 >> 32 };
> > + uint32_t prod_u32[4];
> > + ubm_mul_u32arr(prod_u32, src0_u32, src1_u32);
> > + dst = (uint64_t)prod_u32[2] | ((uint64_t)prod_u32[3] << 32);
> > +} else {
> > + dst = ((uint64_t)src0 * (uint64_t)src1) >> bit_size;
> > +}
> > +""")
> >
> > binop("fdiv", tfloat, "", "src0 / src1")
> > binop("idiv", tint, "", "src1 == 0 ? 0 : (src0 / src1)")
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20181008/7aa1e42d/attachment.html>
More information about the mesa-dev
mailing list