<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Tue, Oct 9, 2018 at 11:32 AM Ian Romanick <<a href="mailto:idr@freedesktop.org">idr@freedesktop.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 10/08/2018 02:14 PM, Jason Ekstrand wrote:<br> > On Mon, Oct 8, 2018 at 3:46 PM Ian Romanick <<a href="mailto:idr@freedesktop.org" target="_blank">idr@freedesktop.org</a><br> > <mailto:<a href="mailto:idr@freedesktop.org" target="_blank">idr@freedesktop.org</a>>> wrote:<br> > <br> > On 10/05/2018 09:10 PM, Jason Ekstrand wrote:<br> > > ---<br> > > src/compiler/nir/nir_constant_expressions.py | 1 +<br> > > src/compiler/nir/nir_opcodes.py | 43<br> > ++++++++++++++++++--<br> > > 2 files changed, 40 insertions(+), 4 deletions(-)<br> > ><br> > > diff --git a/src/compiler/nir/nir_constant_expressions.py<br> > b/src/compiler/nir/nir_constant_expressions.py<br> > > index 118af9f7818..afc0739e8b2 100644<br> > > --- a/src/compiler/nir/nir_constant_expressions.py<br> > > +++ b/src/compiler/nir/nir_constant_expressions.py<br> > > @@ -79,6 +79,7 @@ template = """\<br> > > #include <math.h><br> > > #include "util/rounding.h" /* for _mesa_roundeven */<br> > > #include "util/half_float.h"<br> > > +#include "util/bigmath.h"<br> > > #include "nir_constant_expressions.h"<br> > > <br> > > /**<br> > > diff --git a/src/compiler/nir/nir_opcodes.py<br> > b/src/compiler/nir/nir_opcodes.py<br> > > index 4ef4ecc6f22..209f0c5509b 100644<br> > > --- a/src/compiler/nir/nir_opcodes.py<br> > > +++ b/src/compiler/nir/nir_opcodes.py<br> > > @@ -443,12 +443,47 @@ binop("isub", tint, "", "src0 - src1")<br> > > binop("fmul", tfloat, commutative + associative, "src0 * src1")<br> > > # low 32-bits of signed/unsigned integer multiply<br> > > binop("imul", tint, commutative + associative, "src0 * src1")<br> > > +<br> > > # high 32-bits of signed integer multiply<br> > > -binop("imul_high", tint32, commutative,<br> > > - "(int32_t)(((int64_t) src0 * (int64_t) src1) >> 32)")<br> > > +binop("imul_high", tint, commutative, """<br> > <br> > This will enable imul_high for all integer types (ditto for umul_high<br> > below). A later patch adds lowering for 64-bit integer type. Will the<br> > backend do the right thing for [iu]mul_high of 16- or 8-bit types?<br> > <br> > <br> > That's a good question. Looks like lower_integer_multiplication in the<br> > back-end will do nothing whatsoever, and we'll emit an illegal opcode<br> > which will probably hang the GPU. For 8 and 16, it's easy enough to<br> > lower to a couple of conversions, a N*2-bit multiply, and a shift. It's<br> > also not obvious where the cut-off point for the optimization is. <br> > Certainly, it's better in 64-bits than doing the division algorithm in<br> > the shader and I think it's better for 32 but maybe not in 8 and 16? <br> > I'm not sure. I'm pretty sure my 32-bit benchmark gave positive results<br> > (about 40-50% faster) but it was very noisy.<br> > <br> > I don't think anything allows 8 and 16-bit arithmetic right now. Still,<br> > should probably fix it...<br> <br> Hm... if an extension adds GL or Vulkan support for 16-bit arithmetic, I<br> doubt it would add [iu]mul_high (e.g., GL_AMD_gpu_shader_int16). I'd<br> expect every GPU would support a 16x16=32 multiplier. Would it be<br> better to restrict this instruction to 32- and 64-bit and implement the<br> integer division optimization differently for 8- and 16-bit sources?<br></blockquote><div><br></div><div>We don't currently have a mechanism in NIR to restrict an instruction to a particular set of bit-widths. Maybe we should consider adding that?</div><div><br></div><div>As far as the optimization goes, I see a few different options;</div><div><br></div><div> 1) Add a few lines in opt_algebraic to do the lowering. We could even make it unconditional for now.</div><div> 2) Add a couple tiny helpers that switch on bit-width in opt_idiv_const and do the lowering on-the-fly.</div><div> 3) Just disable the pass for 8 and 16-bit for now. It's questionable whether it's actually worth emitting 4-6 instructions for a simple division at those bit sizes anyway.</div><div><br></div><div>I think my order of preference is 3 > 1 > 2. Especially when you consider that we have no good way to test the pass for anything other than 32 and 64-bit right now.</div><div><br></div><div>--Jason<br></div></div></div>