<div dir="ltr"><div>One other comment. I'm not sure if you've seen it but, if you haven't, you should check out what Connor and the Igalia guys already did for NIR:<br><br><a href="https://cgit.freedesktop.org/mesa/mesa/tree/src/compiler/nir/nir_lower_double_ops.c">https://cgit.freedesktop.org/mesa/mesa/tree/src/compiler/nir/nir_lower_double_ops.c</a><br><br></div>It's not full soft-float but there are some very nice algorithms in there for things such as rcp.<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Mar 3, 2017 at 11:16 AM, Jason Ekstrand <span dir="ltr"><<a href="mailto:jason@jlekstrand.net" target="_blank">jason@jlekstrand.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">Hey Elie!<br></div><div class="gmail_quote"><span class=""><br>On Fri, Mar 3, 2017 at 8:22 AM, Elie Tournier <span dir="ltr"><<a href="mailto:tournier.elie@gmail.com" target="_blank">tournier.elie@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">From: Elie Tournier <<a href="mailto:elie.tournier@collabora.com" target="_blank">elie.tournier@collabora.com</a>><br> <br> This series is based on Ian's work on GL_ARB_gpu_shader_int64 [1].<br> The goal is to expose GL_ARB_gpu_shader_fp64 to OpenGL 3.0 GPUs.<br> <br> Each function can be independently tested using shader_runner from piglit.<br> The piglit files are stored on GitHub [2].<br> <br> [1] <a href="https://lists.freedesktop.org/archives/mesa-dev/2016-November/136718.html" rel="noreferrer" target="_blank">https://lists.freedesktop.org/<wbr>archives/mesa-dev/2016-Novembe<wbr>r/136718.html</a><br> [2] <a href="https://github.com/Hopetech/libSoftFloat" rel="noreferrer" 
target="_blank">https://github.com/Hopetech/li<wbr>bSoftFloat</a><br></blockquote><div><br></div></span><div>Glad to see this finally turning into code.<br><br>Before we get too far into things, I'd like to talk about the approach a bit. First off, if we (Intel) are going to use this on any hardware, we would really like it to be in NIR. The reason for this is that NIR has a much more powerful algebraic optimizer than GLSL IR and we would like to have as few fp64 instructions as possible before we start lowering them to piles of integer math. I believe Ian's plan for this was that someone would write a nir_builder back-end for the stand-alone compiler. Unfortunately, he sort of left that as "an exercise to the reader" and no code exists to my knowledge. If we're going to write things in GLSL, we really need that NIR back-end.<br><br>When implementing int64 (which needs similar lowering) for the Vulkan driver, I took the opportunity to try doing it directly in nir_builder instead of writing back-end code for the stand-alone compiler. All in all, I'm fairly happy with the result. You can find my (almost finished) branch here:<br><br><a href="https://cgit.freedesktop.org/~jekstrand/mesa/log/?h=wip/nir-int64" target="_blank">https://cgit.freedesktop.org/~<wbr>jekstrand/mesa/log/?h=wip/nir-<wbr>int64</a><br><br>This approach had several advantages:<br><br></div><div> 1. The compiler does less work. Loops can be automatically unrolled, you can choose to use select instead of control flow, it doesn't generate functions that have to be inlined, etc. Now, in GLSL IR, using functions may actually be a requirement because it's a tree-based IR and adding stuff to the middle of the tree can be tricky. Also, I'm pretty sure they're a requirement for control flow. NIR is flat, so it's a bit nicer in that regard.<br><br></div><div> 2. It doesn't require additional compiler infrastructure for converting GLSL to compiler code. 
We've gone back and forth over the years about how much is too much codegen. At one point, the build process built the GLSL compiler and used it to compile GLSL to compiler code for the built-ins and then built that into the compiler. The build system for doing this was a mess. The result was that Eric wrote ir_builder and all the code was moved over to that. A quick look at either GLSL IR or NIR will show you that we haven't completely rejected codegen, but one always has to ask if it's really the best solution. Running the stand-alone compiler to generate code and then checking it in isn't a terrible solution, but it does seem like it could be at least one too many levels of abstraction.<br></div><div><br></div><div> 3. It's actually less code. The nir_builder code itself is approximately 50% larger than the GLSL code but, because you don't have to add built-in functions and do all of the other per-opcode plumbing, it actually ends up being smaller overall. Due to the way vectorization is handled (see next point), it also involves a lot less infrastructure in the lowering pass. Also, it doesn't need 750 lines of stand-alone compiler code.<br><br></div><div> 4. Because I used the "split" pack/unpack opcodes and bcsel instead of "if", everything vectorizes automatically. It turns an i64vec4 iadd, for instance, into a bunch of ivec4 operations and kicks out an i32vec4 result in the end without ever splitting into 4 int64s. (The one exception is the if statement in the division lowering, which required a little special care.) This means that we don't have to carry extra code to split all "dvec4" values into 4 "double" values because it gets handled by the normal nir_lower_alu_to_scalar pass that we already have. Also, because it uses entirely vector instructions, it can work on an entire dvec4 at a time on vec4 hardware (all geometry stages on Intel Haswell and earlier). 
This should make it about 4x as fast on vec4 hardware.<br></div><div><br></div><div>The downside, of course, to writing it in nir_builder was that I duplicated Ian's GLSL IR pass. I'm not a fan of duplicating code but, if int64 on gen8+ was all I cared about, I think the end result is nice enough that I don't really care about the code duplication. If, on the other hand, we're going to have full int64 and fp64 lowering and want to provide both in both IRs, then maybe we should reconsider. :-) It's worth noting that, without adding more GLSL built-ins for the split pack/unpack opcodes, point 4 above will always be a problem if we use GLSL as the base language.<br><br></div><div>One solution is to just do it in NIR and tell people that, if they want the lowering, they need to support NIR. Surprisingly, I'm not the one who is going to push too hard for this approach. If we can come up with a reasonable way to do it in both, I'm moderately OK with doing so if it isn't too much pain.<br><br></div><div>Another solution that has come to mind would be to come up with some way to use a carefully chosen set of C/C++ macros that let you write one blob of code and compile it as either NIR or GLSL IR builder code. Doing this without creating a mess is going to be difficult. I've thought about a few possible ways to do it but none of them have been extraordinarily pretty. It could look something like this:<br><br></div><div>#if BUILD_NIR<br></div><div>#define BLD(type, op, ...) nir_##type##op(b, __VA_ARGS__)<br></div><div>#else<br></div><div>#define BLD(type, op, ...) op(__VA_ARGS__)<br></div><div>#endif<br><br></div><div>Of course, there are a *lot* of problems with this approach. One is that NIR is typeless while GLSL IR is a typed IR. Also, NIR is SSA but GLSL IR is tree-based with lots of variables. Between those two, I haven't come up with a good idea for how to do a "generic builder" without lots of pain.<br><br></div><div>Sorry if I haven't provided a lot of answers. 
:-/ However, I think we do want to have this discussion for real before we start landing piles more GLSL and codegen'd builder code.<span class="HOEnZb"><font color="#888888"><br><br></font></span></div><span class="HOEnZb"><font color="#888888"><div>--Jason Ekstrand<br></div></font></span></div></div></div> </blockquote></div><br></div>
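<div>To illustrate point 4 in the quoted message above, here is a rough CPU-side C sketch of why the split pack/unpack form plus a select vectorizes. It is not code from the wip/nir-int64 branch; the type and function names (u64x4_split, unpack_split, pack_split, iadd64_split) are hypothetical. Each 4-wide 64-bit value is kept as a low and a high 32-bit vector, the add is done with 32-bit lane-wise operations, and the carry is picked with C's ?: (standing in for bcsel) rather than an if, so every lane runs the same straight-line sequence.</div>

```c
#include <stdint.h>

/* Hypothetical illustration, not Mesa code: a 4-wide 64-bit value in
 * "split" form, i.e. two 4-wide 32-bit vectors (like unpacking an
 * i64vec4 into a low ivec4 and a high ivec4). */
typedef struct {
    uint32_t lo[4];
    uint32_t hi[4];
} u64x4_split;

/* "Split unpack": four 64-bit values -> low and high 32-bit halves. */
u64x4_split unpack_split(const uint64_t v[4])
{
    u64x4_split s;
    for (int i = 0; i < 4; i++) {
        s.lo[i] = (uint32_t)v[i];
        s.hi[i] = (uint32_t)(v[i] >> 32);
    }
    return s;
}

/* "Split pack": low and high halves -> four 64-bit values. */
void pack_split(u64x4_split s, uint64_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = ((uint64_t)s.hi[i] << 32) | s.lo[i];
}

/* 64-bit add built only from 32-bit lane-wise operations. The carry is
 * chosen with ?: (the C analogue of bcsel) instead of an if, so the loop
 * body has no per-lane branching and maps onto a few ivec4 operations. */
u64x4_split iadd64_split(u64x4_split a, u64x4_split b)
{
    u64x4_split r;
    for (int i = 0; i < 4; i++) {
        r.lo[i] = a.lo[i] + b.lo[i];                      /* wraps mod 2^32 */
        uint32_t carry = (r.lo[i] < a.lo[i]) ? 1u : 0u;   /* low-word overflow */
        r.hi[i] = a.hi[i] + b.hi[i] + carry;              /* wraps mod 2^32 */
    }
    return r;
}
```

<div>Packing the result of iadd64_split back with pack_split should match ordinary uint64_t addition lane by lane, including the case where the high word wraps around.</div>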
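<div>Regarding the rcp algorithms in nir_lower_double_ops.c mentioned at the top of this message: the sketch below is not taken from that file, just the common textbook trick it is in the spirit of. It starts from a single-precision reciprocal estimate and refines it in double precision with Newton-Raphson steps, x' = x * (2 - a * x). If x = (1 - e) / a, one step gives (1 - e^2) / a, so the relative error is squared each time: roughly 24 good bits become about 48, and a second step reaches double precision to within rounding. The name rcp_newton is made up, and the sketch assumes a is nonzero and that both a and 1/a fit in float's normal range; a real lowering would also have to handle exponents outside that range.</div>

```c
/* Hypothetical sketch, not the code in nir_lower_double_ops.c.
 * Assumes a != 0 and that a and 1/a are within float's normal range;
 * the float division below stands in for a hardware single-precision rcp. */
double rcp_newton(double a)
{
    double x = (double)(1.0f / (float)a);  /* ~24-bit estimate */
    x = x * (2.0 - a * x);                 /* error squared: ~48 bits */
    x = x * (2.0 - a * x);                 /* ~full double precision */
    return x;
}
```

<div>Using fused multiply-adds for the a * x and x * (...) steps would tighten the rounding a bit further, but plain multiplies keep the sketch portable.</div>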