[Mesa-dev] [RFC 00/11] GL_ARB_gpu_shader_fp64

Fri Mar 3 19:23:12 UTC 2017

One other comment.  I'm not sure if you've seen it but, if you haven't, you
should check out what Connor and the Igalia guys already did for NIR:

https://cgit.freedesktop.org/mesa/mesa/tree/src/compiler/nir/nir_lower_double_ops.c

It's not full soft-float but there are some very nice algorithms in there for
things such as rcp.

On Fri, Mar 3, 2017 at 11:16 AM, Jason Ekstrand <jason at jlekstrand.net>
wrote:

> Hey Elie!
>
> On Fri, Mar 3, 2017 at 8:22 AM, Elie Tournier <tournier.elie at gmail.com>
> wrote:
>
>> From: Elie Tournier <elie.tournier at collabora.com>
>>
>> This series is based on Ian's work about GL_ARB_gpu_shader_int64 [1].
>> The goal is to expose GL_ARB_gpu_shader_fp64 to OpenGL 3.0 GPUs.
>>
>> Each function can be independently tested using shader_runner from piglit.
>> The piglit files are stored on github [2].
>>
>> [1] https://lists.freedesktop.org/archives/mesa-dev/2016-November/136718.html
>> [2] https://github.com/Hopetech/libSoftFloat
>>
>
> Glad to see this finally turning into code.
>
> Before we get too far into things, I'd like to talk about the approach a
> bit.  First off, if we (Intel) are going to use this on any hardware, we
> would really like it to be in NIR.  The reason for this is that NIR has a
> much more powerful algebraic optimizer than GLSL IR and we would like to
> have as few fp64 instructions as possible before we start lowering them to
> piles of integer math.  I believe Ian's plan for this was that someone
> would write a nir_builder back-end for the stand-alone compiler.
> Unfortunately, he sort-of left that as "an exercise to the reader" and no
> code exists to my knowledge.  If we're going to write things in GLSL, we
> really need that NIR back-end.
>
> When implementing int64 (which needs similar lowering) for the Vulkan
> driver, I took the opportunity to try doing it directly in nir_builder
> instead of writing back-end code for the stand-alone compiler.  All in all,
> I'm fairly happy with the result.  You can find my (almost finished) branch
> here:
>
> https://cgit.freedesktop.org/~jekstrand/mesa/log/?h=wip/nir-int64
>
> This approach had several advantages:
>
>  1. The compiler does less work.  Loops can be automatically unrolled, you
> can choose to use select instead of control-flow, it doesn't generate
> functions that have to be inlined, etc.  Now, in GLSL IR, using functions
> may actually be a requirement because it's a tree-based IR and adding stuff
> to the middle of the tree can be tricky.  Also, I'm pretty sure they're a
> requirement for control-flow.  NIR is flat so it's a bit nicer in that
> regard.
>
>  2. It doesn't require additional compiler infrastructure for converting
> GLSL to compiler code.  We've gone back-and-forth over the years about how
> much is too much codegen.  At one point, the build process built the GLSL
> compiler and used it to compile GLSL to compiler code for the built-ins and
> then built that into the compiler.  The build system for doing this was a
> mess.  The result was that Eric wrote ir_builder and all the code was moved
> over to that.  A quick look at either GLSL IR or NIR will show you that we
> haven't completely rejected codegen but one always has to ask if it's
> really the best solution.  Running the stand-alone compiler to generate
> code and then checking it in isn't a terrible solution, but it does seem
> like it could be at least one too many levels of abstraction.
>
>  3. It's actually less code.  The nir_builder code is approximately 50%
> larger than the GLSL code but, because you don't have to add built-in
> functions and do all of the other plumbing per-opcode, it actually ends up
> being smaller.  Due to the way vectorization is handled (see next point),
> it also involves a lot less infrastructure in the lowering pass.  Also, it
> doesn't need 750 lines of standalone compiler code.
>
>  4. Because I used the "split" pack/unpack opcodes and bcsel instead of
> "if", everything vectorizes automatically.  It turns an i64vec4 iadd, for
> instance, into a bunch of ivec4 operations and kicks out an i32vec4 result
> in the end without ever splitting into 4 int64's.  (The one exception to
> this is the if statement in the division lowering which required a little
> special care).  This means that we don't have to carry extra code to split
> all "dvec4" values into 4 "double" values because it gets handled by the
> normal nir_alu_to_scalar pass that we already have.  Also, because it uses
> entirely vector instructions, it can work on an entire dvec4 at a time on
> vec4 hardware (all geometry stages on Intel Haswell and earlier).  This
> should make it about 4x as fast on vec4 hardware.
>
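
To make point 4 concrete, here is a minimal sketch in plain C of a 64-bit add
lowered to 32-bit halves, with the carry computed by a compare and select
rather than a branch.  It does not use the NIR API.  The split_u64x4 type and
all function names are hypothetical, and each loop stands in for one 4-wide
vector instruction.  Unsigned types are used because the bit pattern of a
two's-complement add is the same for signed and unsigned operands.

```c
#include <stdint.h>

#define LANES 4

/* Hypothetical layout: four 64-bit values kept as one vector of low
 * halves and one vector of high halves. */
typedef struct {
    uint32_t lo[LANES];
    uint32_t hi[LANES];
} split_u64x4;

/* Unpack four 64-bit values into split form. */
static split_u64x4 split_from_u64(const uint64_t v[LANES])
{
    split_u64x4 s;
    for (int i = 0; i < LANES; i++) {
        s.lo[i] = (uint32_t)v[i];
        s.hi[i] = (uint32_t)(v[i] >> 32);
    }
    return s;
}

/* Pack one lane back into a 64-bit value. */
static uint64_t split_lane(split_u64x4 s, int i)
{
    return ((uint64_t)s.hi[i] << 32) | s.lo[i];
}

static split_u64x4 split_add(split_u64x4 a, split_u64x4 b)
{
    split_u64x4 r;
    uint32_t carry[LANES];

    /* One 4-wide 32-bit add on the low halves. */
    for (int i = 0; i < LANES; i++)
        r.lo[i] = a.lo[i] + b.lo[i];

    /* One 4-wide compare and select: the low add wrapped around exactly
     * when the result is smaller than an operand. */
    for (int i = 0; i < LANES; i++)
        carry[i] = (r.lo[i] < a.lo[i]) ? 1u : 0u;

    /* 4-wide adds on the high halves, folding in the carry. */
    for (int i = 0; i < LANES; i++)
        r.hi[i] = a.hi[i] + b.hi[i] + carry[i];

    return r;
}
```

Because no step branches on a per-lane value, every step applies to all four
lanes at once, and the values never have to be pulled apart into four scalar
64-bit adds.
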
> The downside, of course, to writing it in nir_builder was that I duplicated
> Ian's GLSL IR pass.  I'm not a fan of duplicating code but, if int64 on
> gen8+ was all I cared about, I think the end result is nice enough that I
> don't really care about the code duplication.  If, on the other hand, we're
> going to have full int64 and fp64 lowering and want to provide both in both
> IR's, then maybe we should reconsider. :-)  It's worth noting that, without
> adding more GLSL built-ins for the split pack/unpack opcodes, point 4 above
> will always be a problem if we use GLSL as the base language.
>
> One solution is to just do it in NIR and tell people that, if they want
> the lowering, they need to support NIR.  Surprisingly, I'm not the one who
> is going to push too hard for this approach.  If we can come up with a
> reasonable way to do it in both, I'm moderately ok with doing so if it
> isn't too much pain.
>
> Another solution that has come to mind would be to come up with some
> way to use a carefully chosen set of C/C++ macros that let you write one
> blob of code and compile it as either NIR or GLSL IR builder code.  Doing
> this without creating a mess is going to be difficult.  I've thought about
> a few possible ways to do it but none of them have been extraordinarily
> pretty.  It could look something like
>
> #if BUILD_NIR
> #define BLD(type, op, ...) nir_##type##op(b, __VA_ARGS__)
> #else
> #define BLD(type, op, ...) op(__VA_ARGS__)
> #endif
>
> Of course, there are a *lot* of problems with this approach.  One being
> that NIR is typeless while GLSL IR is a typed IR.  Also, NIR is SSA but
> GLSL IR is tree-based with lots of variables.  Between those two, I haven't
> come up with a good idea for how to do a "generic builder" without lots of
> pain.
>
> Sorry if I haven't provided a lot of answers. :-/  However, I think we do
> want to have this discussion for real before we start landing piles more
> GLSL and codegen'd builder code.
>
> --Jason Ekstrand
>