[Mesa-dev] [RFC 00/11] GL_ARB_gpu_shader_fp64

Fri Mar 3 19:16:01 UTC 2017

Hey Elie!

On Fri, Mar 3, 2017 at 8:22 AM, Elie Tournier <tournier.elie at gmail.com>
wrote:

> From: Elie Tournier <elie.tournier at collabora.com>
>
> This series is based on Ian's work about GL_ARB_gpu_shader_int64 [1].
> The goal is to expose GL_ARB_gpu_shader_fp64 to OpenGL 3.0 GPUs.
>
> Each function can be independently tested using shader_runner from piglit.
> The piglit files are stored on github [2].
>
> [1] https://lists.freedesktop.org/archives/mesa-dev/2016-November/136718.html
> [2] https://github.com/Hopetech/libSoftFloat
>

Glad to see this finally turning into code.

Before we get too far into things, I'd like to talk about the approach a
bit.  First off, if we (Intel) are going to use this on any hardware, we
would really like it to be in NIR.  The reason for this is that NIR has a
much more powerful algebraic optimizer than GLSL IR and we would like to
have as few fp64 instructions as possible before we start lowering them to
piles of integer math.  I believe Ian's plan for this was that someone
would write a nir_builder back-end for the stand-alone compiler.
Unfortunately, he sort-of left that as "an exercise to the reader" and no
code exists to my knowledge.  If we're going to write things in GLSL, we
really need that NIR back-end.

When implementing int64 (which needs similar lowering) for the Vulkan
driver, I took the opportunity to try doing it directly in nir_builder
instead of writing back-end code for the stand-alone compiler.  All in all,
I'm fairly happy with the result.  You can find my (almost finished) branch
here:

https://cgit.freedesktop.org/~jekstrand/mesa/log/?h=wip/nir-int64

This approach had several advantages:

 1. The compiler does less work.  Loops can be automatically unrolled, you
can choose to use select instead of control-flow, it doesn't generate
functions that have to be inlined, etc.  Now, in GLSL IR, using functions
may actually be a requirement because it's a tree-based IR and adding stuff
to the middle of the tree can be tricky.  Also, I'm pretty sure they're a
requirement for control-flow.  NIR is flat so it's a bit nicer in that
regard.

 2. It doesn't require additional compiler infrastructure for converting
GLSL to compiler code.  We've gone back-and-forth over the years about how
much is too much codegen.  At one point, the build process built the GLSL
compiler and used it to compile GLSL to compiler code for the built-ins and
then built that into the compiler.  The build system for doing this was a
mess.  The result was that Eric wrote ir_builder and all the code was moved
over to that.  A quick look at either GLSL IR or NIR will show you that we
haven't completely rejected codegen but one always has to ask if it's
really the best solution.  Running the stand-alone compiler to generate
code and then checking it in isn't a terrible solution, but it does seem
like it could be at least one too many levels of abstraction.

 3. It's actually less code.  The nir_builder code is approximately 50%
larger than the GLSL code but, because you don't have to add built-in
functions and do all of the other plumbing per-opcode, it actually ends up
being smaller.  Due to the way vectorization is handled (see next point),
it also involves a lot less infrastructure in the lowering pass.  Also, it
doesn't need 750 lines of standalone compiler code.

 4. Because I used the "split" pack/unpack opcodes and bcsel instead of
"if", everything vectorizes automatically.  It turns an i64vec4 iadd, for
instance, into a bunch of ivec4 operations and kicks out an i32vec4 result
in the end without ever splitting into 4 int64's.  (The one exception to
this is the if statement in the division lowering which required a little
special care).  This means that we don't have to carry extra code to split
all "dvec4" values into 4 "double" values because it gets handled by the
normal nir_alu_to_scalar pass that we already have.  Also, because it uses
entirely vector instructions, it can work on an entire dvec4 at a time on
vec4 hardware (all geometry stages on Intel Haswell and earlier).  This
should make it about 4x as fast on vec4 hardware.
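
As a rough illustration of point 4, here is a plain-C sketch of a 64-bit add
built only from 32-bit operations, with a select in place of a branch.  It is
not nir_builder code: the helper names below are hypothetical stand-ins
modeled on the split pack/unpack and bcsel opcodes mentioned above, and the
scalar C types stand in for what would be whole vectors in the IR.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two 32-bit halves of a 64-bit value. */
typedef struct {
   uint32_t lo;
   uint32_t hi;
} split64;

/* Stand-in for a "split" unpack: 64-bit value -> low and high halves. */
static inline split64 unpack_64_2x32_split(uint64_t v)
{
   split64 s = { (uint32_t)v, (uint32_t)(v >> 32) };
   return s;
}

/* Stand-in for a "split" pack: two 32-bit halves -> 64-bit value. */
static inline uint64_t pack_64_2x32_split(uint32_t lo, uint32_t hi)
{
   return ((uint64_t)hi << 32) | lo;
}

/* Stand-in for a select: picks a or b without any control flow in the IR. */
static inline uint32_t bcsel(int cond, uint32_t a, uint32_t b)
{
   return cond ? a : b;
}

/* 64-bit add expressed only with 32-bit adds, a compare and a select. */
static inline uint64_t lower_iadd64(uint64_t x, uint64_t y)
{
   split64 a = unpack_64_2x32_split(x);
   split64 b = unpack_64_2x32_split(y);

   uint32_t lo = a.lo + b.lo;                 /* wraps on overflow */
   uint32_t carry = bcsel(lo < a.lo, 1u, 0u); /* wrapped => carry out */
   uint32_t hi = a.hi + b.hi + carry;

   return pack_64_2x32_split(lo, hi);
}
```

For example, lower_iadd64(0xFFFFFFFF, 1) carries out of the low half and
yields 0x100000000.  Because every step is a per-component ALU operation, the
same sequence applied to vector operands needs no per-channel splitting, which
is the property point 4 relies on.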

The downside, of course, to writing it in nir_builder was that I duplicated
Ian's GLSL IR pass.  I'm not a fan of duplicating code but, if int64 on
gen8+ was all I cared about, I think the end result is nice enough that I
don't really care about the code duplication.  If, on the other hand, we're
going to have full int64 and fp64 lowering and want to provide both in both
IR's, then maybe we should reconsider. :-)  It's worth noting that, without
adding more GLSL built-ins for the split pack/unpack opcodes, point 4 above
will always be a problem if we use GLSL as the base language.

One solution is to just do it in NIR and tell people that, if they want the
lowering, they need to support NIR.  Surprisingly, I'm not the one who is
going to push too hard for this approach.  If we can come up with a
reasonable way to do it in both, I'm moderately ok with doing so if it
isn't too much pain.

Another solution that has come to mind would be to come up with some way
to use a carefully chosen set of C/C++ macros that let you write one blob
of code and compile it as either NIR or GLSL IR builder code.  Doing this
without creating a mess is going to be difficult.  I've thought about a few
possible ways to do it but none of them have been extraordinarily pretty.
It could look something like

#if BUILD_NIR
#define BLD(type, op, ...) nir_##type##op(b, __VA_ARGS__)
#else
#define BLD(type, op, ...) op(__VA_ARGS__)
#endif

Of course, there are a *lot* of problems with this approach.  One being
that NIR is typeless while GLSL IR is a typed IR.  Also, NIR is SSA but
GLSL IR is tree-based with lots of variables.  Between those two, I haven't
come up with a good idea for how to do a "generic builder" without lots of
pain.
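
To make the macro idea concrete, here is a self-contained sketch of how one
expression could expand against two different builders.  Everything except
the BLD macro itself is an assumption for illustration: fake_nir_builder, the
nir_iadd/nir_imul stubs and the add/mul stubs are hypothetical stand-ins that
operate on plain integers, not the real NIR or GLSL IR builder APIs, and
emit_mad is an invented example.  It only covers the easy case where both
sides take the same operands; it does nothing about the typing and SSA
differences described above.

```c
#include <stdint.h>

/* Hypothetical stand-in for a builder that records emitted instructions. */
struct fake_nir_builder {
   unsigned num_instrs;
};

/* "NIR-style" stubs: builder first, type letter baked into the name. */
static inline uint32_t nir_iadd(struct fake_nir_builder *b,
                                uint32_t x, uint32_t y)
{
   b->num_instrs++;
   return x + y;
}

static inline uint32_t nir_imul(struct fake_nir_builder *b,
                                uint32_t x, uint32_t y)
{
   b->num_instrs++;
   return x * y;
}

/* "GLSL-IR-style" stubs: no builder argument, no type in the name. */
static inline uint32_t add(uint32_t x, uint32_t y) { return x + y; }
static inline uint32_t mul(uint32_t x, uint32_t y) { return x * y; }

#ifndef BUILD_NIR
#define BUILD_NIR 1 /* default chosen for this sketch only */
#endif

#if BUILD_NIR
#define BLD(type, op, ...) nir_##type##op(b, __VA_ARGS__)
#else
#define BLD(type, op, ...) op(__VA_ARGS__)
#endif

/* One source expression, two possible expansions: x * y + z. */
static inline uint32_t emit_mad(struct fake_nir_builder *b,
                                uint32_t x, uint32_t y, uint32_t z)
{
   (void)b; /* unused when BUILD_NIR is 0 */
   return BLD(i, add, BLD(i, mul, x, y), z);
}
```

With BUILD_NIR set to 1 the body of emit_mad expands to
nir_iadd(b, nir_imul(b, x, y), z); with it set to 0 it expands to
add(mul(x, y), z).  Note that the NIR branch silently depends on a variable
named b being in scope, which is one small example of the mess this approach
invites.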

Sorry if I haven't provided a lot of answers. :-/  However, I think we do
want to have this discussion for real before we start landing piles more
GLSL and codegen'd builder code.

--Jason Ekstrand