[Mesa-dev] [RFC 00/11] GL_ARB_gpu_shader_fp64

Thu Mar 9 19:49:16 UTC 2017

On 03/03/2017 11:16 AM, Jason Ekstrand wrote:
> Hey Elie!
> 
> On Fri, Mar 3, 2017 at 8:22 AM, Elie Tournier <tournier.elie at gmail.com> wrote:
> 
>     From: Elie Tournier <elie.tournier at collabora.com>
> 
>     This series is based on Ian's work on GL_ARB_gpu_shader_int64 [1].
>     The goal is to expose GL_ARB_gpu_shader_fp64 to OpenGL 3.0 GPUs.
> 
>     Each function can be independently tested using shader_runner from
>     piglit.
>     The piglit files are stored on github [2].
> 
>     [1]
>     https://lists.freedesktop.org/archives/mesa-dev/2016-November/136718.html
>     [2] https://github.com/Hopetech/libSoftFloat
> 
> 
> Glad to see this finally turning into code.
> 
> Before we get too far into things, I'd like to talk about the approach
> a bit.  First off, if we (Intel) are going to use this on any hardware,
> we would really like it to be in NIR.  The reason for this is that NIR
> has a much more powerful algebraic optimizer than GLSL IR and we would
> like to have as few fp64 instructions as possible before we start
> lowering them to piles of integer math.  I believe Ian's plan for this
> was that someone would write a nir_builder back-end for the stand-alone
> compiler.  Unfortunately, he sort-of left that as "an exercise to the
> reader" and no code exists to my knowledge.  If we're going to write
> things in GLSL, we really need that NIR back-end.
> 
> When implementing int64 (which needs similar lowering) for the Vulkan
> driver, I took the opportunity to try doing it directly in nir_builder
> instead of writing back-end code for the stand-alone compiler.  All in
> all, I'm fairly happy with the result.  You can find my (almost
> finished) branch here:
> 
> https://cgit.freedesktop.org/~jekstrand/mesa/log/?h=wip/nir-int64
> 
> This approach had several advantages:
> 
>  1. The compiler does less work.  Loops can be automatically unrolled,
> you can choose to use select instead of control-flow, it doesn't
> generate functions that have to be inlined, etc.  Now, in GLSL IR, using
> functions may actually be a requirement because it's a tree-based IR and
> adding stuff to the middle of the tree can be tricky.  Also, I'm pretty
> sure they're a requirement for control-flow.  NIR is flat so it's a bit
> nicer in that regard.
> 
>  2. It doesn't require additional compiler infrastructure for converting
> GLSL to compiler code.  We've gone back-and-forth over the years about
> how much is too much codegen.  At one point, the build process built the
> GLSL compiler and used it to compile GLSL to compiler code for the
> built-ins and then built that into the compiler.  The build system for
> doing this was a mess.  The result was that Eric wrote ir_builder and
> all the code was moved over to that.  A quick look at either GLSL IR or
> NIR will show you that we haven't completely rejected codegen but one
> always has to ask if it's really the best solution.  Running the
> stand-alone compiler to generate code and then checking it in isn't a
> terrible solution, but it does seem like it could be at least one too
> many levels of abstraction.
> 
>  3. It's actually less code.  The nir_builder code is approximately 50%
> larger than the GLSL code but, because you don't have to add built-in
> functions and do all of the other plumbing per-opcode, it actually ends
> up being smaller.  Due to the way vectorization is handled (see next
> point), it also involves a lot less infrastructure in the lowering pass.
> Also, it doesn't need 750 lines of standalone compiler code.
> 
>  4. Because I used the "split" pack/unpack opcodes and bcsel instead of
> "if", everything vectorizes automatically.  It turns an i64vec4 iadd, for
> instance, into a bunch of ivec4 operations and kicks out an i32vec4
> result in the end without ever splitting into 4 int64's.  (The one
> exception to this is the if statement in the division lowering which
> required a little special care).  This means that we don't have to carry
> extra code to split all "dvec4" values into 4 "double" values because it
> gets handled by the normal nir_alu_to_scalar pass that we already have. 
> Also, because it uses entirely vector instructions, it can work on an
> entire dvec4 at a time on vec4 hardware (all geometry stages on Intel
> Haswell and earlier).  This should make it about 4x as fast on vec4
> hardware.
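
To make point 4 concrete, here is a minimal sketch in plain C (not
nir_builder code) of a 64-bit add done on split 32-bit halves, with a
mask-based select standing in for bcsel.  The names split-style arrays,
select32 and iadd64_vec4 are made up for illustration, and the lane loops
stand in for what would be single ivec4 instructions on vec4 hardware.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4

/* bcsel-style select: returns a when cond is nonzero, else b, without
 * branching.  Hypothetical helper, not a Mesa function. */
static uint32_t select32(uint32_t cond, uint32_t a, uint32_t b)
{
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);
    return (a & mask) | (b & ~mask);
}

/* 64-bit add over four lanes, kept as separate low/high 32-bit words.
 * Each loop is one 32-bit operation applied across all lanes, so the
 * values are never split into four scalar 64-bit adds. */
static void iadd64_vec4(const uint32_t a_lo[LANES], const uint32_t a_hi[LANES],
                        const uint32_t b_lo[LANES], const uint32_t b_hi[LANES],
                        uint32_t r_lo[LANES], uint32_t r_hi[LANES])
{
    uint32_t carry[LANES];
    int i;

    for (i = 0; i < LANES; i++)
        r_lo[i] = a_lo[i] + b_lo[i];          /* wraps modulo 2^32 */
    for (i = 0; i < LANES; i++)
        carry[i] = select32(r_lo[i] < a_lo[i], 1u, 0u);
    for (i = 0; i < LANES; i++)
        r_hi[i] = a_hi[i] + b_hi[i] + carry[i];
}

/* Compares the split add against the native 64-bit add. */
static void iadd64_selfcheck(void)
{
    const uint64_t a[LANES] = { 1, 0xffffffffull, 0xffffffffffffffffull,
                                0x123456789abcdef0ull };
    const uint64_t b[LANES] = { 2, 1, 1, 0x0fedcba987654321ull };
    uint32_t a_lo[LANES], a_hi[LANES], b_lo[LANES], b_hi[LANES];
    uint32_t r_lo[LANES], r_hi[LANES];
    int i;

    for (i = 0; i < LANES; i++) {
        a_lo[i] = (uint32_t)a[i];  a_hi[i] = (uint32_t)(a[i] >> 32);
        b_lo[i] = (uint32_t)b[i];  b_hi[i] = (uint32_t)(b[i] >> 32);
    }
    iadd64_vec4(a_lo, a_hi, b_lo, b_hi, r_lo, r_hi);
    for (i = 0; i < LANES; i++)
        assert((((uint64_t)r_hi[i] << 32) | r_lo[i]) == a[i] + b[i]);
}
```

Calling iadd64_selfcheck() checks the carry cases, including wrap-around
at 2^64.  The carry comes from a compare plus a select rather than an
"if", which is the property that lets the whole thing stay vectorized.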
> 
> The downside, of course, to writing it in nir_builder was that I duplicated
> Ian's GLSL IR pass.  I'm not a fan of duplicating code but, if int64 on

I'd say that writing a low-level representation instead of a high-level
representation is also a disadvantage.  It's really easy for most people
with some C experience to look at and understand GLSL.  Looking at and
understanding NIR builder code requires quite a bit more expertise.
There's a reason we don't write assembly.  For lowering operations that
generate a small set of NIR builder code, this isn't a problem.  For
things like the integer division routine, it's pretty terrible.  There
are similar things in the fp64 code.
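
For a sense of scale, here is a rough sketch of an unsigned 64-bit
divide built only from 32-bit operations, written as ordinary C.  This
is a textbook restoring (shift-and-subtract) division, not the routine
from either branch, and the u64_words type and words_* names are
hypothetical.  Even in C it is not tiny; spelling out every shift,
compare and subtract as individual builder calls is where readability
goes downhill.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit value kept as two 32-bit words. */
typedef struct { uint32_t lo, hi; } u64_words;

static u64_words words_from_u64(uint64_t v)
{
    u64_words w;
    w.lo = (uint32_t)v;
    w.hi = (uint32_t)(v >> 32);
    return w;
}

static uint64_t words_to_u64(u64_words w)
{
    return ((uint64_t)w.hi << 32) | w.lo;
}

/* a >= b, decided by the high words unless they are equal. */
static int words_uge(u64_words a, u64_words b)
{
    return a.hi > b.hi || (a.hi == b.hi && a.lo >= b.lo);
}

/* a - b modulo 2^64; the high word absorbs the borrow from the low word. */
static u64_words words_sub(u64_words a, u64_words b)
{
    u64_words r;
    r.lo = a.lo - b.lo;
    r.hi = a.hi - b.hi - (uint32_t)(a.lo < b.lo);
    return r;
}

/* Restoring division: one quotient bit per iteration, most significant
 * bit first.  The divisor must be nonzero. */
static u64_words words_udiv(u64_words n, u64_words d, u64_words *rem)
{
    u64_words q = { 0, 0 }, r = { 0, 0 };
    int i;

    assert(d.lo != 0 || d.hi != 0);

    for (i = 63; i >= 0; i--) {
        uint32_t n_bit = (i >= 32 ? n.hi >> (i - 32) : n.lo >> i) & 1u;
        uint32_t carry_out = r.hi >> 31;  /* bit about to be shifted out */

        /* r = (r << 1) | n_bit */
        r.hi = (r.hi << 1) | (r.lo >> 31);
        r.lo = (r.lo << 1) | n_bit;

        /* If a bit was shifted out, the true remainder exceeds 2^64 and
         * therefore the divisor; the wrapped subtract is still exact. */
        if (carry_out || words_uge(r, d)) {
            r = words_sub(r, d);
            if (i >= 32)
                q.hi |= 1u << (i - 32);
            else
                q.lo |= 1u << i;
        }
    }
    if (rem)
        *rem = r;
    return q;
}

/* Compares quotient and remainder against native 64-bit division. */
static void words_udiv_selfcheck(void)
{
    const uint64_t n[] = { 0, 1, 42, 0xffffffffull, 0x100000000ull,
                           0x8000000000000001ull, 0xffffffffffffffffull };
    const uint64_t d[] = { 1, 3, 0xffffffffull, 0x100000001ull,
                           0x8000000000000000ull, 0xffffffffffffffffull };
    unsigned i, j;

    for (i = 0; i < sizeof n / sizeof n[0]; i++) {
        for (j = 0; j < sizeof d / sizeof d[0]; j++) {
            u64_words rem;
            u64_words q = words_udiv(words_from_u64(n[i]),
                                     words_from_u64(d[j]), &rem);
            assert(words_to_u64(q) == n[i] / d[j]);
            assert(words_to_u64(rem) == n[i] % d[j]);
        }
    }
}
```

words_udiv_selfcheck() runs a handful of edge cases against the native
operators.  Note the data-dependent "if" in the loop, which is the kind
of control flow that needs extra care if the result is supposed to stay
vectorized.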

Writing GLSL also means that we can automatically get new compiler
features when they are added.  For example, GLSL IR didn't always have a
ir_triop_csel.  There is a bunch of GLSL IR lowering code that manually
generates if-statements, but it could all use csel.  It doesn't because
nobody has gone to modify the generated code by hand.  Frankly, nobody
should have to do that.  That's literally why you have a compiler.
Right? :)  I look at code like do_atan (in builtin_functions.cpp) and
hope that I never have to touch it.

One reason I chose to generate GLSL IR is that we can directly test the
GLSL IR functions on any driver.  There are several of the int64
lowering passes that i965 does not use, and I used this feature to
exercise them during bring up.  It seems like we're not going to do that
in a large scale way, so we can probably cut all of that out.  That
should reduce quite a bit of the plumbing and remove one of the reasons
to use GLSL IR.

The NIR generator should be pretty easy to write.  I'd consider this a
newbie task.  I've been planning to do it, but I just haven't gotten to
it yet.  I'm leaving for my sabbatical next week, so I won't get to it
for a couple months.

I landed the int64 code with the GLSL IR generator because it was all
written.  I didn't want to sit on it for more weeks while a NIR
generator was implemented.  I was not planning to do additional int64
support (for pre-Gen8) until the NIR generator was done.  This code is
also implemented, but I think there are going to be a few rounds of
review and changes before it lands.  I think blocking this on a NIR
generator is okay.

If we're going to use nir_builder and a NIR generator, maybe the right
answer is a mix of hand-written and generated code.  Certainly things like
64-bit fabs and fneg are trivial to write by hand.  Dunno.
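
As a sketch of why those two are trivial: with the double split into two
32-bit words, both only touch the sign bit in the high word.  This is
plain C for illustration, assuming IEEE 754 binary64 doubles; the
fp64_words type and fp64_* names are made up and are not the functions
from the series.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: a double's bit pattern as two 32-bit words. */
typedef struct { uint32_t lo, hi; } fp64_words;

/* Reinterprets a double as its low and high 32-bit words. */
static fp64_words fp64_unpack(double d)
{
    uint64_t bits;
    fp64_words w;

    memcpy(&bits, &d, sizeof bits);
    w.lo = (uint32_t)bits;
    w.hi = (uint32_t)(bits >> 32);
    return w;
}

/* Rebuilds a double from its low and high 32-bit words. */
static double fp64_pack(fp64_words w)
{
    uint64_t bits = ((uint64_t)w.hi << 32) | w.lo;
    double d;

    memcpy(&d, &bits, sizeof d);
    return d;
}

/* fabs: clear the sign bit (bit 31 of the high word). */
static fp64_words fp64_fabs(fp64_words w)
{
    w.hi &= 0x7fffffffu;
    return w;
}

/* fneg: flip the sign bit. */
static fp64_words fp64_fneg(fp64_words w)
{
    w.hi ^= 0x80000000u;
    return w;
}

static void fp64_selfcheck(void)
{
    assert(sizeof(double) == sizeof(uint64_t));
    assert(fp64_pack(fp64_fabs(fp64_unpack(-1.5))) == 1.5);
    assert(fp64_pack(fp64_fabs(fp64_unpack(1.5))) == 1.5);
    assert(fp64_pack(fp64_fneg(fp64_unpack(2.0))) == -2.0);
    /* Negating +0.0 sets only the sign bit, giving -0.0. */
    assert(fp64_fneg(fp64_unpack(0.0)).hi == 0x80000000u);
}
```

Each operation is a single 32-bit AND or XOR on the high word, so there
is not much for a generator to buy us here.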

> gen8+ was all I cared about, I think the end result is nice enough that
> I don't really care about the code duplication.  If, on the other hand,
> we're going to have full int64 and fp64 lowering and want to provide
> both in both IR's, then maybe we should reconsider. :-)  It's worth
> noting that, without adding more GLSL built-ins for the split
> pack/unpack opcodes, point 4 above will always be a problem if we use
> GLSL as the base language.
> 
> One solution is to just do it in NIR and tell people that, if they want
> the lowering, they need to support NIR.  Surprisingly, I'm not the one
> who is going to push too hard for this approach.  If we can come up with
> a reasonable way to do it in both, I'm moderately ok with doing so if it
> isn't too much pain.
> 
> Another solution that has come to mind would be to come up with some
> way to use a carefully chosen set of C/C++ macros that let you write one
> blob of code and compile it as either NIR or GLSL IR builder code. 
> Doing this without creating a mess is going to be difficult.  I've
> thought about a few possible ways to do it but none of them have been
> extraordinarily pretty.  It could look something like
> 
> #if BUILD_NIR
> #define BLD(type, op, ...) nir_##type##op(b, __VA_ARGS__)
> #else
> #define BLD(type, op, ...) op(__VA_ARGS__)
> #endif
> 
> Of course, there are a *lot* of problems with this approach.  One being
> that NIR is typeless while GLSL IR is a typed IR.  Also, NIR is SSA but
> GLSL IR is tree-based with lots of variables.  Between those two, I
> haven't come up with a good idea for how to do a "generic builder"
> without lots of pain.
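
Just to show the mechanics of the macro idea, here is a toy version that
compiles on its own.  Everything in it is a stand-in: toy_builder,
toy_value, the toy_nir_* functions and the bare add/mul functions are
invented for this sketch and are not Mesa APIs, and plain ints play the
role of IR values.  It deliberately sidesteps the typed-vs-typeless and
SSA-vs-tree problems described above.

```c
#include <assert.h>

/* Toy stand-ins, NOT Mesa types. */
struct toy_builder { int emitted; };
typedef int toy_value;

/* "NIR-like" flavor: every op takes the builder as its first argument. */
static toy_value toy_nir_iadd(struct toy_builder *b, toy_value x, toy_value y)
{
    b->emitted++;
    return x + y;
}

static toy_value toy_nir_imul(struct toy_builder *b, toy_value x, toy_value y)
{
    b->emitted++;
    return x * y;
}

/* "GLSL-IR-like" flavor: free functions with no builder argument. */
static toy_value add(toy_value x, toy_value y) { return x + y; }
static toy_value mul(toy_value x, toy_value y) { return x * y; }

#ifndef BUILD_NIR
#define BUILD_NIR 1
#endif

#if BUILD_NIR
#define BLD(type, op, ...) toy_nir_##type##op(b, __VA_ARGS__)
#else
#define BLD(type, op, ...) op(__VA_ARGS__)
#endif

/* One blob of source; which back-end it targets is chosen at compile
 * time.  Computes x * y + z. */
static toy_value mul_add(struct toy_builder *b,
                         toy_value x, toy_value y, toy_value z)
{
    (void)b; /* unused when BUILD_NIR is 0 */
    return BLD(i, add, BLD(i, mul, x, y), z);
}

static void mul_add_selfcheck(void)
{
    struct toy_builder bld = { 0 };

    assert(mul_add(&bld, 3, 4, 5) == 17);
#if BUILD_NIR
    assert(bld.emitted == 2); /* one imul and one iadd went through b */
#endif
}
```

The macro relies on a variable named b being in scope at every use,
which is one small example of how this kind of thing gets messy fast.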
> 
> Sorry if I haven't provided a lot of answers. :-/  However, I think we
> do want to have this discussion for real before we start landing piles
> more GLSL and codegen'd builder code.
> 
> --Jason Ekstrand