[Mesa-dev] [RFC 00/11] GL_ARB_gpu_shader_fp64

Sat Mar 11 17:50:41 UTC 2017

Ian,

Thank you for responding!  My objective with sending the first e-mail
wasn't to say "NIR or nothing" but to get some healthy debate going and
ensure that all options were explored and evaluated.  Softfp64 is going to
be a lot of code and I want to make sure we're happy with it before we land
it and commit to supporting it going forward.  That is all.

On Thu, Mar 9, 2017 at 11:49 AM, Ian Romanick <idr at freedesktop.org> wrote:

> > The downside, of course, to writing it nir_builder was that I duplicated
> > Ian's GLSL IR pass.  I'm not a fan of duplicating code but, if int64 on
>
> I'd say that writing a low-level representation instead of a high-level
> representation is also a disadvantage.  It's really easy for most people
> with some C experience to look at and understand GLSL.  Looking at and
> understanding NIR builder code requires quite a bit more expertise.
> There's a reason we don't write assembly.  For lowering operations that
> generate a small set of NIR builder code, this isn't a problem.  For
> things like the integer division routine, it's pretty terrible.  There
> are similar things in the fp64 code.
>

There are trade-offs, yes, but I don't think it's nearly as one-sided as
you make it out to be.  I've written thousands of lines of NIR builder code
(lowering passes, BLORP, anv meta back when that was a thing) and have
never found it to be all that bad.  Maybe it really is terrible and I'm
either so immersed in NIR that it all makes sense or just numb to the pain
but I really don't think that's the case.  I've seen a lot of people other
than myself (and who may not even be "compiler people") dive in and work on
it without complaining.  How bad are things like integer division?  I
re-implemented your int64 division algorithm in nir_builder a few weeks ago
just to see how it would go and, while it wasn't great, it also wasn't
horrific.
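
For anyone who hasn't looked at this kind of routine, here is a rough C sketch of a generic shift-and-subtract (restoring) 64-bit division.  To be clear, this is not your int64 algorithm and not my nir_builder port; the function name and the divide-by-zero behaviour are made up for illustration.  It only shows the general shape: one loop with one conditional per quotient bit.

```c
#include <stdint.h>

/* Generic restoring division sketch; udiv64_sketch is a hypothetical name,
 * not a function from Mesa. */
static uint64_t udiv64_sketch(uint64_t n, uint64_t d, uint64_t *rem)
{
    uint64_t q = 0, r = 0;

    if (d == 0) {
        /* Arbitrary choice for this sketch: saturate and return n as remainder. */
        *rem = n;
        return UINT64_MAX;
    }

    for (int i = 63; i >= 0; i--) {
        /* Bit that falls off the top of r when shifting; if it is set, the
         * true partial remainder is >= 2^64 and therefore >= d. */
        uint64_t carry = r >> 63;

        /* Bring down the next dividend bit. */
        r = (r << 1) | ((n >> i) & 1);

        /* One conditional per quotient bit. */
        if (carry || r >= d) {
            r -= d;                   /* wraps correctly when carry is set */
            q |= (uint64_t)1 << i;
        }
    }

    *rem = r;
    return q;
}
```

In a sketch like this, fully unrolling the loop leaves roughly one if per quotient bit, so the if count grows quickly once a compiler unrolls.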

So, yes, it's a bit more cumbersome to write builder code than writing in
GLSL.  However, it also comes with a lot of power that GLSL doesn't give
you.  I mentioned above a few things such as automatic vectorization (which
is really nifty but is not looking terribly useful based on IRC discussions)
and using opcodes that are very compiler-friendly but don't have GLSL
equivalents.  You can
also ensure that loops get unrolled and ifs get lowered the way you want
them.  No, we don't write our driver in assembly, but every libc
implementation implements most of the arithmetic built-ins in assembly (on
platforms people care about) and there are reasons for it.

> Writing GLSL also means that we can automatically get new compiler
> features when they are added.  For example, GLSL IR didn't always have a
> ir_triop_csel.  There is a bunch of GLSL IR lowering code that manually
> generates if-statements, but it could all use csel.  It doesn't because
> nobody has gone to modify the generated code by hand.  Frankly, nobody
> should have to do that.  That's literally why you have a compiler.
> Right? :)

I'm not sure that's actually a feature.  Yes, there are a bunch of
built-ins that generate if's instead of csel.  However, your int64 division
implementation generates either two loops and 5 if statements or 66 if
statements (depending on your loop unrolling settings) and nothing gets
turned into csel.  The reason why you write in assembly is precisely
because you can't always trust the compiler to do the actual optimal
thing.  Yes, our compiler could get better, but some of those problems,
such as the trade-off between being able to skip instructions with an if
vs. the opportunity to schedule things to execute in parallel, can be hard
to solve in general.
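
To make the if-versus-csel trade-off concrete, here is a minimal C sketch.  It is not Mesa code and the helper names are hypothetical.  The branchy form can skip work when the condition is false; the select form needs both inputs already computed, but has no control flow, so the two sides can be scheduled in parallel.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; neither exists in Mesa. */

/* Branchy form: with real control flow, the work feeding the untaken
 * side can be skipped entirely. */
static uint32_t select_branch(int cond, uint32_t a, uint32_t b)
{
    if (cond)
        return a;
    return b;
}

/* Select form, roughly what a csel does: both a and b must already be
 * computed, and a mask picks one of them without any branch. */
static uint32_t select_csel(int cond, uint32_t a, uint32_t b)
{
    /* All ones when cond is non-zero, all zeros otherwise. */
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);

    return (a & mask) | (b & ~mask);
}
```

Both functions return the same value for every input; which one is cheaper depends on how expensive a and b are to compute and on how the hardware handles divergent branches.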

> I look at code like do_atan (in builtin_functions.cpp) and
> hope that I never have to touch it.
>

And I hope I never have to debug C++ builder code that's been generated
from GLSL.  Being able to debug the mathematical logic in GLSL using
shader-toy or similar is a great feature.  The moment you have to debug the
generator, however, things start getting painful.  I'm not strictly opposed
to code-gen as you can easily see from how much of it I've personally
checked into the tree.  However, there are always trade-offs and those need
to be evaluated every time.

> One reason I chose to generate GLSL IR is that we can directly test the
> GLSL IR functions on any driver.  There are several of the int64
> lowering passes that i965 does not use, and I used this feature to
> exercise them during bring up.  It seems like we're not going to do that
> in a large scale way, so we can probably cut all of that out.  That
> should reduce quite a bit of the plumbing and remove one of the reasons
> to use GLSL IR.
>
> The NIR generator should be pretty easy to write.  I'd consider this a
> newbie task.  I've been planning to do it, but I just haven't gotten to
> it yet.  I'm leaving for my sabbatical next week, so I won't get to it
> for a couple months.
>
> I landed the int64 code with the GLSL IR generator because it was all
> written.  I didn't want to sit on it for more weeks while a NIR
> generator was implemented.  I was not planning to do additional int64
> support (for pre-Gen8) until the NIR generator was done.  This code is
> also implemented, but I think there is going to be a few rounds of
> review and changes before it lands.  I think blocking this on a NIR
> generator is okay.
>

That's fair enough.  And I'm sorry if my comment about the NIR generator
being "left as an exercise to the reader" was a bit of a cheap-shot.
However, I do know that it's going to take more work than the GLSL
generator because you're translating between IRs, and I'm reluctant to
count on unwritten code to solve all my problems.  That doesn't mean I want
to block on it, just that unwritten code has this tendency to stay that way.

As I said at the top, I'm really not going for NIR or nothing.  I agree
that GLSL has advantages for chips like r600 which badly need emulation
and aren't moving to NIR any time soon.  Also, fp64 isn't a requirement in
Vulkan and, given that Vulkan covers both desktop and mobile, likely won't
be any time soon.  (In fact, if someone wanted Vulkan FP64 on hardware that
didn't support it, I'd be tempted to tell them to pay someone to write a
layer.)  However, *if* we decided that emulated fp64 is better on, for
instance, Ivy Bridge, *and* we had customers that cared about it (I don't
know of any), then doing it in NIR could yield substantially better results
(depending on initial shader quality) due to being able to run
nir_opt_algebraic first.  Those are a lot of ifs so maybe I'm suggesting we
design for a non-use-case, but I really don't want to paint ourselves into
a corner that we have to crawl out of 2 years from now.

I'm just asking that we evaluate all available options and choose the one
with the least pain and best maintainability.  If, at the end of that
evaluation, GLSL -> builder is the way to go then I will stand out of the
way and let it happen.  As far as "where do we go next?", what I'd like to
see is some prototypes of at least one of the more painful functions using
GLSL and NIR builders so that we can see how bad the pain actually is.  I'd
also like to see some sort of attempt at figuring out a code-sharing plan
other than codegen.  Sharing the lowering code between the two IRs is
genuinely a really hard problem without something such as your codegen
framework.  If all non-codegen options prove terrible and a nir_builder
generator for the standalone compiler doesn't, then I'll gladly concede. :-)

--Jason