[Lima] [Mesa-dev] NIR constant problem for GPU which doesn't have native integer support
Jason Ekstrand
jason at jlekstrand.net
Sat Jan 5 15:17:33 UTC 2019
Qiang,
Sorry things got so side-tracked....
Going back to the original conversation, I mentioned that there are two
general methods to solve this: Making glsl_to_nir generate int-free NIR and
doing some sort of lowering. I believe Jonathan is attempting the first in
which case the patch you linked to below is what you want. You likely want
more of the patches in his series as well.
If, on the other hand, someone wanted to attempt the lowering approach, I
just sent out something which is my idea of what an after-the-fact
type-assignment pass would look like. Be warned that I haven't even
compiled it much less tested it but it should give some idea of what needs
to be done to figure out the types of SSA values.
--Jason
On Fri, Jan 4, 2019 at 9:03 PM Qiang Yu <yuq825 at gmail.com> wrote:
> Thanks for your guys' information, although it's much deeper than my
> question and also my understanding of the NIR and compiler knowledge.
>
> This patch fix my problem:
> https://patchwork.freedesktop.org/patch/268946/
> But seems it's not the final solution, right? And I'm not sure if we will
> have other integer problem besides constant for non-integer-gpu, do we?
>
> To be honest, as a backend developer, NIR non-integer-gpu support is
> out of my ability. So maybe I have to take this patch temporarily and work
> on other topic for sometime to see if any progress of this support in NIR
> in the future.
>
> Thanks,
> Qiang
>
> On Sat, Jan 5, 2019 at 12:40 AM Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> >
> > On Fri, Jan 4, 2019 at 10:46 AM Jason Ekstrand <jason at jlekstrand.net>
> wrote:
> > >
> > > On Fri, Jan 4, 2019 at 4:07 AM Erik Faye-Lund <
> erik.faye-lund at collabora.com> wrote:
> > >>
> > >> On Thu, 2019-01-03 at 11:58 -0600, Jason Ekstrand wrote:
> > >> > On Thu, Jan 3, 2019 at 3:39 AM Erik Faye-Lund <
> > >> > erik.faye-lund at collabora.com> wrote:
> > >> > > On Wed, 2019-01-02 at 10:16 -0600, Jason Ekstrand wrote:
> > >> > > > On Wed, Jan 2, 2019 at 9:43 AM Ilia Mirkin <
> imirkin at alum.mit.edu>
> > >> > > > wrote:
> > >> > > > > Have a look at the first 4 patches in the series from Jonathan
> > >> > > > > Marek
> > >> > > > > to address some of these issues:
> > >> > > > >
> > >> > > > > https://patchwork.freedesktop.org/series/54295/
> > >> > > > >
> > >> > > > > Not sure exactly what state that work is in, but I've added
> > >> > > > > Jonathan
> > >> > > > > to CC, perhaps he can provide an update.
> > >> > > > >
> > >> > > > > Cheers,
> > >> > > > >
> > >> > > > > -ilia
> > >> > > > >
> > >> > > > > On Wed, Jan 2, 2019 at 6:28 AM Qiang Yu <yuq825 at gmail.com>
> > >> > > wrote:
> > >> > > > > >
> > >> > > > > > Hi guys,
> > >> > > > > >
> > >> > > > > > I found the problem with this test fragment shader when lima
> > >> > > > > development:
> > >> > > > > > uniform int color;
> > >> > > > > > void main() {
> > >> > > > > > if (color > 1)
> > >> > > > > > gl_FragColor = vec4(1.0, 0.0, 0.0, 1);
> > >> > > > > > else
> > >> > > > > > gl_FragColor = vec4(0.0, 1.0, 0.0, 1);
> > >> > > > > > }
> > >> > > > > >
> > >> > > > > > nir_print_shader output:
> > >> > > > > > impl main {
> > >> > > > > > block block_0:
> > >> > > > > > /* preds: */
> > >> > > > > > vec1 32 ssa_0 = load_const (0x00000001 /* 0.000000
> > >> > > */)
> > >> > > > > > vec4 32 ssa_1 = load_const (0x3f800000 /* 1.000000
> > >> > > */,
> > >> > > > > > 0x00000000 /* 0.000000 */, 0x00000000 /* 0.000000 */,
> > >> > > 0x3f800000
> > >> > > > > /*
> > >> > > > > > 1.000000 */)
> > >> > > > > > vec4 32 ssa_2 = load_const (0x00000000 /* 0.000000
> > >> > > */,
> > >> > > > > > 0x3f800000 /* 1.000000 */, 0x00000000 /* 0.000000 */,
> > >> > > 0x3f800000
> > >> > > > > /*
> > >> > > > > > 1.000000 */)
> > >> > > > > > vec1 32 ssa_3 = load_const (0x00000000 /* 0.000000
> > >> > > */)
> > >> > > > > > vec1 32 ssa_4 = intrinsic load_uniform (ssa_3) (0,
> 1,
> > >> > > 0)
> > >> > > > > /*
> > >> > > > > > base=0 */ /* range=1 */ /* component=0 */ /* color */
> > >> > > > > > vec1 32 ssa_5 = slt ssa_0, ssa_4
> > >> > > > > > vec1 32 ssa_6 = fnot ssa_5
> > >> > > > > > vec4 32 ssa_7 = bcsel ssa_6.xxxx, ssa_2, ssa_1
> > >> > > > > > intrinsic store_output (ssa_7, ssa_3) (0, 15, 0) /*
> > >> > > > > base=0 */
> > >> > > > > > /* wrmask=xyzw */ /* component=0 */ /* gl_FragColor */
> > >> > > > > > /* succs: block_1 */
> > >> > > > > > block block_1:
> > >> > > > > > }
> > >> > > > > >
> > >> > > > > > ssa0 is not converted to float when glsl to nir. I see
> > >> > > > > glsl_to_nir.cpp
> > >> > > > > > will create flt/ilt/ult
> > >> > > > > > based on source type for gpu support native integer, but for
> > >> > > gpu
> > >> > > > > not
> > >> > > > > > support native
> > >> > > > > > integer, just create slt for all source type. And in
> > >> > > > > > nir_lower_constant_initializers,
> > >> > > > > > there's also no type conversion for integer constant.
> > >> > > >
> > >> > > > This is a generally sticky issue. In NIR, we have no concept of
> > >> > > > types on SSA values which has proven perfectly reasonable and
> > >> > > > actually very powerful in a world where integers are supported
> > >> > > > natively. Unfortunately, it causes significant problems for
> > >> > > float-
> > >> > > > only architectures.
> > >> > >
> > >> > > I would like to take this chance to say that this untyped
> SSA-value
> > >> > > choice has lead to issues in both radeon_si (because LLVM values
> > >> > > are
> > >> > > typed) and zink (similarly, because SPIR-V values are typed),
> where
> > >> > > we
> > >> > > need to to bitcasts on every access because there's just not
> enough
> > >> > > information available to emit variables with the right type.
> > >> >
> > >> > I'm not sure if I agree that the two problems are the same or not...
> > >> > More on that in a bit.
> > >> >
> > >> > > It took us a lot of time to realize that the meta-data from the
> > >> > > opcodes
> > >> > > doesn't *really* provide this, because the rest of nir doesn't
> > >> > > treat
> > >> > > values consistently. In fact, this feels arguably more like buggy
> > >> > > behavior; why do we even have fmov when all of the time the
> > >> > > compiler
> > >> > > will emit imovs for floating-point values...? Or why do we have
> > >> > > bitcast
> > >> >
> > >> > Why do we have different mov opcodes? Because they have different
> > >> > behavior in the presence of source/destination modifiers.
> > >>
> > >> Is this general NIR-behavior (i.e will this be honored by constant
> > >> folding etc), or is it Intel specific? If it's NIR-behavior, is it
> > >> documented somewhere?
> > >
> > >
> > > No, constant folding doesn't do modifiers. I had completely forgotten
> about this fact until we started this discussion.
> > >
> > >>
> > >> In either case, I have a feeling that this is a mis-design; the
> > >> modifiers apply to the operands, not the opcodes. It sounds a bit like
> > >> imov shouldn't have been an ALU instruction, but perhaps some other,
> > >> new type. Or perhaps the concept we have should have been a bit finer
> > >> granularity. It's a bit hard to tell without more details, and I can't
> > >> seem to find any on my own.
> > >
> > >
> > > I'm very unclear about what you mean by this remark. Why do you think
> imov is so special? It's the simplest of all ALU ops because it just
> passes through it's value. Source modifiers (if you have them) modify the
> source as per the opcode's stated source type and destination modifiers
> modify the destination as per the opcode's stated destination type. It's
> not that complicated. Also, you don't want and shouldn't have
> source/destination modifiers for a translator like Zink so you shouldn't
> have to care about them. Nothin in NIR will produce them by default; you
> have to explicitly call the pass that adds them.
> > >
> > >>
> > >> Generally, I wish we had some more of the details here written down.
> > >> Implementing support for NIR for a new architecture is currently quite
> > >> tricky, mostly because chasing down the details are hard.
> > >
> > >
> > > Yeah... Patches welcome? There have been many attempts by Connor and
> myself to better document NIR. They all end up in /dev/null due to
> EBIGGERFIRES. :-(
> > >
> > > That said, if you ever want to know how something works, I'm logged
> into IRC 24/7 and will happily answer questions.
> > >
> > >>
> > >> > You likely don't use those in radeon or Zink but we do use them on
> > >> > Intel. That being said, I've very seriously considered adding a
> > >> > generic nir_op_mov which is entirely typeless and doesn't support
> > >> > modifiers at all and make most of core NIR use that. For that
> > >> > matter, there's no real reason why we need fmov with modifiers at
> all
> > >> > when we could we could easily replace "ssa1 = fmov.sat |x|" with
> > >> > "ssa1 = fsat |x|" or "ssa1 = fabs.sat x". If we had a generic
> > >> > nir_op_mov to deal with swizzling etc., the only thing having i/fmov
> > >> > would really accomplish is lowering the number of different ways you
> > >> > can write "fmov.sat |x|". The only reason I haven't added this
> > >> > nir_op_mov opcode is because it's a pile of work and anyone who
> can't
> > >> > do integers can just implement imov as fmov and it's not a big deal.
> > >>
> > >> That sounds reasonable enough, and I understand that this hasn't been
> a
> > >> priority so far.
> > >>
> > >> > > I would really love it if we could at least consider making this
> > >> > > "we
> > >> > > can just treat floats as ints without bitcasts if we feel like
> it"-
> > >> > > design optional for the backend somehow.
> > >> > >
> > >> > > I guess the assumption is that bitcasts are for free? They aren't
> > >> > > once
> > >> > > you have to emit them and have a back-end remove a bunch of
> > >> > > redundant
> > >> > > ones... We should already have all the information to trivially
> > >> > > place
> > >> > > casts for backends that care, yet we currently make it hard
> (unless
> > >> > > your HW/backend happens to be untyped)...
> > >> > >
> > >> > > Is there some way we can perhaps improve this for backends that
> > >> > > care?
> > >> >
> > >> > That's an interesting question...
> > >> >
> > >> > Before answering that question, let's back up and ask why. From a
> > >> > hardware perspective, I know of no graphics hardware that puts types
> > >> > on registers. On Intel, we assign types to sources and destinations
> > >> > of instructions but outside a specific instruction, a register is
> > >> > just a bag of bits. I suspect most other hardware is the same. In
> > >> > the hardware that his particular thread was originally about, you
> > >> > could argue that they have a type and it's always float. The real
> > >> > issue that the GLES2 people are having is that constants have a bag
> > >> > of bits value which may need to be interpreted as integer or float
> > >> > depending on context which they do not have when lowering. They
> > >> > don't care, for instance, about the fact that imov or bcsel take
> > >> > integers because they know that the sources will either be floats or
> > >> > integers that they've lowered to floats and so they know that the
> > >> > result will be a float.
> > >> >
> > >> > The problem you're describing is in converting from NIR to another
> > >> > IR, not to hardware.
> > >>
> > >> Yes; NIR serves *both* as an LLIR *and* as a HIR. Your analysis above
> > >> seems to only make sense when it's used as an LLIR, though.
> > >
> > >
> > > NIR is not and never has claimed to be a HIR. lossless GLSL -> NIR ->
> GLSL translation has never been a goal of NIR. I'm sorry if that sounds
> like a fairly dogmatic and abrupt claim; I'll explain what I mean a bit
> lower down.
> > >
> > >>
> > >> What I'm trying to say, is that since we *have* this information when
> > >> moving from GLSL, perhaps we could find a way of preserving it so a
> > >> backend that needs it could use it without having to recompute it.
> > >> Which turns out to be a lot trickier than it sounds, more on this
> > >> later.
> > >
> > >
> > > I think the best response to this is to give you a bit of this
> wonderful and useful GLSL:
> > >
> > > r4.y = intBitsToFloat(float(0.00000000f) <
> idx_uniforms4_cs.cb4[10 + floatBitsToInt(r5.x)].w ? int(0xffffffff) :
> int(0x00000000));
> > > r4.z = intBitsToFloat(idx_uniforms4_cs.cb4[10 +
> floatBitsToInt(r5.x)].w < float(0.00000000f) ? int(0xffffffff) :
> int(0x00000000));
> > > r4.y = intBitsToFloat(-floatBitsToInt(r4.y) +
> floatBitsToInt(r4.z));
> > > r4.y = float(floatBitsToInt(r4.y));
> > > r4.z = float(0.00999999978f) * idx_uniforms4_cs.cb4[10
> + floatBitsToInt(r5.x)].w;
> > > r4.w = float(1.00000000f) + -idx_uniforms1_cs.cb1[32 +
> floatBitsToInt(r2.w)].y;
> > >
> > > That is an excerpt from an actual shader in a fairly popular steam
> tilte. Looking through our shader database, I can find at least a dozen
> games where every line of every single shader looks like that.
> > >
> > > What's my point with this, admittedly somewhat tongue-in-cheek,
> example? Three things:
> > >
> > > 1. The information in GLSL which you claim to want in the back-end is
> only really useful in applications which have been written in GLSL from the
> start. It works great for TuxRacer, piglit/CTS tests, and even some real
> games. However, for applications that come from translators (especially
> D3D10/11), types on variables are all lies and should be ignored. If you
> read the above carefully, you can see that they use r4.zw as a float and
> r4.y as both a float and an int.
> > >
> > > 2. NIR will eat this shader for breakfast and produce something very
> sensible precisely because it doesn't care about types. In this e-mail
> thread, I feel like you've been treating the lack of full types like some
> accidental omission. It was an intentional feature precisely because of
> this kind of garbage that we have to deal with.
> > >
> > > 3. Transpilers do all sorts of nasty things to turn some (possibly
> low-level) thing into another (possibly high-level) thing. Zink isn't
> hardware. It is, effectively, an emulator and/or translation layer and you
> should expect some of this sort of horror.
> > >
> > >>
> > >> > In LLVM they made a choice to put types on SSA values but then to
> > >> > have the actual semantics be based on the instruction itself. This
> > >> > leads to lots of redundancy in the IR but also lots of things you
> can
> > >> > validate which is kind-of nice. Is that redundancy really useful?
> > >> > In our experience with NIR, we haven't found it to be other than
> > >> > booleans (now sorted), this one constant issue, and translating to
> > >> > IRs that have that redundancy. Then why did they add it? I'm
> really
> > >> > not sure but it may, at least somewhat, be related to the fact that
> > >> > they allow arrays and structs in their SSA values and so need full
> > >> > types. This is another design decision in LLVM which I find highly
> > >> > questionable. What you're is more-or-less that NIR should carry,
> > >> > maintain, and validate extra useless information just so we can pass
> > >> > it on to LLVM which is going to use it for what, exactly? Sorry if
> > >> > I'm extremely reluctant to make fairly fundamental changes to NIR
> > >> > with no better reason than "LLVM did it that way".
> > >> >
> > >> > There's a second reason why I don't like the idea of putting types
> on
> > >> > SSA values: It's extremely deceptive. In SPIR-V you're allowed to
> do
> > >> > an OpSConvert with a source that is an unsigned 16-bit integer and a
> > >> > destination that is an unsigned 32-bit integer. What happens? Your
> > >> > uint -> uint cast sign extends! Yup.... That's what happens. No
> > >> > joke. The same is true of signed vs. unsigned division or modulus.
> > >> > The end result is that the types in SPIR-V are useless and you can't
> > >> > actually trust them for anything except bit-size and sometimes
> > >> > labelling something as a float vs. int.
> > >>
> > >> Right. I believe it's already been pointed out in this thread that it
> > >> might actually be useful to differentiate between integers and float
> > >> values, at least for some CPU architectures. For signed vs unsigned,
> > >> it'd be no problem for zink to treat these as the same.
> > >
> > >
> > > As was also pointed out in that thread, LLVM has been moving away from
> using types to decide registers on x86 and similar platforms and towards
> just looking at the opcodes because the types overly restrict the SSE
> opcodes they can use for various things such as shuffles.
> > >
> > >>
> > >> In my use-case, it's not because *I* think it's perticularly useful in
> > >> itself, but because it's a requirement for generating valid SPIR-V. I
> > >> guess SPIR-V inherited this from LLVM?
> > >
> > >
> > > And here inlies the crux of the problem you're trying to solve. Zink
> is not hardware, it's a translation layer. You're hitting the same set of
> issues as virgil, vmware, DXVK, dolphin, and any other
> translator/emulator. In particular, you're having to jump from a low-level
> thing to a high-level thing and, oops, we lost some information. That
> information isn't actually needed/useful as the driver Zink is running on
> top of is going to translate it back into NIR and throw the types away
> again. However, we have to have them in order to make the high-level
> language happy.
> > >
> > > I'm not trying to belittle the problem you're having in any way. I'm
> simply pointing out that it's a translation/emulation problem and not a
> problem for actual hardware drivers. Most hardware has some notion of
> types but those types are on sources/destinations and not registers and
> they exist purely to reduce the combinatorial explosion of opcodes. Yes,
> it's a bit of an issue on radeon thanks to LLVM but Bas seemed happy enough
> with the status quo in his reply so I don't think it's causing them too
> much grief.
> > >
> > >>
> > >> I'm not particularly worried about adding a couple of bits (or perhaps
> > >> even one?) per nir_ssa_def (what you seem to call "faily fundamental
> > >> changes to NIR" above), if that helps here.
> > >
> > >
> > > It's not adding a couple bits to a data structure that concerns me,
> it's maintenance. As I pointed out with the shader example above, NIR is
> extremely good at chewing through DXBC -> GLSL translated garbage and part
> of that power comes from ignoring types. As such, I don't want NIR to
> suddenly have to start respecting them in any reasonable sense. When we
> mutate the IR to perform optimizations, we don't want respecting types to
> tie us down. We could potentially continue to mostly ignore them and then
> have some very clever helpers that figure out what type we need at the end
> and stick it on but it's not at all clear to me what that would look like.
> > >
> > > If you think you can make it work in a non-intrusive and unhindering
> way, go for it. However, as someone who knows NIR inside and out and who
> has done quite a bit of deep NIR surgery recently, I don't think you have
> any idea what you're in for. :-) These types of changes are very
> difficult and time-consuming to get right and what you're suggesting is, in
> my view, the deepest change anyone has ever made to the IR post-landing.
> Am I trying to scare you away? Maybe a little bit but mostly I'm just
> being entirely honest.
> > >
> > >>
> > >> > So how do we deal with the mismatch? My recommendation would be to
> > >> > use the source/destination types of the ALU ops where needed/useful,
> > >> > add special cases for things like imov and bcsel where you can pass
> > >> > the source type through, and insert extra bitcasts whenever they're
> > >> > needed. This is obvious enough that I'm going to guess you're
> > >> > already more-or-less doing exactly that.
> > >>
> > >> We tried this approach for a while, but we couldn't get it to work.
> > >> There were too many situations where we couldn't know the type early
> > >> enough to avoid crawling through piles of old instructions to figure
> it
> > >> out, for instance when reading a constant value (untyped),
> constructing
> > >> some vector with it (still untyped). It's not until we read it that we
> > >> know what type we needed.
> > >>
> > >> In fact, vectors itself cause major headache here, because they need
> to
> > >> have consistent types in it, and NIR can easily let you create a
> vector
> > >> with floats and ints in it, side by side. Perhaps that could be
> > >> considered a feature... I wouldn't, at least not on a HIR level. Once
> > >> the code hits a backend, perhaps.
> > >
> > >
> > > You're writing a back-end. :-P Sorry, snarky again... Must be a
> first-email-of-the-day problem...
> > >
> > > I'm not at all surprised that assigning types after-the-fact involves
> lots of crawling. However, as you can see from the example above, stashing
> ints and floats in the same vector is a real thing that applications
> actually do. In the end, then, we have three options:
> > >
> > > 1. Status quo: Ignore types while optimizing and figure out types
> after-the-fact
> > >
> > > 2. Put in types and enforce them using bitcast opcodes where needed.
> Optimizations which would break established types wouldn't be allowed.
> > >
> > > 3. Put in types but make them mostly ignorable by optimizations and
> do something clever to make everything consistent.
> > >
> > > I am very strongly against option 2 for all of the reasons I've
> already said. I don't see a real path to option 3 that makes any sense to
> me. I'm not saying we can't get there but I fear that the exact same set
> of problems you're experiencing trying to translate from NIR to SPIR-V
> would have to be scattered throughout the entire IR. That doesn't mean
> that it can't be done or that I'll NAK a successful solution. I just don't
> see a successful solution.
> > >
> > > Assuming I'm not missing the obvious, that leaves us back at 1. Is it
> really so bad? I completely agree that it involves lots of chasing but I
> don't see a way around that. Perhaps part of why it's so painful is that
> you're doing it on-the-fly? Would it help if we had a nir_assign_ssa_types
> pass that kicks out a bitset which says which things are floats? Then the
> only thing your NIR -> SPIR-V pass would have to concern itself with is
> on-the-fly adding bitcasts here-and-there to respect the pre-assigned
> types. That doesn't sound so bad.
> > >
> > > How would such a pass work? I can immagine a fiew different
> implementations. One would assume everything is an integer and walk the IR
> backwards assigning things "float" status based on their variable type (for
> loads/stores), ALU type (most ALU ops), or the float status of the SSA
> def's uses. To handle phis, you'd have to do some sort of fixed-point
> algorithm like we do for dataflow analysis. It'd be a bit fiddly but it
> doesn't sound terrible. And, yes, it's a heuristic but, with shaders like
> the pristine example I gave above, that's the best you'll ever be able to
> do.
> > >
> > > Wow, that got long. :-/
> >
> > You know, after being in such disagreement over totally unrelated
> > issues, it's refreshing to be in total agreement again. I was going to
> > write a long email this morning about how shaders to terrible things
> > with intBitsToFloat and back again, but you appear to have beat me to
> > it... again. The moral of that story is that people will write the
> > code that they want to write and tell the compiler to STFU in whatever
> > way possible. That's the reality we have to live with.
> >
> > The view that TGSI takes (and DX10/DX11 AFAIK, hence all the dirty
> > shaders translated to GLSL) is that values are bags of bits. They
> > receive meaning from the instruction. fadd treats it as a bag of float
> > bits. iadd treats it as a bag of int bits. While I was not involved in
> > the creation of TGSI nor nv50_ir, the latter does associate a type
> > with its values. As I've been maintaining nv50_ir, this type
> > information has been nothing but trouble -- it's basically always
> > wrong. Let's say you have a sequence like
> >
> > int i = 1
> > float y = fmul x, i
> >
> > you might be tempted to think that because the immediate has an
> > integer type, and its value is 1, that you can just eliminate this
> > multiply. But you can't. You always have to consider what the
> > operation is doing, and how it will interpret the bits that are given
> > to it. And if you keep the bitcasts in place, then you lose a lot of
> > opportunities to simplify conditionals/etc. In the end, I think I've
> > eliminated almost all instances of looking at the specific type of a
> > value, except for its bitsize.
> >
> > I think the solution for you to just give up and do what the DX ->
> > GLSL translators do -- make everything an int, and add tons of
> > bitcasts -- they get ignored by the backends anyways.
> >
> > One shortcoming of the "value is a bag of bits" approach is that it
> > complicates things like value-range analysis. Types definitely have a
> > place in this world, but they have to be determined empirically from
> > how the values are used, not from what it says in the language.
> >
> > This has all strayed considerably from the original topic of
> > non-integer-gpu support in nir, which I think should definitely be
> > done, and can be relatively straightforward. We definitely had no
> > trouble with it in the GLSL -> TGSI translator.
> >
> > Cheers,
> >
> > -ilia
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/lima/attachments/20190105/03c0daeb/attachment-0001.html>
More information about the lima
mailing list