[Lima] [Mesa-dev] NIR constant problem for GPU which doesn't have native integer support
Jason Ekstrand
jason at jlekstrand.net
Thu Jan 3 20:45:34 UTC 2019
On Thu, Jan 3, 2019 at 2:03 PM Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl>
wrote:
> On Thu, Jan 3, 2019 at 6:59 PM Jason Ekstrand <jason at jlekstrand.net>
> wrote:
> >
> > On Thu, Jan 3, 2019 at 3:39 AM Erik Faye-Lund <
> erik.faye-lund at collabora.com> wrote:
> >>
> >> On Wed, 2019-01-02 at 10:16 -0600, Jason Ekstrand wrote:
> >> > On Wed, Jan 2, 2019 at 9:43 AM Ilia Mirkin <imirkin at alum.mit.edu>
> >> > wrote:
> >> > > Have a look at the first 4 patches in the series from Jonathan
> >> > > Marek
> >> > > to address some of these issues:
> >> > >
> >> > > https://patchwork.freedesktop.org/series/54295/
> >> > >
> >> > > Not sure exactly what state that work is in, but I've added
> >> > > Jonathan
> >> > > to CC, perhaps he can provide an update.
> >> > >
> >> > > Cheers,
> >> > >
> >> > > -ilia
> >> > >
> >> > > On Wed, Jan 2, 2019 at 6:28 AM Qiang Yu <yuq825 at gmail.com> wrote:
> >> > > >
> >> > > > Hi guys,
> >> > > >
> >> > > > I found the problem with this test fragment shader during lima
> >> > > > development:
> >> > > >
> >> > > > uniform int color;
> >> > > > void main() {
> >> > > >     if (color > 1)
> >> > > >         gl_FragColor = vec4(1.0, 0.0, 0.0, 1);
> >> > > >     else
> >> > > >         gl_FragColor = vec4(0.0, 1.0, 0.0, 1);
> >> > > > }
> >> > > >
> >> > > > nir_print_shader output:
> >> > > > impl main {
> >> > > > block block_0:
> >> > > > /* preds: */
> >> > > > vec1 32 ssa_0 = load_const (0x00000001 /* 0.000000 */)
> >> > > > vec4 32 ssa_1 = load_const (0x3f800000 /* 1.000000 */,
> >> > > > 0x00000000 /* 0.000000 */, 0x00000000 /* 0.000000 */, 0x3f800000
> >> > > /*
> >> > > > 1.000000 */)
> >> > > > vec4 32 ssa_2 = load_const (0x00000000 /* 0.000000 */,
> >> > > > 0x3f800000 /* 1.000000 */, 0x00000000 /* 0.000000 */, 0x3f800000
> >> > > /*
> >> > > > 1.000000 */)
> >> > > > vec1 32 ssa_3 = load_const (0x00000000 /* 0.000000 */)
> >> > > > vec1 32 ssa_4 = intrinsic load_uniform (ssa_3) (0, 1, 0)
> >> > > /*
> >> > > > base=0 */ /* range=1 */ /* component=0 */ /* color */
> >> > > > vec1 32 ssa_5 = slt ssa_0, ssa_4
> >> > > > vec1 32 ssa_6 = fnot ssa_5
> >> > > > vec4 32 ssa_7 = bcsel ssa_6.xxxx, ssa_2, ssa_1
> >> > > > intrinsic store_output (ssa_7, ssa_3) (0, 15, 0) /*
> >> > > base=0 */
> >> > > > /* wrmask=xyzw */ /* component=0 */ /* gl_FragColor */
> >> > > > /* succs: block_1 */
> >> > > > block block_1:
> >> > > > }
> >> > > >
> >> > > > ssa_0 is not converted to float during GLSL-to-NIR translation. I
> >> > > > see that glsl_to_nir.cpp creates flt/ilt/ult based on the source
> >> > > > type for GPUs with native integer support, but for GPUs without
> >> > > > it, it just creates slt for all source types. And in
> >> > > > nir_lower_constant_initializers, there's also no type conversion
> >> > > > for integer constants.
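To make the mismatch concrete, here is a small Python sketch (illustrative only, not Mesa code) of what happens when a float-only comparison reads the integer constant's bit pattern:

```python
import struct

def bits_to_float(u32):
    # Reinterpret a 32-bit pattern as an IEEE-754 float, which is
    # effectively what a float-only opcode like slt does with its source.
    return struct.unpack('<f', struct.pack('<I', u32))[0]

def float_to_bits(f):
    return struct.unpack('<I', struct.pack('<f', f))[0]

# ssa_0 above holds the integer constant 1 ...
int_one_bits = 0x00000001
# ... which a float comparison reads as a tiny denormal, not as 1.0:
assert bits_to_float(int_one_bits) != 1.0
# What the lowering would actually need is the bit pattern of float 1.0:
assert float_to_bits(1.0) == 0x3f800000
```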
> >> >
> >> > This is a generally sticky issue. In NIR, we have no concept of
> >> > types on SSA values which has proven perfectly reasonable and
> >> > actually very powerful in a world where integers are supported
> >> > natively. Unfortunately, it causes significant problems for float-
> >> > only architectures.
> >>
> >> I would like to take this chance to say that this untyped-SSA-value
> >> choice has led to issues in both radeonsi (because LLVM values are
> >> typed) and zink (similarly, because SPIR-V values are typed), where we
> >> need to do bitcasts on every access because there's just not enough
> >> information available to emit variables with the right type.
> >
> >
> > I'm not sure if I agree that the two problems are the same or not...
> More on that in a bit.
> >
> >>
> >> It took us a lot of time to realize that the metadata from the opcodes
> >> doesn't *really* provide this, because the rest of NIR doesn't treat
> >> values consistently. In fact, this arguably feels more like buggy
> >> behavior; why do we even have fmov when the compiler emits imovs for
> >> floating-point values all the time...? Or why do we have bitcast
> >
> >
> > Why do we have different mov opcodes?  Because they have different
> behavior in the presence of source/destination modifiers.  You likely don't
> use those in radeon or Zink but we do use them on Intel.  That being said,
> I've very seriously considered adding a generic nir_op_mov which is
> entirely typeless, doesn't support modifiers at all, and making most of
> core NIR use that.  For that matter, there's no real reason why we need
> fmov with modifiers at all when we could easily replace "ssa1 =
> fmov.sat |x|" with "ssa1 = fsat |x|" or "ssa1 = fabs.sat x".  If we had a
> generic nir_op_mov to deal with swizzling etc., the only thing having
> i/fmov would really accomplish is reducing the number of different ways you
> can write "fmov.sat |x|".  The only reason I haven't added this nir_op_mov
> opcode is that it's a pile of work, and anyone who can't do integers can
> just implement imov as fmov, so it's not a big deal.
>
> Is there anything blocking just always using fmov and removing imov
> from NIR? Currently anyone who wants to add a modifier to an imov can
> just change the opcode to an fmov and add the modifier, can't they?
>
NIR (and Intel HW) has both integer and float modifiers.  Modifiers on fmov
and imov do different things.  I'm not really sure what you're suggesting
here.
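As a toy numerical model of the redundancy mentioned above (plain Python, not NIR; the function names just mirror the opcode spellings), "fmov.sat |x|", "fsat |x|" and "fabs.sat x" all compute the same thing:

```python
def fsat(x):
    # Saturate modifier: clamp to [0.0, 1.0].
    return min(max(x, 0.0), 1.0)

def fmov_sat_abs(x):
    # 'fmov.sat |x|': abs source modifier plus saturate on the move.
    return fsat(abs(x))

def fabs_sat(x):
    # 'fabs.sat x': take |x|, then saturate the result.
    return fsat(abs(x))

# All three spellings agree on every input:
for x in (-2.0, -0.25, 0.5, 3.0):
    assert fmov_sat_abs(x) == fsat(abs(x)) == fabs_sat(x)
```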
> >>
> >> I would really love it if we could at least consider making this "we
> >> can just treat floats as ints without bitcasts if we feel like it"-
> >> design optional for the backend somehow.
> >>
> >> I guess the assumption is that bitcasts are for free? They aren't once
> >> you have to emit them and have a back-end remove a bunch of redundant
> >> ones... We should already have all the information to trivially place
> >> casts for backends that care, yet we currently make it hard (unless
> >> your HW/backend happens to be untyped)...
> >>
> >> Is there some way we can perhaps improve this for backends that care?
> >
> >
> > That's an interesting question...
> >
> > Before answering that question, let's back up and ask why. From a
> hardware perspective, I know of no graphics hardware that puts types on
> registers. On Intel, we assign types to sources and destinations of
> instructions but outside a specific instruction, a register is just a bag
> of bits. I suspect most other hardware is the same. In the hardware that
> this particular thread was originally about, you could argue that they have
> a type and it's always float. The real issue that the GLES2 people are
> having is that constants have a bag of bits value which may need to be
> interpreted as integer or float depending on context which they do not have
> when lowering. They don't care, for instance, about the fact that imov or
> bcsel take integers because they know that the sources will either be
> floats or integers that they've lowered to floats and so they know that the
> result will be a float.
> >
> > The problem you're describing is in converting from NIR to another IR,
> not to hardware. In LLVM they made a choice to put types on SSA values but
> then to have the actual semantics be based on the instruction itself. This
> leads to lots of redundancy in the IR but also lots of things you can
> validate, which is kind-of nice.  Is that redundancy really useful?  In our
> experience with NIR, we haven't found it to be, apart from booleans (now
> sorted), this one constant issue, and translating to IRs that have that
> redundancy.  Then why did they add it?  I'm really not sure, but it may, at
> least somewhat, be related to the fact that they allow arrays and structs
> in their SSA values and so need full types. This is another design
> decision in LLVM which I find highly questionable.  What you're suggesting
> is, more or less, that NIR should carry, maintain, and validate extra useless
> information just so we can pass it on to LLVM which is going to use it for
> what, exactly? Sorry if I'm extremely reluctant to make fairly fundamental
> changes to NIR with no better reason than "LLVM did it that way".
> >
> > There's a second reason why I don't like the idea of putting types on
> SSA values: It's extremely deceptive. In SPIR-V you're allowed to do an
> OpSConvert with a source that is an unsigned 16-bit integer and a
> destination that is an unsigned 32-bit integer. What happens? Your uint
> -> uint cast sign extends! Yup.... That's what happens. No joke. The
> same is true of signed vs. unsigned division or modulus. The end result is
> that the types in SPIR-V are useless and you can't actually trust them for
> anything except bit-size and sometimes labelling something as a float vs.
> int.
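A small model of the behavior described above (a hypothetical helper mirroring the OpSConvert semantics as characterized here): the conversion sign-extends regardless of the declared signedness of the source and destination types:

```python
def op_sconvert_16_to_32(src_bits):
    # OpSConvert sign-extends based on the source's top bit, no matter
    # whether the SPIR-V types involved are declared signed or unsigned.
    src_bits &= 0xFFFF
    if src_bits & 0x8000:
        return src_bits | 0xFFFF0000
    return src_bits

# A "uint16" 0xFFFF fed through a uint -> uint OpSConvert sign-extends:
assert op_sconvert_16_to_32(0xFFFF) == 0xFFFFFFFF   # not 0x0000FFFF
assert op_sconvert_16_to_32(0x7FFF) == 0x00007FFF   # positive values pass through
```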
> >
> > So how do we deal with the mismatch? My recommendation would be to use
> the source/destination types of the ALU ops where needed/useful, add
> special cases for things like imov and bcsel where you can pass the source
> type through, and insert extra bitcasts whenever they're needed. This is
> obvious enough that I'm going to guess you're already more-or-less doing
> exactly that. Sorry if I'm not being more helpful but the longer this
> e-mail gets the more convinced I become that SPIR-V and LLVM made the wrong
> design decision and we don't want to follow in their folly.
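That recommendation can be sketched like this (illustrative Python, not the actual translator; `coerce` is a hypothetical helper): track the type each instruction produces and the type its consumer expects, and emit a bitcast only on a mismatch:

```python
import struct

def bitcast_f2i(f):
    # Reinterpret a float's bits as a 32-bit integer (no value conversion).
    return struct.unpack('<I', struct.pack('<f', f))[0]

def bitcast_i2f(u):
    # Reinterpret a 32-bit integer's bits as a float.
    return struct.unpack('<f', struct.pack('<I', u & 0xFFFFFFFF))[0]

def coerce(value, have, want):
    # Insert a bitcast only when the producer's type disagrees with the
    # consumer's; untyped NIR values otherwise pass through untouched.
    if have == want:
        return value
    if (have, want) == ('float', 'int'):
        return bitcast_f2i(value)
    if (have, want) == ('int', 'float'):
        return bitcast_i2f(value)
    raise ValueError(f'no bitcast from {have} to {want}')

assert coerce(1.0, 'float', 'int') == 0x3f800000
assert coerce(0x3f800000, 'int', 'float') == 1.0
assert coerce(5, 'int', 'int') == 5   # matching types: no cast emitted
```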
>
> Let me disagree on that one.  To start with, though, I agree on the
> SPIR-V unsigned/signed types, because they were not given any semantics
> and are hence pretty useless.
>
> However, I think in general types have advantages. In particular they
> result in some abstraction and flexibility. It allows LLVM frontends
> to generate LLVM IR closer to their source language and then have a
> set of passes to lower that to something more useful for a backend
> (and those passes can be shared between a lot of frontends). This
> results in one of the strengths of LLVM IMO, that you can throw a lot
> of pretty high level IR at it and that LLVM will produce something
> reasonable, which is especially nice considering the number of LLVM
> frontends.
>
> Furthermore, having an abstract type like e.g. Sampler (or even just a
> pointer) can be useful so that the frontend and middle end don't have to
> care about properties of that type like the bitsize.
>
> And even NIR is growing a type system:
> - The canonical size descriptor is "bitsize x components"
> - With the 1-bit bools we are very explicitly using the above as types.
>
You're not wrong....
> - With derefs, the requirement that most derefs take another deref as
> input becomes another implicitly encoded part of the type.
>   - The fact that we are going to need to keep track of deref bitwidths
> (for AMD they are *really* 64-bit) means a bunch of plumbing in
> optimizations that we would not need if they were abstracted.
>
Sorting out opaque types has been a bit of a sore point lately. NIR (as
you point out later) is a fairly low-level IR and the assumption was made
when it was designed that things would pretty much have a size when you hit
NIR. Booleans were a bit of an odd case and the 1-bit solution seems to
have worked out very well but you're right that it's almost a new type.
With pointers, my current plan (see the UBO/SSBO MR) is to have drivers pick
a type for each kind of pointer, and spirv_to_nir will give them the type they
ask for.  Would it be better to have an opaque SSA value type and a
lowering pass?  That's entirely possible.
> And neither is the LLVM type system perfect (e.g. they somewhat
> recently removed the pointee type from the pointer type).
>
> float vs. int is kinda useless for us, though for a long time LLVM used
> it to put things in general-purpose registers vs. fpu/mmx/sse, and
> that works relatively well as long as people do not do too many
> bitcasts in practice.
>
> Overall I don't believe the LLVM type system is a mistake, and that
> was the main point I wanted to get across. I do think it is probably
> not the right path for nir though as I see nir as a bit lower level
> and more backend centric than LLVM (partly due to its lack of
> comprehensive SSA types).  I think NIR being generally more
> backend-centric is the right choice as we grow to many more backends
> than frontends.
>
I think this is the key issue. NIR has never claimed to be a high-level
IR; it's only claimed to be high enough level to be effectively
cross-vendor.
I should also say that I'm not staunchly opposed to a bit more descriptive
type information on SSA values in principle.  As you pointed out, we
basically did that for booleans, and I've seriously considered it for
pointers and bindless handles.  I just don't think that the need to insert
bitcasts or
"because LLVM did it" are good reasons. When you look at the shaders we
get out of a lot of D3D translators, they're full of bitcasts everywhere
including storing integers and floats in the same vec4; the fact that NIR
chews them up and just doesn't care is a feature. :-)
--Jason
> >
> > --Jason
> >
> >
> >>
> >> > There are two general possible solutions:
> >> >
> >> > 1. convert all integers to floats in glsl_to_nir and prog_to_nir and
> >> > adjust various lowering/optimization passes to handle
> >> > nir_compiler_options::supports_native_integers == false.
> >> >
> >> > 2. Just allow integers all the way through until you get very close
> >> > to the end and then lower integers to floats at the last possible
> >> > moment.
> >> >
> >> > Both of these come with significant issues. With the first approach,
> >> > there are potentially a lot of passes that will need to be adjusted
> >> > and it's not 100% clear what to do with UBO offsets and indirect
> >> > addressing; fortunately, you should be able to disable most of those
> >> > optimizations to get going so it shouldn't be too bad. The second
> >> > would be less invasive to NIR because it doesn't require modifying as
> >> > many passes. However, doing such a lowering would be very tricky to
> >> > get right primarily because of constants. With everything else, you
> >> > can just sort of assume that inputs are floats (you lowered, right?)
> >> > and lower to produce a float output. With constants, however, you
> >> > don't know whether or not it's an integer that needs lowering. We
> >> > could, in theory, add an extra bit to load_const to solve this
> >> > problem but there are also significant problems with doing that so
> >> > it's not clear it's a good idea.
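For the shader at the top of this thread, the first approach amounts to something like the following (a Python pseudo-evaluation of the lowered ops, not NIR code; slt/fnot follow the 1.0/0.0 float-boolean convention):

```python
def slt(a, b):
    # 'set if less than': float comparison producing 1.0 / 0.0.
    return 1.0 if a < b else 0.0

def fnot(a):
    # Float logical not: 0.0 -> 1.0, anything else -> 0.0.
    return 1.0 if a == 0.0 else 0.0

def bcsel(cond, if_true, if_false):
    return if_true if cond != 0.0 else if_false

# With the integer uniform and the constant 1 lowered to floats up
# front, 'color > 1' evaluates correctly through the float-only ops:
color = 3
red   = (1.0, 0.0, 0.0, 1.0)
green = (0.0, 1.0, 0.0, 1.0)
ssa_5 = slt(1.0, float(color))   # 1.0: color > 1 is true
ssa_6 = fnot(ssa_5)              # 0.0
assert bcsel(ssa_6, green, red) == red   # the 'then' branch wins
```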
> >> >
> >> > I think the patches from Marek (as indicated by ilia) attempt the
> >> > first approach. If we can do it practically, my suspicion is that
> >> > the first will work better than the second. However, it will take
> >> > some experimentation in order to actually determine that.
> >> >
> >> > --Jason
> >> > _______________________________________________
> >> > mesa-dev mailing list
> >> > mesa-dev at lists.freedesktop.org
> >> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> >>
>