[Mesa-dev] NIR constant problem for GPU which doesn't have native integer support

Fri Jan 4 10:07:15 UTC 2019

On Thu, 2019-01-03 at 11:58 -0600, Jason Ekstrand wrote:
> On Thu, Jan 3, 2019 at 3:39 AM Erik Faye-Lund <
> erik.faye-lund at collabora.com> wrote:
> > On Wed, 2019-01-02 at 10:16 -0600, Jason Ekstrand wrote:
> > > On Wed, Jan 2, 2019 at 9:43 AM Ilia Mirkin <imirkin at alum.mit.edu>
> > > wrote:
> > > > Have a look at the first 4 patches in the series from Jonathan
> > > > Marek
> > > > to address some of these issues:
> > > > 
> > > > https://patchwork.freedesktop.org/series/54295/
> > > > 
> > > > Not sure exactly what state that work is in, but I've added
> > > > Jonathan
> > > > to CC, perhaps he can provide an update.
> > > > 
> > > > Cheers,
> > > > 
> > > >   -ilia
> > > > 
> > > > On Wed, Jan 2, 2019 at 6:28 AM Qiang Yu <yuq825 at gmail.com>
> > wrote:
> > > > >
> > > > > Hi guys,
> > > > >
> > > > > I found the problem with this test fragment shader when lima
> > > > development:
> > > > > uniform int color;
> > > > > void main() {
> > > > >     if (color > 1)
> > > > >         gl_FragColor = vec4(1.0, 0.0, 0.0, 1);
> > > > >     else
> > > > >         gl_FragColor = vec4(0.0, 1.0, 0.0, 1);
> > > > > }
> > > > >
> > > > > nir_print_shader output:
> > > > > impl main {
> > > > >         block block_0:
> > > > >         /* preds: */
> > > > >         vec1 32 ssa_0 = load_const (0x00000001 /* 0.000000
> > */)
> > > > >         vec4 32 ssa_1 = load_const (0x3f800000 /* 1.000000
> > */,
> > > > > 0x00000000 /* 0.000000 */, 0x00000000 /* 0.000000 */,
> > 0x3f800000
> > > > /*
> > > > > 1.000000 */)
> > > > >         vec4 32 ssa_2 = load_const (0x00000000 /* 0.000000
> > */,
> > > > > 0x3f800000 /* 1.000000 */, 0x00000000 /* 0.000000 */,
> > 0x3f800000
> > > > /*
> > > > > 1.000000 */)
> > > > >         vec1 32 ssa_3 = load_const (0x00000000 /* 0.000000
> > */)
> > > > >         vec1 32 ssa_4 = intrinsic load_uniform (ssa_3) (0, 1,
> > 0)
> > > > /*
> > > > > base=0 */ /* range=1 */ /* component=0 */   /* color */
> > > > >         vec1 32 ssa_5 = slt ssa_0, ssa_4
> > > > >         vec1 32 ssa_6 = fnot ssa_5
> > > > >         vec4 32 ssa_7 = bcsel ssa_6.xxxx, ssa_2, ssa_1
> > > > >         intrinsic store_output (ssa_7, ssa_3) (0, 15, 0) /*
> > > > base=0 */
> > > > > /* wrmask=xyzw */ /* component=0 */       /* gl_FragColor */
> > > > >         /* succs: block_1 */
> > > > >         block block_1:
> > > > > }
> > > > >
> > > > > ssa0 is not converted to float when glsl to nir. I see
> > > > glsl_to_nir.cpp
> > > > > will create flt/ilt/ult
> > > > > based on source type for gpu support native integer, but for
> > gpu
> > > > not
> > > > > support native
> > > > > integer, just create slt for all source type. And in
> > > > > nir_lower_constant_initializers,
> > > > > there's also no type conversion for integer constant.
> > > 
> > > This is a generally sticky issue.  In NIR, we have no concept of
> > > types on SSA values which has proven perfectly reasonable and
> > > actually very powerful in a world where integers are supported
> > > natively.  Unfortunately, it causes significant problems for
> > float-
> > > only architectures.
> > 
> > I would like to take this chance to say that this untyped SSA-value
> > choice has lead to issues in both radeon_si (because LLVM values
> > are
> > typed) and zink (similarly, because SPIR-V values are typed), where
> > we
> > need to to bitcasts on every access because there's just not enough
> > information available to emit variables with the right type.
> 
> I'm not sure if I agree that the two problems are the same or not... 
> More on that in a bit.
>  
> > It took us a lot of time to realize that the meta-data from the
> > opcodes
> > doesn't *really* provide this, because the rest of nir doesn't
> > treat
> > values consistently. In fact, this feels arguably more like buggy
> > behavior; why do we even have fmov when all of the time the
> > compiler
> > will emit imovs for floating-point values...? Or why do we have
> > bitcast
> 
> Why do we have different mov opcodes?  Because they have different
> behavior in the presence of source/destination modifiers.

Is this general NIR-behavior (i.e will this be honored by constant
folding etc), or is it Intel specific? If it's NIR-behavior, is it
documented somewhere?

In either case, I have a feeling that this is a mis-design; the
modifiers apply to the operands, not the opcodes. It sounds a bit like
imov shouldn't have been an ALU instruction, but perhaps some other,
new type. Or perhaps the concept we have should have been a bit finer
granularity. It's a bit hard to tell without more details, and I can't
seem to find any on my own.

Generally, I wish we had some more of the details here written down.
Implementing support for NIR for a new architecture is currently quite
tricky, mostly because chasing down the details are hard.

> You likely don't use those in radeon or Zink but we do use them on
> Intel.  That being said, I've very seriously considered adding a
> generic nir_op_mov which is entirely typeless and doesn't support
> modifiers at all and make most of core NIR use that.  For that
> matter, there's no real reason why we need fmov with modifiers at all
> when we could we could easily replace "ssa1 = fmov.sat |x|" with
> "ssa1 = fsat |x|" or "ssa1 = fabs.sat x".  If we had a generic
> nir_op_mov to deal with swizzling etc., the only thing having i/fmov
> would really accomplish is lowering the number of different ways you
> can write "fmov.sat |x|".  The only reason I haven't added this
> nir_op_mov opcode is because it's a pile of work and anyone who can't
> do integers can just implement imov as fmov and it's not a big deal.

That sounds reasonable enough, and I understand that this hasn't been a
priority so far.

> > I would really love it if we could at least consider making this
> > "we
> > can just treat floats as ints without bitcasts if we feel like it"-
> > design optional for the backend somehow.
> > 
> > I guess the assumption is that bitcasts are for free? They aren't
> > once
> > you have to emit them and have a back-end remove a bunch of
> > redundant
> > ones... We should already have all the information to trivially
> > place
> > casts for backends that care, yet we currently make it hard (unless
> > your HW/backend happens to be untyped)...
> > 
> > Is there some way we can perhaps improve this for backends that
> > care?
> 
> That's an interesting question...
> 
> Before answering that question, let's back up and ask why.  From a
> hardware perspective, I know of no graphics hardware that puts types
> on registers.  On Intel, we assign types to sources and destinations
> of instructions but outside a specific instruction, a register is
> just a bag of bits.  I suspect most other hardware is the same.  In
> the hardware that his particular thread was originally about, you
> could argue that they have a type and it's always float.  The real
> issue that the GLES2 people are having is that constants have a bag
> of bits value which may need to be interpreted as integer or float
> depending on context which they do not have when lowering.  They
> don't care, for instance, about the fact that imov or bcsel take
> integers because they know that the sources will either be floats or
> integers that they've lowered to floats and so they know that the
> result will be a float.
> 
> The problem you're describing is in converting from NIR to another
> IR, not to hardware.

Yes; NIR serves *both* as an LLIR *and* as a HIR. Your analysis above
seems to only make sense when it's used as an LLIR, though.

What I'm trying to say, is that since we *have* this information when
moving from GLSL, perhaps we could find a way of preserving it so a
backend that needs it could use it without having to recompute it.
Which turns out to be a lot trickier than it sounds, more on this
later.

> In LLVM they made a choice to put types on SSA values but then to
> have the actual semantics be based on the instruction itself.  This
> leads to lots of redundancy in the IR but also lots of things you can
> validate which is kind-of nice.  Is that redundancy really useful? 
> In our experience with NIR, we haven't found it to be other than
> booleans (now sorted), this one constant issue, and translating to
> IRs that have that redundancy.  Then why did they add it?  I'm really
> not sure but it may, at least somewhat, be related to the fact that
> they allow arrays and structs in their SSA values and so need full
> types.  This is another design decision in LLVM which I find highly
> questionable.  What you're is more-or-less that NIR should carry,
> maintain, and validate extra useless information just so we can pass
> it on to LLVM which is going to use it for what, exactly?  Sorry if
> I'm extremely reluctant to make fairly fundamental changes to NIR
> with no better reason than "LLVM did it that way".
> 
> There's a second reason why I don't like the idea of putting types on
> SSA values: It's extremely deceptive.  In SPIR-V you're allowed to do
> an OpSConvert with a source that is an unsigned 16-bit integer and a
> destination that is an unsigned 32-bit integer.  What happens?  Your
> uint -> uint cast sign extends!  Yup.... That's what happens.  No
> joke.  The same is true of signed vs. unsigned division or modulus. 
> The end result is that the types in SPIR-V are useless and you can't
> actually trust them for anything except bit-size and sometimes
> labelling something as a float vs. int.

Right. I believe it's already been pointed out in this thread that it
might actually be useful to differentiate between integers and float
values, at least for some CPU architectures. For signed vs unsigned,
it'd be no problem for zink to treat these as the same.

In my use-case, it's not because *I* think it's perticularly useful in
itself, but because it's a requirement for generating valid SPIR-V. I
guess SPIR-V inherited this from LLVM?

I'm not particularly worried about adding a couple of bits (or perhaps
even one?) per nir_ssa_def (what you seem to call "faily fundamental
changes to NIR" above), if that helps here.

> So how do we deal with the mismatch?  My recommendation would be to
> use the source/destination types of the ALU ops where needed/useful,
> add special cases for things like imov and bcsel where you can pass
> the source type through, and insert extra bitcasts whenever they're
> needed.  This is obvious enough that I'm going to guess you're
> already more-or-less doing exactly that.

We tried this approach for a while, but we couldn't get it to work.
There were too many situations where we couldn't know the type early
enough to avoid crawling through piles of old instructions to figure it
out, for instance when reading a constant value (untyped), constructing
some vector with it (still untyped). It's not until we read it that we
know what type we needed.

In fact, vectors itself cause major headache here, because they need to
have consistent types in it, and NIR can easily let you create a vector
with floats and ints in it, side by side. Perhaps that could be
considered a feature... I wouldn't, at least not on a HIR level. Once
the code hits a backend, perhaps.

> Sorry if I'm not being more helpful but the longer this e-mail gets
> the more convinced I become that SPIR-V and LLVM made the wrong
> design decision and we don't want to follow in their folly.
> 
> --Jason
> 
>  
> > >   There are two general possible solutions:
> > > 
> > >  1. convert all integers to floats in glsl_to_nir and prog_to_nir
> > and
> > > adjust various lowering/optimization passes to handle
> > > nir_compiler_options::supports_native_integers == false.
> > > 
> > >  2. Just allow integers all the way through until you get very
> > close
> > > to the end and then lower integers to floats at the last possible
> > > moment.
> > > 
> > > Both of these come with significant issues.  With the first
> > approach,
> > > there are potentially a lot of passes that will need to be
> > adjusted
> > > and it's not 100% clear what to do with UBO offsets and indirect
> > > addressing; fortunately, you should be able to disable most of
> > those
> > > optimizations to get going so it shouldn't be too bad.  The
> > second
> > > would be less invasive to NIR because it doesn't require
> > modifying as
> > > many passes.  However, doing such a lowering would be very tricky
> > to
> > > get right primarily because of constants.  With everything else,
> > you
> > > can just sort of assume that inputs are floats (you lowered,
> > right?)
> > > and lower to produce a float output.  With constants, however,
> > you
> > > don't know whether or not it's an integer that needs lowering. 
> > We
> > > could, in theory, add an extra bit to load_const to solve this
> > > problem but there are also significant problems with doing that
> > so
> > > it's not clear it's a good idea.
> > > 
> > > I think the patches from Marek (as indicated by ilia) attempt the
> > > first approach.  If we can do it practically, my suspicion is
> > that
> > > the first will work better than the second.  However, it will
> > take
> > > some experimentation in order to actually determine that.
> > > 
> > > --Jason
> > > _______________________________________________
> > > mesa-dev mailing list
> > > mesa-dev at lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> > 
> 
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev