[Mesa-dev] [PATCH 0/2] Nir: Allow CSE of SSBO loads

Mon Oct 26 00:19:14 PDT 2015

On Fri, 2015-10-23 at 09:26 -0700, Jason Ekstrand wrote:
> On Thu, Oct 22, 2015 at 11:13 PM, Iago Toral <itoral at igalia.com> wrote:
> > On Thu, 2015-10-22 at 09:09 -0700, Jason Ekstrand wrote:
> >> On Thu, Oct 22, 2015 at 4:21 AM, Iago Toral Quiroga <itoral at igalia.com> wrote:
> >> > I implemented this first as a separate optimization pass in GLSL IR [1], but
> >> > Curro pointed out that this being pretty much a restricted form of a CSE pass
> >> > it would probably make more sense to do it inside CSE (and we no longer have
> >> > a CSE pass in GLSL IR).
> >> >
> >> > Unlike other things we CSE in NIR, in the case of SSBO loads we need to make
> >> > sure that we invalidate previous entries in the set in the presence of
> >> > conflicting instructions (i.e. SSBO writes to the same block and offset) or
> >> > in the presence of memory barriers.
> >> >
> >> > If this is accepted I intend to extend this to also cover image reads, which
> >> > follow similar behavior.
> >> >
> >> > No regressions observed in piglit or dEQP's SSBO functional tests.
> >> >
> >> > [1] http://lists.freedesktop.org/archives/mesa-dev/2015-October/097718.html
> >>
> >> I think you've gotten enough NAK's that I don't need to chime in
> >> there.  Unfortunately, solving this in general is something of a
> >> research project that both Connor and I have been thinking about for
> >> quite some time.  I've been thinking off-and-on about how to add a
> >> proper memory model to lower_vars_to_ssa for almost a year now and
> >> still haven't come up with a good way to do it.  I don't know whether
> >> SSBO's would be simpler or not.  We need a proper memory model for
> >> both lower_vars_to_ssa and SSBO load/stores (and shared local
> >> variables) but it's a substantial research project.
> >>
> >> This isn't to say that you couldn't do it.  Just know what you're taking on. ;-)
> >
> > Yeah, it does not make sense that I try to do this, you guys have
> > clearly given this much more thought than me and know much better how a
> > solution for this would fit in NIR than me.
> >
> >> That said, here's a suggestion for something that we *could* write
> >> today, wouldn't be very hard, and wold solve a decent number of cases.
> >>
> >> For each block:
> >>
> >> 1) Create a new instruction set (don't use anything from any previous blocks)
> >> 2) call add_or_rewrite on all ssbo load operations
> >> 3) If you ever see a barrier or ssbo store, destroy the entire
> >> instruction set and start again.
> >
> > Yep, this is what I was thinking for the load-combine pass that Connor
> > suggested. However, I think that in this case we do not need to destroy
> > the entire set when we find a store, only for memory barriers, right? I
> > mean, there should be nothing preventing us from checking the
> > offset/block of the store and compare it with the offset/block of the
> > loads in the set to decide which ones we need to remove (like I was
> > doing in my last patch)
> 
> That's where you get into the "special casing" I mentioned below.  If
> you have an direct store, you would have to throw away any indirect
> loads 

Yes.

> and then insert a fake direct load for the given offset.

Actually, what I am doing is a bit different:

When I see stores, I also insert them in the hash table (but I never
rewrite stores). Then, when I see see a load, I check for a match, if I
have it, I use it, if not, I check if I have a store to the same offset,
and If I do, I just use that, no need to fake anything. Of course, if I
do this, in order to check if I have a compatible store I have to
traverse the hash table looking for a match, but I think that should be
okay in this case, since that only has load/store operations and only
the ones in the current block, so I think it should be okay.

Does this seem like a reasonable alternative?

>   If you
> have an indirect store, you would have to throw away everything and
> then insert a fake indirect load for the given offset.  So, yes, you
> can do it, but it'll take a little more work.

Yeah, mostly because there are also atomics to consider and then you
also have to check that the stores write to all the components we read
before we reuse them, etc.

>   You'll also probably
> want to use a different hashing function than we use for CSE because
> you need to be able to handle both real and fake loads and have them
> compare/hash the same.
>
> Does that make sense?
> --Jason
> 
> >> This is something you could put together fairly quickly and would
> >> handle a fair number of cases.  With a little special casing, you may
> >> also be able to handle store and then an immediate load of the same
> >> value or duplicate stores.  Anything much more complex than that is
> >> going to take a lot more thought.
> >
> > Yes, I'll give this a try next. Thanks for all the comments and
> > suggestions!
> >
> > Iago
> >
>