[Mesa-dev] [PATCH 0/2] Nir: Allow CSE of SSBO loads

Mon Oct 26 12:04:24 PDT 2015

On Mon, Oct 26, 2015 at 12:19 AM, Iago Toral <itoral at igalia.com> wrote:
> On Fri, 2015-10-23 at 09:26 -0700, Jason Ekstrand wrote:
>> On Thu, Oct 22, 2015 at 11:13 PM, Iago Toral <itoral at igalia.com> wrote:
>> > On Thu, 2015-10-22 at 09:09 -0700, Jason Ekstrand wrote:
>> >> On Thu, Oct 22, 2015 at 4:21 AM, Iago Toral Quiroga <itoral at igalia.com> wrote:
>> >> > I implemented this first as a separate optimization pass in GLSL IR [1], but
>> >> > Curro pointed out that this being pretty much a restricted form of a CSE pass
>> >> > it would probably make more sense to do it inside CSE (and we no longer have
>> >> > a CSE pass in GLSL IR).
>> >> >
>> >> > Unlike other things we CSE in NIR, in the case of SSBO loads we need to make
>> >> > sure that we invalidate previous entries in the set in the presence of
>> >> > conflicting instructions (i.e. SSBO writes to the same block and offset) or
>> >> > in the presence of memory barriers.
>> >> >
>> >> > If this is accepted I intend to extend this to also cover image reads, which
>> >> > follow similar behavior.
>> >> >
>> >> > No regressions observed in piglit or dEQP's SSBO functional tests.
>> >> >
>> >> > [1] http://lists.freedesktop.org/archives/mesa-dev/2015-October/097718.html
>> >>
>> >> I think you've gotten enough NAK's that I don't need to chime in
>> >> there.  Unfortunately, solving this in general is something of a
>> >> research project that both Connor and I have been thinking about for
>> >> quite some time.  I've been thinking off-and-on about how to add a
>> >> proper memory model to lower_vars_to_ssa for almost a year now and
>> >> still haven't come up with a good way to do it.  I don't know whether
>> >> SSBO's would be simpler or not.  We need a proper memory model for
>> >> both lower_vars_to_ssa and SSBO load/stores (and shared local
>> >> variables) but it's a substantial research project.
>> >>
>> >> This isn't to say that you couldn't do it.  Just know what you're taking on. ;-)
>> >
>> > Yeah, it does not make sense that I try to do this, you guys have
>> > clearly given this much more thought than me and know much better how a
>> > solution for this would fit in NIR than me.
>> >
>> >> That said, here's a suggestion for something that we *could* write
>> >> today, wouldn't be very hard, and wold solve a decent number of cases.
>> >>
>> >> For each block:
>> >>
>> >> 1) Create a new instruction set (don't use anything from any previous blocks)
>> >> 2) call add_or_rewrite on all ssbo load operations
>> >> 3) If you ever see a barrier or ssbo store, destroy the entire
>> >> instruction set and start again.
>> >
>> > Yep, this is what I was thinking for the load-combine pass that Connor
>> > suggested. However, I think that in this case we do not need to destroy
>> > the entire set when we find a store, only for memory barriers, right? I
>> > mean, there should be nothing preventing us from checking the
>> > offset/block of the store and compare it with the offset/block of the
>> > loads in the set to decide which ones we need to remove (like I was
>> > doing in my last patch)
>>
>> That's where you get into the "special casing" I mentioned below.  If
>> you have an direct store, you would have to throw away any indirect
>> loads
>
> Yes.
>
>> and then insert a fake direct load for the given offset.
>
> Actually, what I am doing is a bit different:
>
> When I see stores, I also insert them in the hash table (but I never
> rewrite stores). Then, when I see see a load, I check for a match, if I
> have it, I use it, if not, I check if I have a store to the same offset,
> and If I do, I just use that, no need to fake anything. Of course, if I
> do this, in order to check if I have a compatible store I have to
> traverse the hash table looking for a match, but I think that should be
> okay in this case, since that only has load/store operations and only
> the ones in the current block, so I think it should be okay.
>
> Does this seem like a reasonable alternative?

Yeah, that's fine.

>>   If you
>> have an indirect store, you would have to throw away everything and
>> then insert a fake indirect load for the given offset.  So, yes, you
>> can do it, but it'll take a little more work.
>
> Yeah, mostly because there are also atomics to consider and then you
> also have to check that the stores write to all the components we read
> before we reuse them, etc.
>
>>   You'll also probably
>> want to use a different hashing function than we use for CSE because
>> you need to be able to handle both real and fake loads and have them
>> compare/hash the same.
>>
>> Does that make sense?
>> --Jason
>>
>> >> This is something you could put together fairly quickly and would
>> >> handle a fair number of cases.  With a little special casing, you may
>> >> also be able to handle store and then an immediate load of the same
>> >> value or duplicate stores.  Anything much more complex than that is
>> >> going to take a lot more thought.
>> >
>> > Yes, I'll give this a try next. Thanks for all the comments and
>> > suggestions!
>> >
>> > Iago
>> >
>>
>
>