[Mesa-dev] SSBO + access qualifiers

Mon Dec 14 21:30:07 PST 2015

On Mon, Dec 14, 2015 at 8:57 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> On Mon, Dec 14, 2015 at 11:44 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>> On Mon, Dec 14, 2015 at 8:03 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>>> On Mon, Dec 14, 2015 at 10:50 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>>>> On Mon, Dec 14, 2015 at 4:10 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>>>>> Hello,
>>>>>
>>>>> I was going to add support for the various volatile/etc annotations in
>>>>> my gallium impl (which is fairly far along), but it seems like those
>>>>> are currently being dropped on the floor by lower_ubo_references, and
>>>>> NIR has no support for them either. Both in GLSL IR and NIR, the
>>>>> variable is what holds the access annotation (volatile, etc). However
>>>>> the ssbo intrinsics are all about a particular binding and offset,
>>>>> without any reference to the variable.
>>>>>
>>>>> What's the right way of handling this? (And do any tests exist that
>>>>> would be sensitive to getting such things wrong?)
>>>>
>>>> First off, why is it that way?  Well, because most of the IRs in mesa
>>>> don't have a memory model capable of handling these sorts of things.
>>>> Those that do (LLVM is the only one I'm aware of) can't handle the GL
>>>> packing rules.  The result is that we basically have to lower
>>>> everything away to byte-offset load/store intrinsics.
>>>>
>>>> What do we do about it?  My inclination would be to either add two new
>>>> intrinsics for load/store_ssbo_volatile or to add a new constant
>>>> boolean "volatile" parameter to the current intrinsics.  If a
>>>> load/store happens on a volatile things, you get the volatile version,
>>>> otherwise you get the plane version.  Then backends can know that they
>>>> are free to reorder, combine, etc. non-volatile load/store operations
>>>> as per their memory model and the provided barriers.  If they
>>>> encounter a volatile load/store, they can't do anything with it and
>>>> just have to do the memory op.
>>>>
>>>> Is that reasonable?
>>>
>>> Well, I don't really care much about the NIR-side of it, but I think
>>> at the very least coherent and volatile need to be exposed. I don't
>>> much care about restrict, that's a bit too fancy for me, but no harm
>>> in passing it through. TBH I don't get the point of
>>> readonly/writeonly. FWIW this is what the nvidia hw is capable of:
>>> http://docs.nvidia.com/cuda/parallel-thread-execution/#cache-operators
>>> (I know it's PTX, but the actual hw is basically the same.)
>>>
>>> I think that volatile = cv, coherent = cg, and everything else = ca/cs
>>> based on some clever heuristic (aka I'll always pick one of those
>>> until a situation where that doesn't work well arises).
>>
>> Right.  I don't actually know what we can do with our HW.  I was
>> mostly thinking about what optimization passes can or cannot do with a
>> variable.  It sounds like, whatever we do, you want it passed through
>> somehow.  How about we just add a little well-defined bitfield as an
>> extra argument to the SSBO load/store intrinsics?  The fact that it
>> gets applied to a variable or a struct or a member or whatever is kind
>> of a moot point to the backend compiler.
>
> An extra argument with a mask or enum is precisely what I was
> expecting to find when looking at the lower_ubo_references pass when I
> was rather surprised to see that the whole thing had been overlooked.

Seems reasonable to me.  However, Intel hardware doesn't really give
you that fine-grained control so we didn't care.  Sounds like someone
should add it.  Have fun!
--Jason