[Mesa-dev] RFC: buffer support in TGSI for SSBO/atomic
Ilia Mirkin
imirkin at alum.mit.edu
Mon Nov 2 11:12:48 PST 2015
Another fun example to try to express properly in TGSI:
buffer foo {
struct bar {
coherent int a;
int b;
} asdf[10];
}
Now all of a sudden you have to worry about stride for the declarations.
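
To make the stride issue concrete, here is a rough C sketch, assuming std430 rules apply to the block above (the snippet itself has no layout qualifier, so that is an assumption). Under std430 a struct of two ints has an array stride of 8 bytes, so the elements do not line up with 16-byte vec4 slots. The helper names and the vec4_loc type are made up for illustration and are not Mesa or TGSI API.

```c
#include <assert.h>

/* Hypothetical illustration only; none of these names exist in Mesa. */

/* Assumed std430 layout of "struct bar { int a; int b; }":
 * base alignment 4, size 8, so the array stride is 8 bytes. */
enum {
    BAR_STRIDE_STD430 = 8,
    BAR_OFFSET_A = 0, /* the coherent member */
    BAR_OFFSET_B = 4
};

/* Where a byte offset lands in a vec4-oriented register file. */
struct vec4_loc {
    unsigned slot;      /* index of the 16-byte vec4 slot */
    unsigned component; /* 0 = x, 1 = y, 2 = z, 3 = w */
};

/* Byte offset of one member of asdf[index] inside the buffer. */
static unsigned bar_member_offset(unsigned index, unsigned member_offset)
{
    assert(member_offset < BAR_STRIDE_STD430);
    return index * BAR_STRIDE_STD430 + member_offset;
}

/* Map a 4-byte-aligned byte offset to a vec4 slot and component. */
static struct vec4_loc byte_offset_to_vec4(unsigned byte_offset)
{
    struct vec4_loc loc;

    assert(byte_offset % 4 == 0);
    loc.slot = byte_offset / 16;
    loc.component = (byte_offset % 16) / 4;
    return loc;
}
```

With these assumptions asdf[0].a lands in slot 0 component x and asdf[1].a in slot 0 component z, so the coherent members alternate between .x and .z of each slot while the plain ones take .y and .w. A per-slot declaration would have to repeat that pattern, or carry a stride, to describe it.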
-ilia
On Mon, Nov 2, 2015 at 2:07 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> I haven't the faintest idea about efficiency, but these things are flags
> on the ld/st instructions in the nvidia ISA for SM20+ (and I just
> plain don't know about SM10). I'm moderately sure that's the case for
> GCN as well.
>
> The difficulty with TGSI is that you might have something like
>
> layout (std430) buffer foo {
> coherent int a;
> int b;
> }
>
> Now I don't remember if they get baked into the same vec4, but I think
> they do. If they don't, then ARB_enhanced_layouts will fix that right
> up. Since TGSI is vec4-oriented, it's really awkward to specify that
> sort of thing... how would you do it?
>
> DECL BUFFER[0][0].x COHERENT
> DECL BUFFER[0][0].y
>
> And then totally unrelated to the separate bits, you can end up with
>
> layout (std430) buffer foo {
> int foo[5];
> }
>
> and I have no idea how to even express that in TGSI -- it'd want
> things to be aligned to 16 bytes, but it'll be packed tightly here.
> This worked OK for layout (std140), but won't work with more advanced
> layouts. This will be a problem for UBOs too -- perhaps we need to
> allow something like
>
> LOAD dst, CONST[1][0], offset
>
> to account for that. And lastly, ssbo allows for something like
>
> layout (std430) buffer foo {
> int foo[];
> }
>
> And you can access foo[anything-you-want] -- difficult to declare that
> in TGSI. I could invent stuff for all of these situations, but it
> seems to be a lot easier to just feed the data to load and forget
> about it. That's how it's all encoded in the GLSL IR as well.
>
> -ilia
>
>
> On Mon, Nov 2, 2015 at 1:56 PM, Roland Scheidegger <sroland at vmware.com> wrote:
>> I don't know much about ssbo, but since it looks like in glsl the
>> coherent etc. bits are on the variables, not the ops, it seems unnatural
>> to mark the op bits instead. So I'd guess it would be better if the
>> variables could be marked instead. If this isn't expressible in tgsi
>> maybe this needs to be fixed. Albeit I have to say it sounds odd to me
>> from a hw perspective if variables with different bits can be
>> stuffed together and then the hw is expected to handle that efficiently...
>>
>> Roland
>>
>> Am 01.11.2015 um 23:45 schrieb Ilia Mirkin:
>>> Just wanted to note down some thoughts and get some feedback before
>>> going forward. I've already sent out a series which covered a lot of
>>> this, but in the end I realized it came up a bit short (available at
>>> https://github.com/imirkin/mesa/commits/fd2 ).
>>>
>>> There are two separate buffer-related features --
>>> ARB_shader_atomic_counters(_ops) and
>>> ARB_shader_storage_buffer_objects. The former are implementable more
>>> efficiently on EG/NI hardware by performing the atomic ops on
>>> not-main-memory (GDS? LDS?). However I think that the gallium-side
>>> interface can be mostly identical for both cases, perhaps we can mark
>>> the buffer as atomic-only in the TGSI.
>>>
>>> Just like there is a CONST tgsi file, I want to add a BUFFER file,
>>> which will map to ->set_shader_buffers() indices. The tricky bit comes
>>> in from the fact that individual variables inside of a buffer may have
>>> different access/store properties. I see two ways to resolve this:
>>>
>>> 1. Declare each variable explicitly, much like UBOs still get
>>> individual decls per slot. These decls could contain the relevant
>>> caching property.
>>>
>>> 2. Make each LOAD/STORE op declare what caching it wants explicitly.
>>>
>>> The first option would work well for images, but for ssbo, it feels
>>> problematic, as with all the various packing options that exist, you
>>> could still specify odd per-variable cache rules, which would be
>>> difficult to express in the TGSI DECL. However I'm not sure how to
>>> implement the second option.
>>>
>>> There is a precedent of a saturate flag, but looking at
>>> tgsi_instruction, there are only 2 free bits. Since there are only 4
>>> different caching values (none, coherent, volatile, restrict; I'm not
>>> counting readonly/writeonly), this fits. However that would leave no
>>> more bits in tgsi_instruction. I could add a texture-style bit, saying
>>> to expect an additional tgsi_instruction_buffer packet with more info
>>> but that seems wasteful.
>>>
>>> Another option is to just pass an immediate directly to the LOAD/STORE
>>> ops which would specify this caching spec as an extra source. This
>>> seems much simpler, but a little dirtier. Opinions much appreciated.
>>>
>>> I think that once this is worked out, I'll be able to resend my series
>>> adding SSBO/atomic support to freedreno, and partial SSBO (without
>>> atomic*) support for nvc0.
>>>
>>> Cheers,
>>>
>>> -ilia
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev at lists.freedesktop.org
>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>>
>>