[Mesa-dev] RFC: buffer support in TGSI for SSBO/atomic

Ilia Mirkin imirkin at alum.mit.edu
Mon Nov 2 11:07:05 PST 2015


I haven't the faintest idea about efficiency, but these things are flags
on the ld/st instructions in the nvidia ISA for SM20+ (and I just
plain don't know about SM10). I'm moderately sure that's the case for
GCN as well.

The difficulty with TGSI is that you might have something like

layout (std430) buffer foo {
  coherent int a;
  int b;
}

Now I don't remember if they get baked into the same vec4, but I think
they do. If they don't, then ARB_enhanced_layouts will fix that right
up. Since TGSI is vec4-oriented, it's really awkward to specify that
sort of thing... how would you do it?

DECL BUFFER[0][0].x COHERENT
DECL BUFFER[0][0].y
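
To make the packing concern concrete, here is a rough sketch (not Mesa code) of where the two members would land if int scalars are packed at 4-byte alignment and the buffer is addressed in 16-byte vec4 slots. The helper names vec4_location and layout_scalars are made up for illustration.

```python
# Sketch only: maps tightly packed 4-byte scalars onto vec4 slots.
# Assumes int members have size 4 and alignment 4, and that a
# vec4-oriented file addresses memory in 16-byte slots.
VEC4_BYTES = 16
COMPONENTS = "xyzw"


def vec4_location(byte_offset):
    """Return (vec4 slot index, component letter) for a 4-byte-aligned offset."""
    if byte_offset % 4 != 0:
        raise ValueError("offset must be 4-byte aligned")
    slot, remainder = divmod(byte_offset, VEC4_BYTES)
    return slot, COMPONENTS[remainder // 4]


def layout_scalars(members):
    """Give consecutive 4-byte offsets to (name, qualifier) int members."""
    placed = []
    for index, (name, qualifier) in enumerate(members):
        offset = index * 4
        placed.append((name, qualifier, offset, vec4_location(offset)))
    return placed


if __name__ == "__main__":
    block = [("a", "coherent"), ("b", None)]
    for name, qualifier, offset, (slot, comp) in layout_scalars(block):
        label = qualifier or "no qualifier"
        print(f"{name}: byte {offset} -> BUFFER[0][{slot}].{comp} ({label})")
```

Under those assumptions both members share slot 0 and differ only in component, which is why a per-slot declaration cannot carry the qualifier without a per-component form like the one above.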

And then totally unrelated to the separate bits, you can end up with

layout (std430) buffer foo {
  int foo[5];
}

and I have no idea how to even express that in TGSI -- it'd want
things to be aligned to 16 bytes, but it'll be packed tightly here.
This worked OK for layout (std140), but won't work with more advanced
layouts. This will be a problem for UBOs too -- perhaps we need to
allow something like

LOAD dst, CONST[1][0], offset

to account for that. And lastly, ssbo allows for something like

layout (std430) buffer foo {
  int foo[];
}

And you can access foo[anything-you-want] -- difficult to declare that
in TGSI. I could invent stuff for all of these situations, but it
seems to be a lot easier to just feed the data to load and forget
about it. That's how it's all encoded in the GLSL IR as well.
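
For what it's worth, the offset arithmetic behind the two array cases can be sketched like this. It only models arrays of 4-byte ints, the function names are hypothetical, and it is meant as an illustration of the stride difference rather than anything a driver would actually contain.

```python
# Sketch only: byte offsets for int arrays under std140 vs std430.
# Restricted to arrays of 4-byte ints; other element types follow
# different alignment rules that are not modeled here.
INT_BYTES = 4
VEC4_BYTES = 16


def int_array_stride(layout):
    """Array stride in bytes for an int array under the given layout."""
    if layout == "std140":
        return VEC4_BYTES  # each element is padded out to a vec4
    if layout == "std430":
        return INT_BYTES   # elements are tightly packed
    raise ValueError(f"unknown layout: {layout}")


def element_offset(index, layout, base=0):
    """Byte offset of foo[index] for an array starting at byte `base`."""
    return base + index * int_array_stride(layout)


def unsized_length(buffer_bytes, layout, base=0):
    """Whole elements of a trailing unsized int array that fit in the buffer."""
    return max(buffer_bytes - base, 0) // int_array_stride(layout)


if __name__ == "__main__":
    for layout in ("std140", "std430"):
        offsets = [element_offset(i, layout) for i in range(5)]
        print(layout, "offsets of foo[0..4]:", offsets)
        print(layout, "elements in a 64-byte buffer:", unsized_length(64, layout))
```

With std140 every element starts on a vec4 boundary, so a per-slot view lines up. With std430 four elements share one slot, and for the unsized case the element count is only known from the bound buffer size, so passing a plain byte offset to the load sidesteps both problems.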

  -ilia


On Mon, Nov 2, 2015 at 1:56 PM, Roland Scheidegger <sroland at vmware.com> wrote:
> I don't know much about ssbo, but since it looks like in glsl the
> coherent etc. bits are on the variables, not the ops, it seems unnatural
> to mark the op bits instead. So I'd guess it would be better if the
> variables could be marked instead. If this isn't expressible in tgsi
> maybe this needs to be fixed. Albeit I have to say it sounds odd to me
> from a hw perspective if variables with different bits can be
> stuffed together and then the hw is expected to handle that efficiently...
>
> Roland
>
> Am 01.11.2015 um 23:45 schrieb Ilia Mirkin:
>> Just wanted to note down some thoughts and get some feedback before
>> going forward. I've already sent out a series which covered a lot of
>> this, but in the end I realized it came up a bit short (available at
>> https://github.com/imirkin/mesa/commits/fd2 ).
>>
>> There are two separate buffer-related features --
>> ARB_shader_atomic_counters(_ops) and
>> ARB_shader_storage_buffer_objects. The former are implementable more
>> efficiently on EG/NI hardware by performing the atomic ops on
>> not-main-memory (GDS? LDS?). However I think that the gallium-side
>> interface can be mostly identical for both cases, perhaps we can mark
>> the buffer as atomic-only in the TGSI.
>>
>> Just like there is a CONST tgsi file, I want to add a BUFFER file,
>> which will map to ->set_shader_buffers() indices. The tricky bit comes
>> in from the fact that individual variables inside of a buffer may have
>> different access/store properties. I see two ways to resolve this:
>>
>> 1. Declare each variable explicitly, much like UBOs still get
>> individual decls per slot. These decls could contain the relevant
>> caching property.
>>
>> 2. Make each LOAD/STORE op declare what caching it wants explicitly.
>>
>> The first option would work well for images, but for ssbo, it feels
>> problematic, as with all the various packing options that exist, you
>> could still specify odd per-variable cache rules, which would be
>> difficult to express in the TGSI DECL. However I'm not sure how to
>> implement the second option.
>>
>> There is a precedent of a saturate flag, but looking at
>> tgsi_instruction, there are only 2 free bits. Since there are only 4
>> different caching values (none, coherent, volatile, restrict; I'm not
>> counting readonly/writeonly), this fits. However that would leave no
>> more bits in tgsi_instruction. I could add a texture-style bit, saying
>> to expect an additional tgsi_instruction_buffer packet with more info
>> but that seems wasteful.
>>
>> Another option is to just pass an immediate directly to the LOAD/STORE
>> ops which would specify this caching spec as an extra source. This
>> seems much simpler, but a little dirtier. Opinions much appreciated.
>>
>> I think that once this is worked out, I'll be able to resend my series
>> adding SSBO/atomic support to freedreno, and partial SSBO (without
>> atomic*) support for nvc0.
>>
>> Cheers,
>>
>>   -ilia