[Mesa-dev] TGSI and Tessellation Control Shader outputs

Fri Aug 29 13:44:47 PDT 2014

Hello,

I've been thinking a bit about how to properly implement TCS outputs
in TGSI. As a quick reminder, a TCS has both per-vertex (i.e.
per-invocation) and per-patch outputs. While an invocation can only
write to its own per-vertex outputs, it can read those of any
invocation, with barrier() used to synchronize invocations.
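
As a rough illustration of the write-own/read-any rule above, here is
a toy Python model. It is only a sketch: run_patch, compute and
combine are hypothetical names, and a sequential loop stands in for
the parallel invocations, with the barrier modeled as the boundary
between the two phases.

```python
# Toy model of TCS per-vertex output semantics (not Mesa code).
# Each invocation may write only its own per-vertex output slot,
# but after the barrier it may read any invocation's slot.

def run_patch(num_invocations, compute, combine):
    """Simulate one patch.

    compute(inv)       -> value the invocation stores to its own output
    combine(inv, outs) -> value computed after the barrier from all outputs
    """
    outputs = [None] * num_invocations

    # Phase 1: every invocation writes its own slot only.
    for inv in range(num_invocations):
        outputs[inv] = compute(inv)

    # barrier(): all phase-1 writes are visible from here on.

    # Phase 2: any invocation may read any slot (read-only view).
    return [combine(inv, tuple(outputs)) for inv in range(num_invocations)]


# Each invocation stores 10 * id, then reads its right-hand neighbour.
result = run_patch(
    4,
    compute=lambda inv: 10 * inv,
    combine=lambda inv, outs: outs[(inv + 1) % len(outs)],
)
```

Note that the model never needs an invocation index on the write side;
only the reads name another invocation.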

Per-patch outputs map quite nicely onto the existing infrastructure,
so the rest of the questions will be about per-vertex outputs.

One can represent per-vertex outputs as 2D output arrays. That means
support for them needs to be added all over (which I've actually done,
so I'm not complaining about the extra work but rather asking if it's
a good idea). You might then have:

DCL OUT[][0], GENERIC
UARL ADDR[1].x, SV[0] /* invocation id */
MOV OUT[ADDR[1].x][0], TEMP[0] /* store value */
BARRIER
MOV TEMP[0], OUT[3][0] /* read output from invocation == 3 */

The advantage here is that it's all nice and consistent. The
disadvantage is that we have to add a totally useless read of the
invocation id and use it as a relative index for the store. The NVIDIA
shaders, at least, have no way of writing other invocations' data even
if they wanted to (without resorting to global memory accesses). So
this complicates all sorts of logic for no apparent benefit.

Another approach might be to bypass the invocation id when storing the
output, but still index by invocation on reads. For example:

DCL OUT[0], GENERIC
MOV OUT[0], TEMP[0]
BARRIER
MOV TEMP[0], OUT[3][0]

This avoids having to teach TGSI about 2D outputs (especially
relative-addressed, i.e. reladdr, ones). It seems a lot simpler, but
it ignores the gl_InvocationID indexing that happens when writing the
output. However, I don't think that's so bad. It also means that reads
and writes are interpreted a little differently for OUTs, but that
doesn't seem so bad either.
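
To make the comparison concrete, the following Python sketch models
both encodings against the same storage. It is not TGSI or Mesa code:
PatchOutputs, store_2d, store_implicit, load and run are hypothetical
names, and the layout outputs[invocation][slot] is an assumption made
for illustration. The point is only that a store which omits the
invocation index can land in the same place as one that spells it out.

```python
# Toy model of the two proposed encodings (hypothetical, not Mesa code).
# Storage is outputs[invocation][slot] in both cases; only the way a
# store names its destination differs.

class PatchOutputs:
    def __init__(self, num_invocations, num_slots):
        self.data = [[None] * num_slots for _ in range(num_invocations)]

    # Encoding 1: the store carries an explicit invocation index,
    # like OUT[ADDR[1].x][0] in the first example.
    def store_2d(self, invocation, slot, value):
        self.data[invocation][slot] = value

    # Encoding 2: the store names only the slot, like OUT[0]; the
    # executor (not the instruction) supplies the running invocation.
    def store_implicit(self, current_invocation, slot, value):
        self.data[current_invocation][slot] = value

    # Loads are identical in both encodings, like OUT[3][0].
    def load(self, invocation, slot):
        return self.data[invocation][slot]


def run(encoding, num_invocations=4):
    outs = PatchOutputs(num_invocations, num_slots=1)
    for inv in range(num_invocations):
        value = 100 + inv
        if encoding == "2d":
            addr = inv  # the extra read of the invocation id
            outs.store_2d(addr, 0, value)
        else:
            outs.store_implicit(inv, 0, value)
    # After the barrier: read slot 0 of invocation 3.
    return outs.load(3, 0)
```

In this model run("2d") and run("implicit") return the same value, so
the difference between the two proposals is confined to how the store
is written down, not to what ends up in the output.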

Thoughts?

  -ilia