[Mesa-dev] TGSI and Tessellation Control Shader outputs

Thu Oct 9 07:59:14 PDT 2014

I have been thinking about this more and I actually like the way
OpenGL does it. The indexing with InvocationID can be lowered with a
copy propagation pass for drivers that cannot do it - or they can just
ignore the innermost index and assume it's always equal to
InvocationID. I also prefer having readable shader outputs.
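
To make the lowering idea concrete, here is a minimal sketch of such a
pass over a toy instruction list. It is not Mesa code: the tuple
encoding, the INVOCATION_ID marker and the function name are all
hypothetical, and instructions without a destination (e.g. BARRIER) are
left out of the model.

```python
# Toy model, not Mesa code: each instruction is (opcode, dst, src).
# Operands are tuples such as ("TEMP", 0), ("ADDR", 1, "x"),
# ("OUT", vertex_index, slot) for a 2D output, or ("OUT", slot) for a 1D one.
INVOCATION_ID = ("SV", "INVOCATIONID")  # hypothetical name for the system value


def lower_output_stores(instructions):
    """Drop the vertex index from per-vertex output stores.

    A store is only lowered when its vertex index is an address register
    known to hold a copy of the invocation id; anything else is rejected.
    """
    known = set()  # address registers currently holding the invocation id
    result = []
    for opcode, dst, src in instructions:
        if dst[0] == "ADDR":
            # Track copies of the invocation id; any other write kills the fact.
            if opcode == "MOV" and src == INVOCATION_ID:
                known.add(dst)
            else:
                known.discard(dst)
        elif dst[0] == "OUT" and len(dst) == 3:
            vertex, slot = dst[1], dst[2]
            if vertex not in known:
                raise ValueError("per-vertex store not indexed by the invocation id")
            dst = ("OUT", slot)  # the driver implies the current invocation
        result.append((opcode, dst, src))
    return result


program = [
    ("MOV", ("ADDR", 1, "x"), INVOCATION_ID),
    ("MOV", ("OUT", ("ADDR", 1, "x"), 0), ("TEMP", 0)),
    ("MOV", ("TEMP", 0), ("OUT", 3, 0)),  # reads keep their vertex index
]
lowered = lower_output_stores(program)
```

The sketch keeps the MOV into the address register; removing it once
it is dead would be a separate cleanup step.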

One slightly ugly thing right now is that per-patch outputs are
one-dimensional while per-vertex outputs are two-dimensional, so you
normally get:

OUT[][0], POSITION
OUT[1], PATCH
OUT[2], PATCH1
OUT[][3], GENERIC
OUT[4], TESSINNER
OUT[5], TESSOUTER

We can either leave it this way and assume that a two-dimensional
output access is per-vertex and anything else is per-patch, or we can
add another register file for per-vertex data. The same applies to
shader inputs, and I think we have had this since geometry shaders:

IN[][0], POSITION
IN[1], PRIMITIVEID
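
The "leave it this way" option amounts to a rule on the number of
dimensions. A rough sketch of that rule, applied to listing lines in
the form used in this mail (the regular expression and the function
name are only illustrative, not an existing TGSI parser):

```python
import re

# Matches listing lines such as "OUT[][0], POSITION" or "IN[1], PRIMITIVEID".
DECL = re.compile(r"^(IN|OUT)((?:\[\d*\])+),\s*(\w+)$")


def classify(decl):
    """Return (file, semantic, is_per_vertex) for one listing line."""
    match = DECL.match(decl.strip())
    if match is None:
        raise ValueError("unrecognized declaration: %r" % decl)
    file, brackets, semantic = match.groups()
    dimensions = brackets.count("[")
    # Rule from the text: a 2-dimensional access is per-vertex,
    # anything else is per-patch (or per-primitive for inputs).
    return file, semantic, dimensions == 2
```

With this rule, "OUT[][3], GENERIC" is per-vertex while
"OUT[4], TESSINNER" and "IN[1], PRIMITIVEID" are not.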

Not to mention that indirect addressing into outputs is a mess. For
that, it would be better to have a strict mapping from outputs to
semantics, e.g. OUT0[i] == PATCHi and OUT1[][j] == GENERICj.
Alternatively, we could use semantic names explicitly in the shader
code, e.g.:

MOV OUT.GENERIC[ADDR.x], TEMP[0]

But that would be a lot of work and I'd rather not delay upstreaming
tessellation because of this.
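
For illustration only, the strict mapping mentioned above could look
like the sketch below. The OUT0/OUT1 file names come from the example
in this mail; the table, the function and the bounds check are
assumptions, not an existing interface.

```python
# Hypothetical strict mapping: one output file per semantic name.
SEMANTIC_OF_FILE = {
    "OUT0": "PATCH",    # per-patch outputs:  OUT0[i]   == PATCHi
    "OUT1": "GENERIC",  # per-vertex outputs: OUT1[][j] == GENERICj
}


def resolve_indirect(file, base, offset, count):
    """Map an indirectly addressed output to (semantic name, semantic index).

    base is the constant part of the index, offset is the run-time value
    of the address register, and count is the declared size of the file.
    """
    index = base + offset
    if not 0 <= index < count:
        raise IndexError("indirect access outside the declared range")
    return SEMANTIC_OF_FILE[file], index
```

Because each file holds a single semantic name, an indirect index can
never cross from one kind of output into another.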

Thoughts?

Marek

On Fri, Aug 29, 2014 at 10:44 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> Hello,
>
> I've been thinking a bit about how to properly implement TCS outputs
> in TGSI. As a quick reminder, there are per-vertex (i.e. invocation)
> and per-patch outputs in TCS. And while you can only write to the
> current invocation's per-vertex outputs, you can read from any of
> them. (With barrier() used to synchronize invocations.)
>
> Per-patch outputs map quite nicely onto the existing infrastructure,
> so the rest of the questions will be about per-vertex outputs.
>
> One can represent per-vertex outputs as 2D output arrays. That means
> support for them needs to be added all over (which I've actually done,
> so I'm not complaining about the extra work but rather asking if it's
> a good idea). And then you might have
>
> DCL OUT[][0], GENERIC
> MOV ADDR[1].x, SV[0] /* invocation id */
> MOV OUT[ADDR[1].x][0], TEMP[0] /* store value */
> BARRIER
> MOV TEMP[0], OUT[3][0] /* read output from invocation == 3 */
>
> The advantage here is that it's all nice and consistent. However the
> disadvantage is that we have to add a totally useless read of the
> invocation id and use it as a relative index for the store. At least
> the nvidia shaders don't even have a way of writing other invocations'
> data even if they wanted to (without resorting to global memory
> accesses). So it's complicating all sorts of logic for apparently no
> real benefit.
>
> Another approach might be to bypass the invocation id when storing
> the output, but use it on reads. For example, code like:
>
> DCL OUT[0], GENERIC
> MOV OUT[0], TEMP[0]
> BARRIER
> MOV TEMP[0], OUT[3][0]
>
> This avoids having to teach tgsi about 2d outputs (esp reladdr ones).
> This seems a lot simpler, but it ignores the gl_InvocationID indexing
> that happens when writing the output. However I don't think that's so
> bad. It also means that reads and writes are interpreted a little
> differently for OUT's, but that doesn't seem so bad either.
>
> Thoughts?
>
>   -ilia