[Mesa-dev] TGSI and Tessellation Control Shader outputs

Mon Sep 1 09:00:13 PDT 2014

On 29.08.2014 22:44, Ilia Mirkin wrote:
> Hello,
> 
> I've been thinking a bit about how to properly implement TCS outputs
> in TGSI. As a quick reminder, there are per-vertex (i.e. invocation)
> and per-patch outputs in TCS. And while you can only write to the
> current invocation's per-vertex outputs, you can read from any of
> them. (With barrier() used to synchronize invocations.)
> 
> Per-patch outputs map quite nicely onto the existing infrastructure,
> so the rest of the questions will be about per-vertex outputs.
> 
> One can represent per-vertex outputs as 2D output arrays. That means
> support for them needs to be added all over (which I've actually done,
> so I'm not complaining about the extra work but rather asking if it's
> a good idea). And then you might have
> 
> DCL OUT[][0], GENERIC
> MOV ADDR[1].x, SV[0] /* invocation id */
> MOV OUT[ADDR[1].x][0], TEMP[0] /* store value */
> BARRIER
> MOV TEMP[0], OUT[3][0] /* read output from invocation == 3 */
> 
> The advantage here is that it's all nice and consistent. However the
> disadvantage is that we have to add a totally useless read of the
> invocation id and use it as a relative index for the store. At least
> the nvidia shaders don't even have a way of writing other invocations'
> data even if they wanted to (without resorting to global memory
> accesses). So it's complicating all sorts of logic for apparently no
> real benefit.
> 
> Another approach might be to bypass the invocation id on storing the
> output, but using it on reads. For example code like
> 
> DCL OUT[0], GENERIC
> MOV OUT[0], TEMP[0]
> BARRIER
> MOV TEMP[0], OUT[3][0]
> 
> This avoids having to teach tgsi about 2d outputs (esp reladdr ones).
> This seems a lot simpler, but it ignores the gl_InvocationID indexing
> that happens when writing the output. However I don't think that's so
> bad. It also means that reads and writes are interpreted a little
> differently for OUT's, but that doesn't seem so bad either.
> 
> Thoughts?
> 

I think in the second case, though, it should be required to declare the
inputs separately. It sounds to me like at least on nv50 the access
works differently in any case (even if the actual data accessed is the
same). I have no idea how other hw handles this, but in any case the
hull shader from d3d11 also uses 2d addressed inputs but 1d addressed
outputs -
http://msdn.microsoft.com/en-us/library/windows/desktop/hh447211%28v=vs.85%29.aspx
(though I don't know what that looks like at the ddi level). Probably GL
used 2d outputs because it indeed looks more consistent (or perhaps some
extension could lift the restriction that only the current invocation's
outputs be written, though I'm not sure that would ever make sense).
So I think if it doesn't actually make sense to try writing to other
invocations' outputs, option 2) makes more sense. In that case, though,
the outputs should probably be strictly write-only; I'd guess it would
get messy otherwise if you try to read some other invocation's data vs.
reading back the current one.
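
To make the option 2) semantics a bit more concrete, here is a rough toy
model in Python. It is only a sketch: the names (PatchOutputs, run_patch)
are made up for illustration and have nothing to do with actual Mesa or
TGSI code, and the barrier is modelled simply by running all writes
before any read.

```python
# Toy model of "option 2": 1D writes, 2D reads after a barrier.
# All names are hypothetical; this is not Mesa or TGSI code.

class PatchOutputs:
    def __init__(self, num_invocations, num_outputs):
        # One row of per-vertex outputs per invocation.
        self._rows = [[None] * num_outputs for _ in range(num_invocations)]

    def write(self, invocation_id, index, value):
        # Models "MOV OUT[index], value": the invocation id is supplied
        # implicitly, so a shader cannot name another invocation's slot.
        self._rows[invocation_id][index] = value

    def read(self, vertex, index):
        # Models "MOV TEMP, OUT[vertex][index]": reads are explicitly 2D.
        return self._rows[vertex][index]


def run_patch(num_invocations):
    outs = PatchOutputs(num_invocations, 1)
    # Phase 1: every invocation stores to its own slot.
    for inv in range(num_invocations):
        outs.write(inv, 0, inv * 10)
    # BARRIER: phase 2 starts only after all phase 1 writes are done.
    # Phase 2: every invocation reads the output of invocation 3.
    return [outs.read(3, 0) for _ in range(num_invocations)]
```

With four invocations, run_patch(4) gives every invocation the value
written by invocation 3. Splitting read() off into a separately declared
input-like view would correspond to keeping the outputs strictly
write-only.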

But I don't really have much of an idea about tessellation, really.

Roland