[Mesa-dev] TGSI and Tessellation Control Shader outputs

Ilia Mirkin imirkin at alum.mit.edu
Mon Sep 1 09:19:36 PDT 2014


On Mon, Sep 1, 2014 at 12:00 PM, Roland Scheidegger <sroland at vmware.com> wrote:
> On 29.08.2014 22:44, Ilia Mirkin wrote:
>> Hello,
>>
>> I've been thinking a bit about how to properly implement TCS outputs
>> in TGSI. As a quick reminder, there are per-vertex (i.e. invocation)
>> and per-patch outputs in TCS. And while you can only write to the
>> current invocation's per-vertex outputs, you can read from any of
>> them. (With barrier() used to synchronize invocations.)
>>
>> Per-patch outputs map quite nicely onto the existing infrastructure,
>> so the rest of the questions will be about per-vertex outputs.
>>
>> One can represent per-vertex outputs as 2D output arrays. That means
>> support for them needs to be added all over (which I've actually done,
>> so I'm not complaining about the extra work but rather asking if it's
>> a good idea). And then you might have
>>
>> DCL OUT[][0], GENERIC
>> MOV ADDR[1].x, SV[0] /* invocation id */
>> MOV OUT[ADDR[1].x][0], TEMP[0] /* store value */
>> BARRIER
>> MOV TEMP[0], OUT[3][0] /* read output from invocation == 3 */
>>
>> The advantage here is that it's all nice and consistent. However the
>> disadvantage is that we have to add a totally useless read of the
>> invocation id and use it as a relative index for the store. At least
>> the nvidia shaders don't even have a way of writing other invocations'
>> data even if they wanted to (without resorting to global memory
>> accesses). So it's complicating all sorts of logic for apparently no
>> real benefit.
>>
>> Another approach might be to bypass the invocation id on storing the
>> output, but using it on reads. For example code like
>>
>> DCL OUT[0], GENERIC
>> MOV OUT[0], TEMP[0]
>> BARRIER
>> MOV TEMP[0], OUT[3][0]
>>
>> This avoids having to teach tgsi about 2d outputs (esp reladdr ones).
>> This seems a lot simpler, but it ignores the gl_InvocationID indexing
>> that happens when writing the output. However I don't think that's so
>> bad. It also means that reads and writes are interpreted a little
>> differently for OUT's, but that doesn't seem so bad either.
>>
>> Thoughts?
>>
>
> I think in the second case though it should be required to declare the
> inputs separately. It sounds to me like at least on nv50 the access
> works differently in any case (even if the actual data accessed is the
> same). I have no idea how other hw handles this, but in any case

On nvc0 there are load and store instructions (nv50 is a little
different, but it also doesn't support tess). When storing, there's no
way to provide it the invocation offset. When loading, there is.
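
To make that asymmetry concrete, here is a rough C model of it. This is
only a sketch, not nvc0 or Mesa code: patch_out, tcs_store, tcs_load,
run_patch and the patch size are all made-up names and values for
illustration. The store has no vertex operand (the row is always the
running invocation's own), while the load takes an explicit vertex index.

```c
#include <assert.h>

#define NUM_INVOCATIONS 4  /* hypothetical patch size */
#define NUM_OUTPUTS     1  /* hypothetical per-vertex output count */

/* Per-vertex outputs of one patch: [invocation][output slot]. */
static float patch_out[NUM_INVOCATIONS][NUM_OUTPUTS];

/*
 * Store: 'self' is the id of the running invocation. In this model it
 * stands for state that is implicit to the invocation, not an operand
 * the shader gets to choose, so only the invocation's own row is written.
 */
static void tcs_store(int self, int slot, float value)
{
    assert(self >= 0 && self < NUM_INVOCATIONS);
    assert(slot >= 0 && slot < NUM_OUTPUTS);
    patch_out[self][slot] = value;
}

/* Load: takes an explicit vertex index, so any invocation's row is readable. */
static float tcs_load(int vertex, int slot)
{
    assert(vertex >= 0 && vertex < NUM_INVOCATIONS);
    assert(slot >= 0 && slot < NUM_OUTPUTS);
    return patch_out[vertex][slot];
}

/* Every invocation writes its output, then vertex 3's output is read back. */
static float run_patch(void)
{
    for (int id = 0; id < NUM_INVOCATIONS; id++)
        tcs_store(id, 0, (float)(id * 10));   /* MOV OUT[0], TEMP[0] */

    /* BARRIER: the sequential loop above has already finished all stores. */

    return tcs_load(3, 0);                    /* MOV TEMP[0], OUT[3][0] */
}
```

In this model the 1D write / 2D read form from the second example maps
directly onto tcs_store / tcs_load, with no relative index on the store.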

> hull shader from d3d11 uses 2d addressed inputs but 1d addressed outputs
> too -
> http://msdn.microsoft.com/en-us/library/windows/desktop/hh447211%28v=vs.85%29.aspx
> (though I don't know what that looks like at the ddi level). Probably GL

Hmmm... well from a quick read of it, they've bypassed this problem by
creating substages with inputs consuming previous stages' outputs.

> used 2d outputs because it indeed looks more consistent (or perhaps some
> extension could lift the restriction that only the current invocation be
> written, though I'm not sure if that would ever make sense).
> So I think if it doesn't actually make sense to try writing to other
> outputs, option 2) makes more sense. I think though in this case the
> outputs should probably be strictly write-only, I'd guess it would get
> messy otherwise if you try to read some other invocation's data vs.
> reading back the current one.

If they were write-only, how would you read another invocation's
outputs? Or are you suggesting that some new input type be used which
maps onto the invocations' outputs?

  -ilia

