[Mesa-dev] [PATCH 08/17] tgsi/ureg: add shared variables support for compute shaders

Tue Jan 26 16:14:25 PST 2016

On Tue, Jan 26, 2016 at 7:05 PM, Marek Olšák <maraeo at gmail.com> wrote:
> On Tue, Jan 26, 2016 at 9:48 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>> On Tue, Jan 26, 2016 at 3:23 PM, Marek Olšák <maraeo at gmail.com> wrote:
>>> On Tue, Jan 26, 2016 at 3:12 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>>>> On Tue, Jan 26, 2016 at 8:57 AM, Marek Olšák <maraeo at gmail.com> wrote:
>>>>> On Tue, Jan 26, 2016 at 2:25 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>>>>>> I'd be fine with a new TGSI_FILE_MEMORY which provided options for
>>>>>> shared, global, and local(/private?) memory. I believe the old
>>>>>> TGSI_FILE_RESOURCE had support for these in a hacky way, this would be
>>>>>> the clean way of doing it.
>>>>>
>>>>> I think they mean:
>>>>> global = global shared memory
>>>>> local = shared within a thread group (GL "shared memory")
>>>>> private = ???
>>>>
>>>> memory that is local to a thread.
>>>>
>>>>>
>>>>> ureg_DECL_local_temporary seems like a good match. I'd prefer to have
>>>>> a separate file though.
>>>>>
>>>>> Shared memory is the same as TEMPs, except that they are TEMPs shared
>>>>> within a thread group.
>>>>
>>>> It's much more of a memory area than a TEMP area though. TEMPs imply
>>>> 16-byte wide stride for indirect indexing, etc -- not easy to work
>>>> with.
>>>
>>> Yeah, TGSI implies that. If you want just loads, stores, and atomics
>>> without declarations, you can add TGSI_FILE_SHARED. If drivers get
>>> instructions with TGSI_FILE_SHARED in the resource operand, they can
>>> just assume it's a load/store/atomic on the shared memory. For
>>> example:
>>>
>>> LOAD TEMP[0], SHARED, address
>>>
>>> Note that I deliberately didn't type "SHARED[0]" to show that the
>>> index should be ignored.
>>>
>>> In order to prevent confusion, please try to avoid using the word
>>> "memory" without the word "shared", because shared memory is not a
>>> typical memory resource. It's actually closer to registers in my
>>> opinion.
>>
>> Hm... well on NVIDIA hardware it's accessed the same as any other
>> memory area, like global memory, local memory, or constbufs. Same
>> types of opcodes too... loads + stores. And GLSL/OpenCL want to be
>> able to perform atomic operations on it too. Feels a lot more like
>> memory to me than registers. But of course what are registers but
>> super-mega-fast memory :)
>>
>> Since we eventually want to allow OpenCL TGSI to work, and from the
>> looks of it, OpenCL *really* wants to just be able to pass in any
>> number of pointers it feels like, and not be limited to N resources,
>> we'll also need global and local (or private, as OpenCL calls it), in
>> addition to shared.
>>
>> So I think a TGSI_FILE_MEMORY file would make sense, where you could
>> specify in the declaration whether that specific one refers to
>> global/shared/local somehow.
>
> Can you give me a TGSI example of what it should look like?

DECL MEMORY[0], GLOBAL
DECL MEMORY[1], SHARED
DECL MEMORY[2], LOCAL
IMM[0] = { 0, 4, 8, 12 }

LOAD TEMP[0], MEMORY[0], CONST[0].x // temp[0] = *(vec4 *)const[0].x
LOAD TEMP[1], MEMORY[1], CONST[0].y // temp[1] = *(vec4 *)const[0].y
STORE MEMORY[1], IMM[0].x, TEMP[0] // shared[0] = temp[0]
ATOMUADD TEMP[2].x, MEMORY[1], IMM[0].y, IMM[0].y // temp[2].x = atomicAdd(shared[0].y, 4)
STORE MEMORY[2].x, IMM[0].x, TEMP[2].x // stack[0].x = temp[2].x
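
For illustration only, here is a rough C model of what the LOAD / STORE /
ATOMUADD lines above are meant to do on a byte-addressed memory area. All
names (mem_decl, mem_load, mem_store, mem_atom_uadd) are hypothetical and
not Mesa API, and the "atomic" is a plain single-threaded read-modify-write,
so treat it as a sketch of the addressing, not of the hardware behaviour.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a declared MEMORY[n] area; not Mesa API. */
enum mem_kind { MEM_GLOBAL, MEM_SHARED, MEM_LOCAL };

struct mem_decl {
    enum mem_kind kind;  /* which address space the declaration names */
    uint8_t *base;       /* backing storage for this sketch */
    size_t size;         /* size of the area in bytes */
};

/* LOAD dst, MEMORY[n], addr: read a vec4 (4 x 32 bits) at byte offset addr. */
static void mem_load(const struct mem_decl *m, uint32_t addr, uint32_t dst[4])
{
    assert((size_t)addr + 16 <= m->size);
    memcpy(dst, m->base + addr, 16);
}

/* STORE MEMORY[n], addr, src: write ncomp 32-bit components at byte offset addr. */
static void mem_store(struct mem_decl *m, uint32_t addr,
                      const uint32_t *src, unsigned ncomp)
{
    assert(ncomp >= 1 && ncomp <= 4);
    assert((size_t)addr + 4u * ncomp <= m->size);
    memcpy(m->base + addr, src, 4u * ncomp);
}

/* ATOMUADD dst, MEMORY[n], addr, val: add val to the 32-bit word at byte
 * offset addr and return the previous value. Single-threaded model only. */
static uint32_t mem_atom_uadd(struct mem_decl *m, uint32_t addr, uint32_t val)
{
    uint32_t old, updated;
    assert((size_t)addr + 4 <= m->size);
    memcpy(&old, m->base + addr, 4);
    updated = old + val;
    memcpy(m->base + addr, &updated, 4);
    return old;
}
```

With a 32-byte shared area holding the vec4 {10, 20, 30, 40} at offset 0,
mem_atom_uadd(&shared, 4, 4) touches the second component (byte offset 4,
i.e. shared[0].y), which is the point of using byte addresses rather than
16-byte TEMP-style indices.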

I'm using terminology a bit loosely here, but hopefully you get what
I'm saying. Sort of like IN[] with semantics -- the MEMORY[] index has
nothing to do with any ordering, it's the semantic on each declaration
(GLOBAL/SHARED/LOCAL) which gives that declaration its meaning.
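
To make the "index is arbitrary, the declaration carries the meaning" idea
concrete, a driver-side lookup could be sketched in C roughly like this.
The enum, struct and function names are made up for the example and are
not existing TGSI or Gallium identifiers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical semantic tags for a MEMORY declaration; not real TGSI names. */
enum mem_semantic {
    MEM_SEM_NONE = -1,  /* index was never declared */
    MEM_SEM_GLOBAL,
    MEM_SEM_SHARED,
    MEM_SEM_LOCAL
};

/* One "DECL MEMORY[index], SEMANTIC" line. */
struct memory_decl {
    unsigned index;
    enum mem_semantic semantic;
};

/* Resolve MEMORY[index] to its address space by scanning the declarations.
 * The numeric index is only a handle; it says nothing about the kind. */
static enum mem_semantic
memory_semantic(const struct memory_decl *decls, size_t count, unsigned index)
{
    assert(decls != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (decls[i].index == index)
            return decls[i].semantic;
    }
    return MEM_SEM_NONE;
}
```

A shader that declared MEMORY[0] as SHARED and MEMORY[1] as GLOBAL would
resolve just as well as the ordering in the example above; only the
declared semantic decides which kind of load/store gets emitted.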

  -ilia