[Mesa-dev] [PATCH 08/17] tgsi/ureg: add shared variables support for compute shaders

Wed Jan 27 02:35:46 PST 2016

On Wed, Jan 27, 2016 at 1:14 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> On Tue, Jan 26, 2016 at 7:05 PM, Marek Olšák <maraeo at gmail.com> wrote:
>> On Tue, Jan 26, 2016 at 9:48 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>>> On Tue, Jan 26, 2016 at 3:23 PM, Marek Olšák <maraeo at gmail.com> wrote:
>>>> On Tue, Jan 26, 2016 at 3:12 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>>>>> On Tue, Jan 26, 2016 at 8:57 AM, Marek Olšák <maraeo at gmail.com> wrote:
>>>>>> On Tue, Jan 26, 2016 at 2:25 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>>>>>>> I'd be fine with a new TGSI_FILE_MEMORY which provided options for
>>>>>>> shared, global, and local(/private?) memory. I believe the old
>>>>>>> TGSI_FILE_RESOURCE had support for these in a hacky way; this would be
>>>>>>> the clean way of doing it.
>>>>>>
>>>>>> I think they mean:
>>>>>> global = global shared memory
>>>>>> local = shared within a thread group (GL "shared memory")
>>>>>> private = ???
>>>>>
>>>>> memory that is local to a thread.
>>>>>
>>>>>>
>>>>>> ureg_DECL_local_temporary seems like a good match. I'd prefer to have
>>>>>> a separate file though.
>>>>>>
>>>>>> Shared memory is the same as TEMPs, except that they are TEMPs shared
>>>>>> within a thread group.
>>>>>
>>>>> It's much more of a memory area than a TEMP area though. TEMPs imply a
>>>>> 16-byte-wide stride for indirect indexing, etc. -- not easy to work
>>>>> with.
>>>>
>>>> Yeah, TGSI implies that. If you want just loads, stores, and atomics
>>>> without declarations, you can add TGSI_FILE_SHARED. If drivers get
>>>> instructions with TGSI_FILE_SHARED in the resource operand, they can
>>>> just assume it's a load/store/atomic on the shared memory. For
>>>> example:
>>>>
>>>> LOAD TEMP[0], SHARED, address
>>>>
>>>> Note that I deliberately didn't type "SHARED[0]" to show that the
>>>> index should be ignored.
>>>>
>>>> In order to prevent confusion, please try to avoid using the word
>>>> "memory" without the word "shared", because shared memory is not a
>>>> typical memory resource. It's actually closer to registers in my
>>>> opinion.
>>>
>>> Hm... well on NVIDIA hardware it's accessed the same as any other
>>> memory area, like global memory, local memory, or constbufs. Same
>>> types of opcodes too... loads + stores. And GLSL/OpenCL want to be
>>> able to perform atomic operations on it too. Feels a lot more like
>>> memory to me than registers. But of course what are registers but
>>> super-mega-fast memory :)
>>>
>>> Since we eventually want to allow OpenCL TGSI to work, and from the
>>> looks of it, OpenCL *really* wants to just be able to pass in any
>>> number of pointers it feels like, and not be limited to N resources,
>>> we'll also need global and local (or private, as OpenCL calls it), in
>>> addition to shared.
>>>
>>> So I think a TGSI_FILE_MEMORY file would make sense, where you could
>>> specify in the declaration whether that specific one refers to
>>> global/shared/local somehow.
>>
>> Can you give me a TGSI example of what it should look like?
>
> DECL MEMORY[0], GLOBAL
> DECL MEMORY[1], SHARED
> DECL MEMORY[2], LOCAL
> IMM[0] = { 0, 4, 8, 12 }
>
> LOAD TEMP[0], MEMORY[0], CONST[0].x // temp[0] = *(vec4 *)const[0].x
> LOAD TEMP[1], MEMORY[1], CONST[0].y // temp[1] = *(vec4 *)const[0].y
> STORE MEMORY[1], IMM[0].x, TEMP[0] // shared[0] = temp[0]
> ATOMUADD TEMP[2].x, MEMORY[1], IMM[0].y, IMM[0].y // temp[2].x = atomicAdd(shared[0].y, 4)
> STORE MEMORY[2].x, IMM[0].x, TEMP[2].x // stack[0].x = temp[2].x
>
> I'm using terminology a bit loosely here, but hopefully you get what
> I'm saying. Sort of like IN[] with semantics -- it has nothing to do
> with any input ordering, it's the semantics which provide meaning to
> the individual declarations.

Sounds good.
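
To make the quoted proposal concrete, here is a minimal C sketch of how a
driver might record the "DECL MEMORY[n], <type>" lines and later choose an
address space from the index in the resource operand. Every name in it
(sketch_memory_type, sketch_shader_info, sketch_decl_memory,
sketch_address_space) is hypothetical and exists only for illustration; it
is not Mesa's API, and the real encoding of the declaration may well differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names: nothing here is Mesa API. The three values mirror
 * the GLOBAL / SHARED / LOCAL keywords used in the example above. */
enum sketch_memory_type {
   SKETCH_MEMORY_GLOBAL,  /* global memory */
   SKETCH_MEMORY_SHARED,  /* shared within a thread group */
   SKETCH_MEMORY_LOCAL    /* local to a single thread ("private") */
};

#define SKETCH_MAX_MEMORY_DECLS 8  /* arbitrary limit for the sketch */

/* Caller is assumed to zero-initialize this before use. */
struct sketch_shader_info {
   enum sketch_memory_type memory_type[SKETCH_MAX_MEMORY_DECLS];
   unsigned num_memory_decls;
};

/* Record one "DECL MEMORY[index], <type>" line. */
static void
sketch_decl_memory(struct sketch_shader_info *info, unsigned index,
                   enum sketch_memory_type type)
{
   assert(index < SKETCH_MAX_MEMORY_DECLS);
   info->memory_type[index] = type;
   if (index >= info->num_memory_decls)
      info->num_memory_decls = index + 1;
}

/* Pick the address space for a LOAD/STORE/ATOM* instruction whose resource
 * operand is MEMORY[index]. The declared type, not the index itself,
 * carries the meaning. */
static const char *
sketch_address_space(const struct sketch_shader_info *info, unsigned index)
{
   assert(index < info->num_memory_decls);
   switch (info->memory_type[index]) {
   case SKETCH_MEMORY_GLOBAL: return "global";
   case SKETCH_MEMORY_SHARED: return "shared";
   case SKETCH_MEMORY_LOCAL:  return "local";
   }
   return NULL;
}
```

With the three declarations from the example recorded in order, a lookup on
MEMORY[1] would select the shared address space regardless of where the
declaration appears, which matches the "like IN[] with semantics" point.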

Marek