[Mesa-dev] [PATCH 2/8] gallium/tgsi: start adding hw atomics (v3)

Nicolai Hähnle nhaehnle at gmail.com
Tue Nov 7 17:26:22 UTC 2017


On 07.11.2017 17:57, Marek Olšák wrote:
> With HW atomic counters, MaxAtomicBufferSize is a pretty small number
> (counters * 4). TGSI has maximum index = 32K.

Ah, you're right.

Patches 1-7:

Reviewed-by: Nicolai Hähnle <nicolai.haehnle at amd.com>


> 
> Marek
> 
> On Tue, Nov 7, 2017 at 5:43 PM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>> On 07.11.2017 17:25, Nicolai Hähnle wrote:
>>>
>>> On 07.11.2017 07:31, Dave Airlie wrote:
>>>>
>>>> diff --git a/src/gallium/docs/source/tgsi.rst
>>>> b/src/gallium/docs/source/tgsi.rst
>>>> index 1a51fe9..0c331f2 100644
>>>> --- a/src/gallium/docs/source/tgsi.rst
>>>> +++ b/src/gallium/docs/source/tgsi.rst
>>>> @@ -2638,9 +2638,11 @@ logical operations.  In this context atomicity
>>>> means that another
>>>>    concurrent memory access operation that affects the same memory
>>>>    location is guaranteed to be performed strictly before or after the
>>>>    entire execution of the atomic operation. The resource may be a BUFFER,
>>>> -IMAGE, or MEMORY.  In the case of an image, the offset works the same as
>>>> for
>>>> -``LOAD`` and ``STORE``, specified above. These atomic operations may
>>>> -only be used with 32-bit integer image formats.
>>>> +IMAGE, ATOMIC, or MEMORY.  In the case of an image, the offset works
>>>> +the same as for ``LOAD`` and ``STORE``, specified above. For atomic
>>>> +counters, the offset is an immediate index to the base hw atomic
>>>> +counter for this operation.
>>>> +These atomic operations may only be used with 32-bit integer image
>>>> formats.
>>>>    .. opcode:: ATOMUADD - Atomic integer addition
>>>> @@ -3440,7 +3442,6 @@ TGSI_SEMANTIC_SUBGROUP_LT_MASK
>>>>    A bit mask of ``bit index < TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
>>>>    ``(1 << subgroup_invocation) - 1`` in arbitrary precision arithmetic.
>>>> -
>>>
>>>
>>> Stray whitespace change.
>>>
>>>
>>>>    Declaration Interpolate
>>>>    ^^^^^^^^^^^^^^^^^^^^^^^
>>>> @@ -3517,6 +3518,31 @@ accessing a misaligned address is undefined.
>>>>    Usage of the STORE opcode is only allowed if the WR (writable) flag
>>>>    is set.
>>>> +Hardware Atomic Register File
>>>> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>> +
>>>> +Hardware atomics are declared as a 2D array with an optional array id.
>>>> +
>>>> +The first member of the dimension is the buffer resource the atomic
>>>> +is located in.
>>>> +The second member is a range into the buffer resource, either for
>>>> +one or multiple counters. If this is an array, the declaration will have
>>>> +a unique array id.
>>>> +
>>>> +Each counter is 4 bytes in size, and index and ranges are in counters
>>>> not bytes.
>>>> +DCL ATOMIC[0][0]
>>>> +DCL ATOMIC[0][1]
>>>> +
>>>> +This declares two atomics, one at the start of the buffer and one in the
>>>> +second 4 bytes.
>>>> +
>>>> +DCL ATOMIC[0][0]
>>>> +DCL ATOMIC[1][0]
>>>> +DCL ATOMIC[1][1..3], ARRAY(1)
>>>> +
>>>> +This declares 5 atomics, one in buffer 0 at 0,
>>>> +one in buffer 1 at 0, and an array of 3 atomics in
>>>> +the buffer 1, starting at 1.
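
To illustrate the indexing rule in the doc text above (indices and ranges are counted in counters, and each counter is 4 bytes), here is a small sketch that turns a declared counter range into a byte range. The helper name is made up for this example and is not a Mesa function.

```python
COUNTER_SIZE = 4  # bytes per hardware atomic counter, per the doc text above


def counter_range_to_bytes(first, last=None):
    """Return (byte_offset, byte_size) for counters first..last inclusive.

    Hypothetical helper for illustration only; not part of Mesa.
    """
    if last is None:
        last = first  # a single-counter declaration such as ATOMIC[0][1]
    if first < 0 or last < first:
        raise ValueError("invalid counter range")
    count = last - first + 1
    return first * COUNTER_SIZE, count * COUNTER_SIZE
```

Under that reading, ``ATOMIC[1][1..3]`` would cover 12 bytes starting at byte 4 of buffer 1.
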
>>>
>>>
>>> My understanding is that these ranges could be highly non-contiguous,
>>> right? I.e., you could have
>>>
>>> DCL ATOMIC[0][15]
>>> DCL ATOMIC[0][8423..8430], ARRAY(1)
>>> DCL ATOMIC[0][25112]
>>>
>>> ... corresponding to the offsets in the GLSL shader. The doc should really
>>> point this out explicitly. Also, this might cause trouble because the TGSI
>>> range tokens don't have enough bits to represent high offsets.
>>
>>
>> Thinking about it some more, here's one way to deal with it. Have
>> st_glsl_to_tgsi pack the indices (by keeping track of the number of
>> counters per atomic counter buffer), and then add an "atomic counter offset"
>> dword to the TGSI. The decls above could become:
>>
>>    DCL ATOMIC[0][0], OFFSET(25112)
>>    DCL ATOMIC[0][1..8], OFFSET(8423), ARRAY(1)
>>    DCL ATOMIC[0][9], OFFSET(15)
>>
>> (the point is the order of TGSI indices doesn't matter outside of counter
>> arrays)
>>
>> The driver would then compute the number of counters per HW atomic counter
>> buffer to assign on-chip HW atomic memory during compile time, and then copy
>> memory from the given offsets to the on-chip HW atomic memory as part of
>> state validation before draws and dispatches.
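
A rough sketch of the packing step described above, assuming the state tracker already knows each counter range as (offset, count) in counters. The function name is hypothetical, and OFFSET() is only the syntax proposed in this thread, not existing TGSI. The sketch assigns dense indices in declaration order, which is just one valid choice since the order of TGSI indices does not matter outside of counter arrays.

```python
def pack_atomic_decls(buffer_index, ranges):
    """Pack sparse counter ranges into dense TGSI indices.

    ranges: list of (first_offset, count) tuples, in counters.
    Returns declaration strings using the OFFSET() syntax proposed in
    this thread (an assumption, not existing TGSI syntax).
    """
    decls = []
    next_index = 0      # next free dense index within this buffer
    next_array_id = 1   # array ids are handed out sequentially here
    for first_offset, count in ranges:
        if count < 1:
            raise ValueError("each range needs at least one counter")
        if count == 1:
            decls.append(
                f"DCL ATOMIC[{buffer_index}][{next_index}], "
                f"OFFSET({first_offset})"
            )
        else:
            last_index = next_index + count - 1
            decls.append(
                f"DCL ATOMIC[{buffer_index}][{next_index}..{last_index}], "
                f"OFFSET({first_offset}), ARRAY({next_array_id})"
            )
            next_array_id += 1
        next_index += count
    return decls


# The three ranges from the example earlier in the thread.
for line in pack_atomic_decls(0, [(15, 1), (8423, 8), (25112, 1)]):
    print(line)
```

With this packing the highest index in a buffer is the total number of counters minus one, so it stays small no matter how large the GLSL offsets are.
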
>>
>> Cheers,
>> Nicolai
>>
>> --
>> Lerne, wie die Welt wirklich ist,
>> Aber vergiss niemals, wie sie sein sollte.


-- 
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.

