[Mesa-dev] [PATCH 2/8] gallium/tgsi: start adding hw atomics (v3)

Nicolai Hähnle nhaehnle at gmail.com
Tue Nov 7 17:37:17 UTC 2017


On 07.11.2017 18:26, Nicolai Hähnle wrote:
> On 07.11.2017 17:57, Marek Olšák wrote:
>> With HW atomic counters, MaxAtomicBufferSize is a pretty small number
>> (counters * 4). TGSI has maximum index = 32K.
> 
> Ah, you're right.

I forgot: the other comments (about the assertion in patch 2, and about 
non-contiguous buffers in patch 5 -- which for some reason didn't get 
sent before) still stand.


> 
> Patches 1-7:
> 
> Reviewed-by: Nicolai Hähnle <nicolai.haehnle at amd.com>
> 
> 
>>
>> Marek
>>
>> On Tue, Nov 7, 2017 at 5:43 PM, Nicolai Hähnle <nhaehnle at gmail.com> 
>> wrote:
>>> On 07.11.2017 17:25, Nicolai Hähnle wrote:
>>>>
>>>> On 07.11.2017 07:31, Dave Airlie wrote:
>>>>>
>>>>> diff --git a/src/gallium/docs/source/tgsi.rst
>>>>> b/src/gallium/docs/source/tgsi.rst
>>>>> index 1a51fe9..0c331f2 100644
>>>>> --- a/src/gallium/docs/source/tgsi.rst
>>>>> +++ b/src/gallium/docs/source/tgsi.rst
>>>>> @@ -2638,9 +2638,11 @@ logical operations.  In this context atomicity
>>>>> means that another
>>>>>    concurrent memory access operation that affects the same memory
>>>>>    location is guaranteed to be performed strictly before or after the
>>>>>    entire execution of the atomic operation. The resource may be a 
>>>>> BUFFER,
>>>>> -IMAGE, or MEMORY.  In the case of an image, the offset works the 
>>>>> same as
>>>>> for
>>>>> -``LOAD`` and ``STORE``, specified above. These atomic operations may
>>>>> -only be used with 32-bit integer image formats.
>>>>> +IMAGE, ATOMIC, or MEMORY.  In the case of an image, the offset works
>>>>> +the same as for ``LOAD`` and ``STORE``, specified above. For atomic
>>>>> +counters, the offset is an immediate index to the base hw atomic
>>>>> +counter for this operation.
>>>>> +These atomic operations may only be used with 32-bit integer image
>>>>> formats.
>>>>>    .. opcode:: ATOMUADD - Atomic integer addition
>>>>> @@ -3440,7 +3442,6 @@ TGSI_SEMANTIC_SUBGROUP_LT_MASK
>>>>>    A bit mask of ``bit index < TGSI_SEMANTIC_SUBGROUP_INVOCATION``, 
>>>>> i.e.
>>>>>    ``(1 << subgroup_invocation) - 1`` in arbitrary precision 
>>>>> arithmetic.
>>>>> -
>>>>
>>>>
>>>> Stray whitespace change.
>>>>
>>>>
>>>>>    Declaration Interpolate
>>>>>    ^^^^^^^^^^^^^^^^^^^^^^^
>>>>> @@ -3517,6 +3518,31 @@ accessing a misaligned address is undefined.
>>>>>    Usage of the STORE opcode is only allowed if the WR (writable) flag
>>>>>    is set.
>>>>> +Hardware Atomic Register File
>>>>> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>> +
>>>>> +Hardware atomics are declared as a 2D array with an optional array 
>>>>> id.
>>>>> +
>>>>> +The first member of the dimension is the buffer resource the atomic
>>>>> +is located in.
>>>>> +The second member is a range into the buffer resource, either for
>>>>> +one or multiple counters. If this is an array, the declaration 
>>>>> will have
>>>>> +a unique array id.
>>>>> +
>>>>> +Each counter is 4 bytes in size, and index and ranges are in counters
>>>>> not bytes.
>>>>> +DCL ATOMIC[0][0]
>>>>> +DCL ATOMIC[0][1]
>>>>> +
>>>>> +This declares two atomics, one at the start of the buffer and one 
>>>>> in the
>>>>> +second 4 bytes.
>>>>> +
>>>>> +DCL ATOMIC[0][0]
>>>>> +DCL ATOMIC[1][0]
>>>>> +DCL ATOMIC[1][1..3], ARRAY(1)
>>>>> +
>>>>> +This declares 5 atomics, one in buffer 0 at 0,
>>>>> +one in buffer 1 at 0, and an array of 3 atomics in
>>>>> +the buffer 1, starting at 1.
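
[Editorial sketch, not part of the original thread.] The declaration rules in the proposed doc text (first dimension is the buffer, second is a counter index or range, each counter is 4 bytes, indices are in counters rather than bytes) can be illustrated with a small Python model. The regular expression and the `parse_decl` helper are hypothetical and written only for this illustration; they are not Mesa's TGSI text parser.

```python
import re

COUNTER_SIZE = 4  # bytes per counter, as stated in the proposed doc text

# Hypothetical pattern for the declaration syntax shown in the patch.
DECL_RE = re.compile(
    r"DCL ATOMIC\[(\d+)\]\[(\d+)(?:\.\.(\d+))?\](?:, ARRAY\((\d+)\))?$"
)


def parse_decl(line):
    """Parse one ATOMIC declaration into buffer, range, and byte layout."""
    m = DECL_RE.match(line.strip())
    if m is None:
        raise ValueError("not an ATOMIC declaration: %r" % line)
    first = int(m.group(2))
    # A declaration without '..' covers exactly one counter.
    last = int(m.group(3)) if m.group(3) is not None else first
    count = last - first + 1
    return {
        "buffer": int(m.group(1)),
        "first": first,
        "last": last,
        "array_id": int(m.group(4)) if m.group(4) is not None else None,
        "count": count,
        # Indices are in counters, so convert to bytes explicitly.
        "byte_offset": first * COUNTER_SIZE,
        "byte_size": count * COUNTER_SIZE,
    }


decls = [
    parse_decl("DCL ATOMIC[0][0]"),
    parse_decl("DCL ATOMIC[1][0]"),
    parse_decl("DCL ATOMIC[1][1..3], ARRAY(1)"),
]
total_counters = sum(d["count"] for d in decls)
```

With the second example from the patch, `total_counters` comes out as 5, and the array in buffer 1 starts at byte offset 4 and spans 12 bytes.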
>>>>
>>>>
>>>> My understanding is that these ranges could be highly non-contiguous,
>>>> right? I.e., you could have
>>>>
>>>> DCL ATOMIC[0][15]
>>>> DCL ATOMIC[0][8423..8430], ARRAY(1)
>>>> DCL ATOMIC[0][25112]
>>>>
>>>> ... corresponding to the offsets in the GLSL shader. The doc should 
>>>> really
>>>> point this out explicitly. Also, this might cause trouble because 
>>>> the TGSI
>>>> range tokens don't have enough bits to represent high offsets.
>>>
>>>
>>> Thinking about it some more, here's one way to deal with it. Have
>>> st_glsl_to_tgsi pack the indices (by keeping track of the number of
>>> counters per atomic counter buffer), and then add an "atomic counter 
>>> offset"
>>> dword to the TGSI. The decls above could become:
>>>
>>>    DCL ATOMIC[0][0], OFFSET(25112)
>>>    DCL ATOMIC[0][1..8], OFFSET(8423), ARRAY(1)
>>>    DCL ATOMIC[0][9], OFFSET(15)
>>>
>>> (the point is the order of TGSI indices doesn't matter outside of 
>>> counter
>>> arrays)
>>>
>>> The driver would then compute the number of counters per HW atomic 
>>> counter
>>> buffer to assign on-chip HW atomic memory during compile time, and 
>>> then copy
>>> memory from the given offsets to the on-chip HW atomic memory as part of
>>> state validation before draws and dispatches.
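
[Editorial sketch, not part of the original thread.] The packing idea described above can be modelled in a few lines of Python: keep a running counter count per atomic counter buffer, hand out dense TGSI indices, and remember the original offset for each declaration. The function names and the tuple layout are assumptions made for this illustration; the real change would live in st_glsl_to_tgsi and the driver. This sketch assigns indices in declaration order, which differs from the ordering in the example above; as noted there, the order of TGSI indices does not matter outside of counter arrays.

```python
def pack_counters(decls):
    """Assign dense per-buffer indices to sparse counter ranges.

    decls: list of (buffer, original_offset, count), all in counters.
    Returns (packed, totals) where packed holds
    (buffer, first_index, last_index, original_offset) tuples and
    totals maps each buffer to its number of counters.
    """
    totals = {}
    packed = []
    for buffer, offset, count in decls:
        first = totals.get(buffer, 0)
        last = first + count - 1
        totals[buffer] = last + 1
        packed.append((buffer, first, last, offset))
    return packed, totals


def format_decl(buffer, first, last, offset, array_id=None):
    """Render a packed declaration using the OFFSET() syntax proposed above."""
    index = str(first) if first == last else "%d..%d" % (first, last)
    text = "DCL ATOMIC[%d][%s], OFFSET(%d)" % (buffer, index, offset)
    if array_id is not None:
        text += ", ARRAY(%d)" % array_id
    return text


# The sparse example from the review: offsets 15, 8423..8430 and 25112.
packed, totals = pack_counters([(0, 15, 1), (0, 8423, 8), (0, 25112, 1)])
lines = [
    format_decl(*packed[0]),
    format_decl(*packed[1], array_id=1),
    format_decl(*packed[2]),
]
```

Here `totals[0]` is the number of counters the driver would need to reserve in on-chip HW atomic memory for buffer 0, and the stored offsets tell it where to copy each range from during state validation.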
>>>
>>> Cheers,
>>> Nicolai
>>>
>>> -- 
>>> Lerne, wie die Welt wirklich ist,
>>> Aber vergiss niemals, wie sie sein sollte.
> 
> 


-- 
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.