[Mesa-dev] [PATCH 2/8] gallium/tgsi: start adding hw atomics (v3)

Tue Nov 7 16:43:00 UTC 2017

On 07.11.2017 17:25, Nicolai Hähnle wrote:
> On 07.11.2017 07:31, Dave Airlie wrote:
>> diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
>> index 1a51fe9..0c331f2 100644
>> --- a/src/gallium/docs/source/tgsi.rst
>> +++ b/src/gallium/docs/source/tgsi.rst
>> @@ -2638,9 +2638,11 @@ logical operations.  In this context atomicity means that another
>>   concurrent memory access operation that affects the same memory
>>   location is guaranteed to be performed strictly before or after the
>>   entire execution of the atomic operation. The resource may be a BUFFER,
>> -IMAGE, or MEMORY.  In the case of an image, the offset works the same as for
>> -``LOAD`` and ``STORE``, specified above. These atomic operations may
>> -only be used with 32-bit integer image formats.
>> +IMAGE, ATOMIC, or MEMORY.  In the case of an image, the offset works
>> +the same as for ``LOAD`` and ``STORE``, specified above. For atomic
>> +counters, the offset is an immediate index to the base hw atomic
>> +counter for this operation.
>> +These atomic operations may only be used with 32-bit integer image formats.
>>   .. opcode:: ATOMUADD - Atomic integer addition
>> @@ -3440,7 +3442,6 @@ TGSI_SEMANTIC_SUBGROUP_LT_MASK
>>   A bit mask of ``bit index < TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
>>   ``(1 << subgroup_invocation) - 1`` in arbitrary precision arithmetic.
>> -
> 
> Stray whitespace change.
> 
> 
>>   Declaration Interpolate
>>   ^^^^^^^^^^^^^^^^^^^^^^^
>> @@ -3517,6 +3518,31 @@ accessing a misaligned address is undefined.
>>   Usage of the STORE opcode is only allowed if the WR (writable) flag
>>   is set.
>> +Hardware Atomic Register File
>> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> +
>> +Hardware atomics are declared as a 2D array with an optional array id.
>> +
>> +The first member of the dimension is the buffer resource the atomic
>> +is located in.
>> +The second member is a range into the buffer resource, either for
>> +one or multiple counters. If this is an array, the declaration will have
>> +an unique array id.
>> +
>> +Each counter is 4 bytes in size, and index and ranges are in counters not bytes.
>> +DCL ATOMIC[0][0]
>> +DCL ATOMIC[0][1]
>> +
>> +This declares two atomics, one at the start of the buffer and one in the
>> +second 4 bytes.
>> +
>> +DCL ATOMIC[0][0]
>> +DCL ATOMIC[1][0]
>> +DCL ATOMIC[1][1..3], ARRAY(1)
>> +
>> +This declares 5 atomics, one in buffer 0 at 0,
>> +one in buffer 1 at 0, and an array of 3 atomics in
>> +the buffer 1, starting at 1.
> 
> My understanding is that these ranges could be highly non-contiguous, 
> right? I.e., you could have
> 
> DCL ATOMIC[0][15]
> DCL ATOMIC[0][8423..8430], ARRAY(1)
> DCL ATOMIC[0][25112]
> 
> ... corresponding to the offsets in the GLSL shader. The doc should 
> really point this out explicitly. Also, this might cause trouble because 
> the TGSI range tokens don't have enough bits to represent high offsets.

Thinking about it some more, here's one way to deal with it. Have 
st_glsl_to_tgsi pack the indices (by keeping track of the number of 
counters per atomic counter buffer), and then add an "atomic counter 
offset" dword to the TGSI. The decls above could become:

   DCL ATOMIC[0][0], OFFSET(25112)
   DCL ATOMIC[0][1..8], OFFSET(8423), ARRAY(1)
   DCL ATOMIC[0][9], OFFSET(15)

(the point is the order of TGSI indices doesn't matter outside of 
counter arrays)
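
To make the packing concrete, here is a rough sketch in Python rather 
than actual st_glsl_to_tgsi code. The function name and the tuple layout 
are made up for illustration, and OFFSET() is only the syntax proposed 
above, not something TGSI has today. It hands out consecutive TGSI 
indices per buffer and carries the original GLSL offset along:

```python
from collections import defaultdict

def pack_atomic_decls(decls):
    """Pack atomic counter declarations into dense per-buffer indices.

    decls: list of (buffer, glsl_offset, count) tuples, with offsets and
    counts in units of counters, not bytes. (Hypothetical input layout.)
    Returns the declaration strings and the counter total per buffer.
    """
    next_index = defaultdict(int)   # next free TGSI index in each buffer
    next_array_id = 1               # array ids are unique across decls
    out = []
    for buf, offset, count in decls:
        first = next_index[buf]
        last = first + count - 1
        next_index[buf] = last + 1
        if count == 1:
            out.append(f"DCL ATOMIC[{buf}][{first}], OFFSET({offset})")
        else:
            out.append(
                f"DCL ATOMIC[{buf}][{first}..{last}], "
                f"OFFSET({offset}), ARRAY({next_array_id})"
            )
            next_array_id += 1
    return out, dict(next_index)

lines, totals = pack_atomic_decls([(0, 25112, 1), (0, 8423, 8), (0, 15, 1)])
for line in lines:
    print(line)
```

Fed the three counters in that order, this reproduces the three decls 
above. Any other order would work just as well, as long as the members 
of an array stay contiguous.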

The driver would then compute the number of counters per HW atomic 
counter buffer at compile time in order to assign on-chip HW atomic 
memory, and would copy memory from the given offsets into that on-chip 
memory as part of state validation before draws and dispatches.
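
As a rough model of that driver side (again plain Python with invented 
names, not real driver code, and with ordinary lists standing in for 
the bound buffers and the on-chip memory), it could look something 
like this:

```python
def assign_onchip_slots(counters_per_buffer):
    """Compile time: give each buffer a contiguous on-chip slot range."""
    base = {}
    next_slot = 0
    for buf in sorted(counters_per_buffer):
        base[buf] = next_slot
        next_slot += counters_per_buffer[buf]
    return base, next_slot

def build_copies(decls, base):
    """Compile time: one copy per declaration.

    decls: list of (buffer, first_index, count, offset) in counters.
    Returns (buffer, src_offset, dst_slot, count) tuples.
    """
    return [(buf, offset, base[buf] + first, count)
            for buf, first, count, offset in decls]

def validate_state(copies, buffers, onchip):
    """Before a draw or dispatch: copy counters into on-chip memory."""
    for buf, src, dst, count in copies:
        onchip[dst:dst + count] = buffers[buf][src:src + count]

# Buffer 0 holds the three decls from the example; buffer 1 is made up.
decls = [(0, 0, 1, 25112), (0, 1, 8, 8423), (0, 9, 1, 15), (1, 0, 2, 4)]
base, total = assign_onchip_slots({0: 10, 1: 2})
copies = build_copies(decls, base)

# Fill each buffer so that every counter's value equals its own offset.
buffers = {0: list(range(25113)), 1: list(range(8))}
onchip = [0] * total
validate_state(copies, buffers, onchip)
```

The slot assignment and copy list depend only on the shader, so they 
can be built once at compile time. Only the copies themselves have to 
run during state validation.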

Cheers,
Nicolai
-- 
Learn how the world really is,
but never forget how it should be.