Mesa (main): docs/freedreno: Rewrite the section on array access.

Thu Jun 3 03:28:07 UTC 2021

Module: Mesa
Branch: main
Commit: 3b19545966b771cf44a32f186351eacbcf30e89b
URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=3b19545966b771cf44a32f186351eacbcf30e89b

Author: Emma Anholt <emma at anholt.net>
Date:   Wed Jun  2 13:12:41 2021 -0700

docs/freedreno: Rewrite the section on array access.

We don't use collect/split for array access these days; instead we use
ir3_array structs that the ir3_register can point to.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11147>

---

 docs/drivers/freedreno/ir3-notes.rst | 87 ++++++++++++------------------------
 1 file changed, 29 insertions(+), 58 deletions(-)

diff --git a/docs/drivers/freedreno/ir3-notes.rst b/docs/drivers/freedreno/ir3-notes.rst
index 91c1fe4c8cc..0d859813d5f 100644
--- a/docs/drivers/freedreno/ir3-notes.rst
+++ b/docs/drivers/freedreno/ir3-notes.rst
@@ -292,81 +292,52 @@ results in:
 
 The scheduling pass has some smarts to schedule things such that only a single ``a0.x`` value is used at any one time.
 
-To implement variable arrays, values are stored in consecutive scalar registers.  This has some overlap with `register groups`_, in that ``collect`` and ``split`` are used to help group things for the `register assignment`_ pass.
-
-To use a variable array as a src register, a slight variation of what is done for const array src.  The instruction src is a `collect` instruction that groups all the array members:
+To implement variable arrays, the NIR registers are stored as an ``ir3_array``,
+which will be register allocated to consecutive hardware registers.  The array
+access uses the id field in the ``ir3_register`` to map to the array being
+accessed, and the offset field for the fixed offset within the array.  A NIR
+indirect register read such as:
 
 ::
 
-  mova a0.x, hr1.y
-  sub r1.y, r2.x, r3.x
-  add r0.x, r1.y, r<a0.x + 2>
+  decl_reg vec2 32 r0[2]
+  ...
+  vec2 32 ssa_19 = mov r0[0 + ssa_9]
 
-results in:
 
-.. graphviz::
+results in:
 
-  digraph {
-    a0 [label="r0.z"];
-    a1 [label="r0.w"];
-    a2 [label="r1.x"];
-    a3 [label="r1.y"];
-    sub;
-    collect;
-    mova;
-    add;
-    add -> sub;
-    add -> collect [label="off=2"];
-    add -> mova;
-    collect -> a0;
-    collect -> a1;
-    collect -> a2;
-    collect -> a3;
-  }
+::
 
-TODO better describe how actual deref offset is derived, i.e. based on array base register.
+  0000:0000:001:  shl.b hssa_19, hssa_17, himm[0.000000,1,0x1]
+  0000:0000:002:  mov.s16s16 hr61.x, hssa_19
+  0000:0000:002:  mov.u32u32 ssa_21, arr[id=1, offset=0, size=4, ssa_12], address=_[0000:0000:002:  mov.s16s16]
+  0000:0000:002:  mov.u32u32 ssa_22, arr[id=1, offset=1, size=4, ssa_12], address=_[0000:0000:002:  mov.s16s16]
 
-To do an indirect write to a variable array, a ``split`` is used.  Say the array was assigned to registers ``r0.z`` through ``r1.y`` (hence the constant offset of 2):
 
-    Note that only cat1 (mov) can do indirect write.
+Array writes target the array identified by ``instr->regs[0]->array.id``.  A NIR
+indirect register write such as:
 
 ::
 
-  mova a0.x, hr1.y
-  min r2.x, r2.x, c0.x
-  mov r<a0.x + 2>, r2.x
-  mul r0.x, r0.z, c0.z
-
-
-In this case, the ``mov`` instruction does not write all elements of the array (compared to usage of ``split`` for ``sam`` instructions in grouping_).  But the ``mov`` instruction does need an additional dependency (via ``collect``) on instructions that last wrote the array element members, to ensure that they get scheduled before the ``mov`` in scheduling_ stage (which also serves to group the array elements for the `register assignment`_ stage).
+  decl_reg vec2 32 r0[2]
+  ...
+  r0[0 + ssa_12] = mov ssa_13
 
-.. graphviz::
+results in:
 
-  digraph {
-    a0 [label="r0.z"];
-    a1 [label="r0.w"];
-    a2 [label="r1.x"];
-    a3 [label="r1.y"];
-    min;
-    mova;
-    mov;
-    mul;
-    split [label="split\noff=0"];
-    mul -> split;
-    split -> mov;
-    collect;
-    collect -> a0;
-    collect -> a1;
-    collect -> a2;
-    collect -> a3;
-    mov -> min;
-    mov -> mova;
-    mov -> collect;
-  }
+::
 
-Note that there would in fact be ``split`` nodes generated for each array element (although only the reachable ones will be scheduled, etc).
+  0000:0000:001:  shl.b hssa_29, hssa_27, himm[0.000000,1,0x1]
+  0000:0000:002:  mov.s16s16 hr61.x, hssa_29
+  0000:0000:001:  mov.u32u32 arr[id=1, offset=0, size=4, ssa_17], c2.y, address=_[0000:0000:002:  mov.s16s16]
+  0000:0000:004:  mov.u32u32 arr[id=1, offset=1, size=4, ssa_31], c2.z, address=_[0000:0000:002:  mov.s16s16]
 
+Note that only cat1 (mov) can do indirect writes, and thus NIR register stores
+may need to introduce an extra mov.
 
+ir3 array accesses in the DAG get serialized by the ``instr->barrier_class``
+field containing ``IR3_BARRIER_ARRAY_W`` or ``IR3_BARRIER_ARRAY_R``.
 
 Shader Passes
 -------------