Mesa (master): docs: Add some documentation of game GL buffer object mapping behavior.

GitLab Mirror gitlab-mirror at kemper.freedesktop.org
Tue Mar 9 17:28:34 UTC 2021


Module: Mesa
Branch: master
Commit: a2a8c6a36c17ab9d8bb42d49437e7e0dab62bf75
URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=a2a8c6a36c17ab9d8bb42d49437e7e0dab62bf75

Author: Eric Anholt <eric at anholt.net>
Date:   Mon Feb 22 11:24:56 2021 -0800

docs: Add some documentation of game GL buffer object mapping behavior.

There are a variety of paths that apps take (this is by no means a
complete enumeration, I tried to keep going until I saw repeats but
eventually ran out of steam), and it should be useful to driver developers
writing their pipe_transfer_map() and invalidate_resource() calls to see a
bunch of the patterns without having to do performance debug on each app.

Acked-by: Rob Clark <robdclark at chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9231>

---

 docs/gallium/buffermapping.rst | 414 +++++++++++++++++++++++++++++++++++++++++
 docs/gallium/index.rst         |   1 +
 2 files changed, 415 insertions(+)

diff --git a/docs/gallium/buffermapping.rst b/docs/gallium/buffermapping.rst
new file mode 100644
index 00000000000..b876ad971e9
--- /dev/null
+++ b/docs/gallium/buffermapping.rst
@@ -0,0 +1,414 @@
+Buffer mapping patterns
+-----------------------
+
+There are two main strategies the driver has for CPU access to GL buffer
+objects. One is that the GL calls allocate temporary storage and blit to the GPU
+at
+``glBufferSubData()``/``glBufferData()``/``glFlushMappedBufferRange()``/``glUnmapBuffer()``
+time. This makes it easy to match the behavior GL requires. However, this may
+be more costly than direct mapping of the GL BO on some platforms, and is
+essentially not available to tiling GPUs (since tiling involves running through
+the command stream multiple times). Thus, GL has additional interfaces that let
+apps access memory directly while avoiding implicit blocking on GPU rendering
+from those BOs.
+
+Rendering engines have a variety of knobs to set on those GL interfaces for data
+upload, and as a whole they seem to take just about every path available. Let's
+look at some examples to see how they might constrain GL driver buffer upload
+behavior.
+
+Portal 2
+========
+
+.. code-block:: console
+
+  1030842 glXSwapBuffers(dpy = 0x82a8000, drawable = 20971540)
+  1030876 glBufferDataARB(target = GL_ELEMENT_ARRAY_BUFFER, size = 65536, data = NULL, usage = GL_DYNAMIC_DRAW)
+  1030877 glBufferSubData(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, size = 576, data = blob(576))
+  1030896 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 526, count = 252, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0)
+  1030915 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 19657, count = 36, type = GL_UNSIGNED_SHORT, indices = 0x1f8, basevertex = 0)
+  1030917 glBufferDataARB(target = GL_ARRAY_BUFFER, size = 1572864, data = NULL, usage = GL_DYNAMIC_DRAW)
+  1030918 glBufferSubData(target = GL_ARRAY_BUFFER, offset = 0, size = 128, data = blob(128))
+  1030919 glBufferSubData(target = GL_ELEMENT_ARRAY_BUFFER, offset = 576, size = 12, data = blob(12))
+  1030936 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 3, count = 6, type = GL_UNSIGNED_SHORT, indices = 0x240, basevertex = 0)
+  1030937 glBufferSubData(target = GL_ARRAY_BUFFER, offset = 128, size = 128, data = blob(128))
+  1030938 glBufferSubData(target = GL_ELEMENT_ARRAY_BUFFER, offset = 588, size = 12, data = blob(12))
+  1030940 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 4, end = 7, count = 6, type = GL_UNSIGNED_SHORT, indices = 0x24c, basevertex = 0)
+  [... repeated draws at increasing offsets]
+  1033097 glXSwapBuffers(dpy = 0x82a8000, drawable = 20971540)
+
+From this sequence, we can see that it is important that the driver either
+implement ``glBufferSubData()`` as a blit from a streaming uploader in sequence with
+the ``glDraw*()`` calls (a common behavior for non-tiled GPUs, particularly those with
+dedicated memory), or that you:
+
+1) Track the valid range of the buffer so that you don't have to flush the draws
+   and synchronize on each following ``glBufferSubData()``.
+
+2) Reallocate the buffer storage on ``glBufferData()`` so that your first
+   ``glBufferSubData()`` of the frame doesn't stall on the last frame's
+   rendering completing.
+
+You can't just empty your valid range on ``glBufferData()`` unless you know that
+the GPU access from the previous frame has completed. This pattern of
+incrementing ``glBufferSubData()`` offsets interleaved with draws from that data
+is common among newer Valve games.
+
+.. code-block:: console
+
+  [ during setup ]
+  679259 glGenBuffersARB(n = 1, buffers = &1314)
+  679260 glBindBufferARB(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1314)
+  679261 glBufferDataARB(target = GL_ELEMENT_ARRAY_BUFFER, size = 3072, data = NULL, usage = GL_STATIC_DRAW)
+  679264 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 3072, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384000
+  679269 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 3072)
+  679270 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE
+  
+  [... setup of other buffers on this binding point]
+
+  679343 glBindBufferARB(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1314)
+  679344 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384000
+  679346 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768)
+  679347 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE
+  679348 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 768, length = 768, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384300
+  679350 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768)
+  679351 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE
+  679352 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 1536, length = 768, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384600
+  679354 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768)
+  679355 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE
+  679356 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 2304, length = 768, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384900
+  679358 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768)
+  679359 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE
+  
+  [... setup completes and we start drawing later]
+
+  761845 glBindBufferARB(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1314)
+  761846 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 323, count = 384, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0)
+
+This suggests that, for non-blitting drivers, resetting your "might be used on
+the GPU" range after a stall could save you a bunch of additional GPU stalls
+during setup.
+
+Terraria
+========
+
+.. code-block:: console
+
+  167581 glXSwapBuffers(dpy = 0x3004630, drawable = 25165844)
+
+  167585 glBufferData(target = GL_ARRAY_BUFFER, size = 196608, data = NULL, usage = GL_STREAM_DRAW)
+  167586 glBufferSubData(target = GL_ARRAY_BUFFER, offset = 0, size = 1728, data = blob(1728))
+  167588 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 71, count = 108, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0)
+  167589 glBufferData(target = GL_ARRAY_BUFFER, size = 196608, data = NULL, usage = GL_STREAM_DRAW)
+  167590 glBufferSubData(target = GL_ARRAY_BUFFER, offset = 0, size = 27456, data = blob(27456))
+  167592 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 7, count = 12, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0)
+  167594 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 3, count = 6, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 8)
+  167596 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 3, count = 6, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 12)
+  [...]
+
+In this game, we can see ``glBufferData()`` being used on the same array buffer
+throughout, to get new storage so that the ``glBufferSubData()`` doesn't cause
+synchronization.
+
+Don't Starve
+============
+
+.. code-block:: console
+
+  7251917 glGenBuffers(n = 1, buffers = &115052)
+  7251918 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 115052)
+  7251919 glBufferData(target = GL_ARRAY_BUFFER, size = 144, data = blob(144), usage = GL_STREAM_DRAW)
+  7251921 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 115052)
+  7251928 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)
+  7251930 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 114872)
+  7251936 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 18)
+  7251938 glGenBuffers(n = 1, buffers = &115053)
+  7251939 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 115053)
+  7251940 glBufferData(target = GL_ARRAY_BUFFER, size = 144, data = blob(144), usage = GL_STREAM_DRAW)
+  7251942 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 115053)
+  7251949 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)
+  7251973 glXSwapBuffers(dpy = 0x86dd860, drawable = 20971540)
+  [... drawing next frame]
+  7252388 glDeleteBuffers(n = 1, buffers = &115052)
+  7252389 glDeleteBuffers(n = 1, buffers = &115053)
+  7252390 glXSwapBuffers(dpy = 0x86dd860, drawable = 20971540)
+
+In this game we have a lot of tiny ``glBufferData()`` calls, suggesting that we
+could see working set wins and possibly CPU overhead reduction by packing small
+GL buffers in the same BO. Interestingly, the deletes of the temporary buffers
+always happen at the end of the next frame.
+
+Euro Truck Simulator
+====================
+
+.. code-block:: console
+
+  [usage of VBO 14,15]
+  [...]
+  885199 glXSwapBuffers(dpy = 0x379a3e0, drawable = 20971527)
+  885203 glInvalidateBufferData(buffer = 14)
+  885204 glInvalidateBufferData(buffer = 15)
+  [...]
+  889330 glXSwapBuffers(dpy = 0x379a3e0, drawable = 20971527)
+  889334 glInvalidateBufferData(buffer = 12)
+  889335 glInvalidateBufferData(buffer = 16)
+  [...]
+  893461 glXSwapBuffers(dpy = 0x379a3e0, drawable = 20971527)
+  893462 glClientWaitSync(sync = 0x77eee10, flags = 0x0, timeout = 0) = GL_ALREADY_SIGNALED
+  893463 glDeleteSync(sync = 0x780a630)
+  893464 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0x78ec730
+  893465 glInvalidateBufferData(buffer = 13)
+  893466 glInvalidateBufferData(buffer = 17)
+  893505 glBindBuffer(target = GL_COPY_READ_BUFFER, buffer = 14)
+  893506 glMapBufferRange(target = GL_COPY_READ_BUFFER, offset = 0, length = 788, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b034efd1000
+  893508 glUnmapBuffer(target = GL_COPY_READ_BUFFER) = GL_TRUE
+  893509 glBindBuffer(target = GL_COPY_READ_BUFFER, buffer = 15)
+  893510 glMapBufferRange(target = GL_COPY_READ_BUFFER, offset = 0, length = 32, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b034e5df000
+  893512 glUnmapBuffer(target = GL_COPY_READ_BUFFER) = GL_TRUE
+  893532 glBindVertexBuffers(first = 0, count = 2, buffers = {10, 15}, offsets = {0, 0}, strides = {52, 16})
+  893552 glDrawElementsInstancedBaseVertex(mode = GL_TRIANGLES, count = 18, type = GL_UNSIGNED_SHORT, indices = 0x13f280, instancecount = 1, basevertex = 25131)
+  893609 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)
+  893732 glBindVertexBuffers(first = 0, count = 1, buffers = &14, offsets = &0, strides = &48)
+  893733 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 14)
+  893744 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 6, type = GL_UNSIGNED_SHORT, indices = 0xf0, basevertex = 0)
+  893759 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 24, type = GL_UNSIGNED_SHORT, indices = 0x2e0, basevertex = 6)
+  893786 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 600, type = GL_UNSIGNED_SHORT, indices = 0xe87b0, basevertex = 21515)
+  893822 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)
+  893845 glBindBuffer(target = GL_COPY_READ_BUFFER, buffer = 14)
+  893846 glMapBufferRange(target = GL_COPY_READ_BUFFER, offset = 788, length = 788, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b034efd1314
+  893848 glUnmapBuffer(target = GL_COPY_READ_BUFFER) = GL_TRUE
+  893886 glDrawElementsInstancedBaseVertex(mode = GL_TRIANGLES, count = 18, type = GL_UNSIGNED_SHORT, indices = 0x13f280, instancecount = 1, basevertex = 25131)
+  893943 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)
+
+At the start of this frame, buffers 14 and 15 haven't been used in the previous 2
+frames, and the ``GL_ARB_sync`` fence has ensured that the GPU has at least started
+frame n-1 as the CPU starts the current frame. The first map is ``offset = 0,
+INVALIDATE_BUFFER | UNSYNCHRONIZED``, which suggests that the driver should
+reallocate storage for the mapping even in the ``UNSYNCHRONIZED`` case, except
+that the buffer is definitely going to be idle, making reallocation unnecessary
+(you may need to empty your valid range, though, to prevent unnecessary batch
+flushes).
+
+Also note the use of a totally unrelated binding point for the mapping of the
+vertex array -- you can't effectively use it as a hint for any buffer placement
+in memory. The game does also use ``glCopyBufferSubData()``, but only on a
+different buffer.
+
+
+Plague Inc
+==========
+
+.. code-block:: console
+
+  1640732 glXSwapBuffers(dpy = 0xb218f20, drawable = 23068674)
+  1640733 glClientWaitSync(sync = 0xb4141430, flags = 0x0, timeout = 0) = GL_ALREADY_SIGNALED
+  1640734 glDeleteSync(sync = 0xb4141430)
+  1640735 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0xb4141430
+  
+  1640780 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 78)
+  1640787 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 79)
+  1640788 glDrawElements(mode = GL_TRIANGLES, count = 9636, type = GL_UNSIGNED_SHORT, indices = NULL)
+  1640795 glDrawElements(mode = GL_TRIANGLES, count = 9636, type = GL_UNSIGNED_SHORT, indices = NULL)
+  1640813 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1096)
+  1640814 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 67584, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0xbfef4000
+  1640815 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1091)
+  1640816 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 12, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0xc3998000
+  1640817 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1096)
+  1640819 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 352)
+  1640820 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
+  1640821 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1091)
+  1640823 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 12)
+  1640824 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
+  1640825 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 1096)
+  1640831 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1091)
+  1640832 glDrawElements(mode = GL_TRIANGLES, count = 6, type = GL_UNSIGNED_SHORT, indices = NULL)
+  
+  1640847 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1096)
+  1640848 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 352, length = 67584, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0xbfef4160
+  1640849 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1091)
+  1640850 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 88, length = 12, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0xc3998058
+  1640851 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1096)
+  1640853 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 352)
+  1640854 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
+  1640855 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1091)
+  1640857 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 12)
+  1640858 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
+  1640863 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 6, type = GL_UNSIGNED_SHORT, indices = 0x58, basevertex = 4)
+
+At the start of this frame, the VBOs haven't been used in about 6 frames, and
+the ``GL_ARB_sync`` fence has ensured that the GPU has started frame n-1.
+
+Note the use of ``glFlushMappedBufferRange()`` on a small fraction of the size
+of the VBO -- it is important that a blitting driver make use of the flush
+ranges when in explicit mode.
+
+Darkest Dungeon
+===============
+
+.. code-block:: console
+
+  938384 glXSwapBuffers(dpy = 0x377fcd0, drawable = 23068692)
+  
+  938385 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
+  938386 glBufferData(target = GL_ARRAY_BUFFER, size = 1048576, data = NULL, usage = GL_STREAM_DRAW)
+  938511 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
+  938512 glMapBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 1048576, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7a73fcaa7000
+  938514 glFlushMappedBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 512)
+  938515 glUnmapBuffer(target = GL_ARRAY_BUFFER) = GL_TRUE
+  938523 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1)
+  938524 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
+  938525 glDrawElements(mode = GL_TRIANGLES, count = 24, type = GL_UNSIGNED_SHORT, indices = NULL)
+  938527 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
+  938528 glMapBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 1048576, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7a73fcaa7000
+  938530 glFlushMappedBufferRange(target = GL_ARRAY_BUFFER, offset = 512, length = 512)
+  938531 glUnmapBuffer(target = GL_ARRAY_BUFFER) = GL_TRUE
+  938539 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1)
+  938540 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
+  938541 glDrawElements(mode = GL_TRIANGLES, count = 24, type = GL_UNSIGNED_SHORT, indices = 0x30)
+  [... more maps and draws at increasing offsets]
+
+An interesting note for this game: after the initial ``glBufferData()`` in the
+frame to reallocate the storage, it unsync maps the whole buffer each time and
+just changes which region it flushes. The same GL buffer name is used in every
+frame.
+
+Tabletop Simulator
+==================
+
+.. code-block:: console
+
+  1287594 glXSwapBuffers(dpy = 0x3e10810, drawable = 23068692)
+  1287595 glClientWaitSync(sync = 0x7abf554e37b0, flags = 0x0, timeout = 0) = GL_ALREADY_SIGNALED
+  1287596 glDeleteSync(sync = 0x7abf554e37b0)
+  1287597 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0x7abf56647490
+  
+  1287614 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 480)
+  1287615 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 384, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7abf2e79a000
+  1287642 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 614)
+  1287650 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 5)
+  1287651 glBufferSubData(target = GL_COPY_WRITE_BUFFER, offset = 0, size = 1088, data = blob(1088))
+  1287652 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 615)
+  1287653 glDrawElements(mode = GL_TRIANGLES, count = 1788, type = GL_UNSIGNED_SHORT, indices = NULL)
+  [... more draw calls]
+  1289055 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 480)
+  1289057 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 384)
+  1289058 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
+  1289059 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 480)
+  1289066 glDrawArrays(mode = GL_TRIANGLE_STRIP, first = 12, count = 4)
+  1289068 glDrawArrays(mode = GL_TRIANGLE_STRIP, first = 8, count = 4)
+  1289553 glXSwapBuffers(dpy = 0x3e10810, drawable = 23068692)
+
+In this app, buffer 480 gets used like this every other frame.  The ``GL_ARB_sync``
+fence ensures that frame n-1 has started on the GPU before CPU work starts on
+the current frame, so the unsynchronized access to the buffers is safe.
+
+Hollow Knight
+=============
+
+.. code-block:: console
+
+  1873034 glXSwapBuffers(dpy = 0x28609d0, drawable = 23068692)
+  1873035 glClientWaitSync(sync = 0x7b1a5ca6e130, flags = 0x0, timeout = 0) = GL_ALREADY_SIGNALED
+  1873036 glDeleteSync(sync = 0x7b1a5ca6e130)
+  1873037 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0x7b1a5ca6e130
+  1873038 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 29)
+  1873039 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 8640, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b1a04c7e000
+  1873040 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 30)
+  1873041 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 720, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b1a07430000
+  1873065 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 29)
+  1873067 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 8640)
+  1873068 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
+  1873069 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 30)
+  1873071 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 720)
+  1873072 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
+  1873073 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 29)
+  1873074 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 8640, length = 576, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b1a04c801c0
+  1873075 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 30)
+  1873076 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 720, length = 72, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b1a074302d0
+  1873077 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 29)
+  1873079 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 576)
+  1873080 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
+  1873081 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 30)
+  1873083 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 72)
+  1873084 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
+  1873085 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 29)
+  1873096 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 30)
+  1873097 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 36, type = GL_UNSIGNED_SHORT, indices = 0x2d0, basevertex = 240)
+
+In this app, buffers 29 and 30 get used like this, starting from offset 0, every
+other frame.  The ``GL_ARB_sync`` fence is used to make sure that the GPU has
+reached the start of the previous frame before the app begins unsynchronized
+writes over the n-2 frame's buffer.
+
+Borderlands 2
+=============
+
+.. code-block:: console
+
+  3561998 glFlush()
+  3562004 glXSwapBuffers(dpy = 0xbaf0f90, drawable = 23068705)
+  3562006 glClientWaitSync(sync = 0x231c2ab0, flags = GL_SYNC_FLUSH_COMMANDS_BIT, timeout = 10000000000) = GL_ALREADY_SIGNALED
+  3562007 glDeleteSync(sync = 0x231c2ab0)
+  3562008 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0x231aadc0
+  
+  3562050 glBindBufferARB(target = GL_ARRAY_BUFFER, buffer = 1193)
+  3562051 glMapBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 1792, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT) = 0xde056000
+  3562053 glUnmapBufferARB(target = GL_ARRAY_BUFFER) = GL_TRUE
+  3562054 glBindBufferARB(target = GL_ARRAY_BUFFER, buffer = 1194)
+  3562055 glMapBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 1280, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT) = 0xd9426000
+  3562057 glUnmapBufferARB(target = GL_ARRAY_BUFFER) = GL_TRUE
+  [... unrelated draws]
+  3563051 glBindBufferARB(target = GL_ARRAY_BUFFER, buffer = 1193)
+  3563064 glBindBufferARB(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 875)
+  3563065 glDrawElementsInstancedARB(mode = GL_TRIANGLES, count = 72, type = GL_UNSIGNED_SHORT, indices = NULL, instancecount = 28)
+
+The ``GL_ARB_sync`` fence ensures that the GPU has started frame n-1 before the CPU
+starts on the current frame.
+
+This sequence of buffer uploads appears in each frame with the same buffer
+names, so you do need to handle the ``GL_MAP_INVALIDATE_BUFFER_BIT`` as a
+reallocate if the buffer is GPU-busy (it wasn't in this trace capture) to avoid
+stalls on the n-1 frame completing.
+
+Note that this is just one small buffer. Most of the vertex data goes through a
+``glBufferSubData()``/``glDraw*()`` path with the VBO used across multiple
+frames, with a ``glBufferData()`` when needing to wrap.
+
+Buffer mapping conclusions
+--------------------------
+
+* Non-blitting drivers must track the valid range of a freshly allocated buffer
+  as it gets uploaded in ``pipe_transfer_map()``, so that mapping an undefined
+  portion of the buffer doesn't stall on the GPU when ``glBufferSubData()`` is
+  interleaved with drawing.
+
+* Non-blitting drivers must reallocate storage on ``glBufferData(NULL)`` so that
+  the following ``glBufferSubData()`` won't stall. That ``glBufferData(NULL)``
+  call will appear in the driver as an ``invalidate_resource()`` call if
+  ``PIPE_CAP_INVALIDATE_BUFFER`` is available. (If that flag is not set, then
+  mesa/st will create a new ``pipe_resource`` for you.) Storage reallocation may
+  be skipped if you somehow know that the buffer is idle, in which case you can
+  just empty the valid region.
+
+* Blitting drivers must use the ``transfer_flush_region()`` region
+  instead of the mapped range when ``PIPE_MAP_FLUSH_EXPLICIT`` is set, to avoid
+  blitting too much data. (When that bit is unset, you just blit the whole
+  mapped range at unmap time.)
+
+* Buffer valid range tracking in non-blitting drivers must use the
+  ``transfer_flush_region()`` region instead of the mapped range when
+  ``PIPE_MAP_FLUSH_EXPLICIT`` is set, to avoid excess stalls.
+
+* Buffer valid range tracking doesn't need to be fancy: "number of bytes
+  valid starting from 0" is sufficient for all examples found.
+
+* Use the ``pipe_debug_callback`` to report stalls on buffer mapping to ease
+  debugging.
+
+* Buffer binding points are not useful for tuning buffer placement (see all the
+  ``GL_COPY_WRITE_BUFFER`` instances); you have to track the actual usage
+  history of a GL BO name.  mesa/st does this for optimizing its state updates
+  on reallocation in the ``!PIPE_CAP_INVALIDATE_BUFFER`` case, and if you set
+  ``PIPE_CAP_INVALIDATE_BUFFER`` then you have to flag your own internal state
+  updates (VBO addresses, XFB addresses, texture buffer addresses, etc.) on
+  reallocation based on usage history.
diff --git a/docs/gallium/index.rst b/docs/gallium/index.rst
index 82dae274a54..6656bd6cf13 100644
--- a/docs/gallium/index.rst
+++ b/docs/gallium/index.rst
@@ -14,6 +14,7 @@ Contents:
    format
    context
    cso
+   buffermapping
    distro
    postprocess
    glossary


