[Mesa-dev] [PATCH 00/84] Introduce gallium nine internal multithreading

Axel Davy axel.davy at ens.fr
Wed Dec 7 22:54:33 UTC 2016

This patch adds internal multithreading to gallium nine.

The goal is to offload almost all gallium nine calls (and some other
work) to a worker thread.

The patch series first does a lot of refactoring, and introduces a new
nine_context structure containing all the internal state required
to make the gallium calls.
It will be the structure used exclusively by the worker thread.
The pipe_context is exclusive to the worker thread, and the main
thread needs special functions to access it, which either wait
for all pending commands to execute, or pause the worker thread.
A secondary pipe_context is also introduced for operations
that don't need implicit synchronization with rendering (buffer
upload with DISCARD/NOOVERWRITE in particular).

To maximize performance, the commands are queued into preallocated queues,
and a queue is made visible to the worker thread only once a significant
number of commands has been queued. For this to be a performance gain,
waiting for the worker thread to finish its job must be very rare.
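
As an illustration of the batching idea described above, here is a minimal
sketch using pthreads. It is not the nine_queue code from this series: all
names (cmd_queue, queue_push, queue_flush, queue_sync, ...) and the sizes
are hypothetical, and a single producer thread is assumed. The main thread
fills a preallocated batch without taking any lock, and only publishes it
to the worker once it is full or when a sync is requested.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define BATCH_SIZE  64  /* commands per batch before publishing (arbitrary) */
#define BATCH_COUNT 4   /* number of preallocated batches (arbitrary) */

typedef void (*cmd_fn)(void *arg);
struct cmd   { cmd_fn fn; void *arg; };
struct batch { struct cmd cmds[BATCH_SIZE]; size_t count; };

struct cmd_queue {
    struct batch batches[BATCH_COUNT];
    size_t head;       /* batch being filled; private to the main thread */
    size_t tail;       /* next batch the worker will execute */
    size_t published;  /* batches handed to the worker and not yet finished */
    bool stop;
    pthread_mutex_t lock;
    pthread_cond_t cond;
    pthread_t worker;
};

static void *worker_main(void *data)
{
    struct cmd_queue *q = data;
    pthread_mutex_lock(&q->lock);
    for (;;) {
        while (q->published == 0 && !q->stop)
            pthread_cond_wait(&q->cond, &q->lock);
        if (q->published == 0)
            break;  /* stop requested and nothing left to run */
        struct batch *b = &q->batches[q->tail];
        pthread_mutex_unlock(&q->lock);
        for (size_t i = 0; i < b->count; i++)  /* run without the lock */
            b->cmds[i].fn(b->cmds[i].arg);
        pthread_mutex_lock(&q->lock);
        q->tail = (q->tail + 1) % BATCH_COUNT;
        q->published--;
        pthread_cond_broadcast(&q->cond);
    }
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

/* Publish the batch being filled, then claim a free batch to fill next. */
static void queue_flush(struct cmd_queue *q)
{
    if (q->batches[q->head].count == 0)
        return;
    pthread_mutex_lock(&q->lock);
    q->published++;
    q->head = (q->head + 1) % BATCH_COUNT;
    pthread_cond_broadcast(&q->cond);
    while (q->published == BATCH_COUNT)  /* every batch is in use */
        pthread_cond_wait(&q->cond, &q->lock);
    pthread_mutex_unlock(&q->lock);
    q->batches[q->head].count = 0;
}

/* Queue one command; the worker only sees it once the batch is published. */
static void queue_push(struct cmd_queue *q, cmd_fn fn, void *arg)
{
    struct batch *b = &q->batches[q->head];
    assert(fn != NULL);
    b->cmds[b->count++] = (struct cmd){ fn, arg };
    if (b->count == BATCH_SIZE)
        queue_flush(q);
}

/* The expensive operation: wait until the worker has run everything. */
static void queue_sync(struct cmd_queue *q)
{
    queue_flush(q);
    pthread_mutex_lock(&q->lock);
    while (q->published > 0)
        pthread_cond_wait(&q->cond, &q->lock);
    pthread_mutex_unlock(&q->lock);
}

static int queue_init(struct cmd_queue *q)
{
    *q = (struct cmd_queue){0};
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->cond, NULL);
    return pthread_create(&q->worker, NULL, worker_main, q);
}

static void queue_destroy(struct cmd_queue *q)
{
    queue_sync(q);
    pthread_mutex_lock(&q->lock);
    q->stop = true;
    pthread_cond_broadcast(&q->cond);
    pthread_mutex_unlock(&q->lock);
    pthread_join(q->worker, NULL);
    pthread_mutex_destroy(&q->lock);
    pthread_cond_destroy(&q->cond);
}

/* Example command: increment the int that arg points to. */
static void cmd_increment(void *arg) { ++*(int *)arg; }
```

In this sketch the lock is taken once per batch rather than once per
command, and queue_sync() is the only call where the main thread waits for
the worker to drain everything, which is why it has to stay rare.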
With all the patches of the series applied, synchronization basically happens
only in these cases:
. Surface/volume destruction, if their content is in RAM and is needed by a
  pending command.
. Buffer lock that uses neither the MANAGED pool nor DISCARD/NOOVERWRITE.
. Surface/volume lock very close to a previous lock.
^^^ These three cases usually happen only at the beginning of a scene, when
items are initialized, and do not happen in a normal frame.
. At the end of a frame, when Present() is called.
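
The buffer lock rule from the list above can be written as a small
predicate. This is only a sketch of the rule as stated here, not code from
the series: buffer_lock_needs_sync and the EX_* names are hypothetical
stand-ins, with values chosen to mirror the D3D9 pool and lock flag
definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the D3D9 pools and lock flags, redefined
 * here so that the example is self-contained. */
enum ex_pool { EX_POOL_DEFAULT = 0, EX_POOL_MANAGED = 1, EX_POOL_SYSTEMMEM = 2 };
#define EX_LOCK_NOOVERWRITE 0x1000u
#define EX_LOCK_DISCARD     0x2000u

/* Returns true if locking the buffer has to wait for the worker thread. */
static bool buffer_lock_needs_sync(enum ex_pool pool, unsigned lock_flags)
{
    assert(pool <= EX_POOL_SYSTEMMEM);

    /* MANAGED buffers are uploaded later, from the worker thread. */
    if (pool == EX_POOL_MANAGED)
        return false;

    /* DISCARD/NOOVERWRITE locks need no implicit synchronization with
     * rendering, so they can use the secondary pipe_context. */
    if (lock_flags & (EX_LOCK_DISCARD | EX_LOCK_NOOVERWRITE))
        return false;

    /* Every other buffer lock waits for pending commands. */
    return true;
}
```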

Thus basically for the great majority of games, the only moment we really wait for the
worker thread to finish its job is when all frame commands have been sent.

Because we require driver thread safety (pipe_screen commands can be made in the main
thread while the pipe_context is used in the worker thread), internal multithreading
(dubbed CSMT, in reference to wine ogl internal multithreading mode) is enabled by
default only on r600/radeonsi, but can be forced on/off via a setting (csmt_force=0 or 1).
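
The enable logic described above amounts to the following sketch. The
function names, the use of a driver name string, and the "unset" sentinel
are assumptions made for illustration; only the csmt_force=0/1 values and
the r600/radeonsi default come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical sentinel meaning "the user did not set csmt_force". */
#define CSMT_FORCE_UNSET (-1)

/* Drivers for which CSMT is on by default, per the description above. */
static bool driver_defaults_to_csmt(const char *driver_name)
{
    return strcmp(driver_name, "r600") == 0 ||
           strcmp(driver_name, "radeonsi") == 0;
}

/* csmt_force=0 disables, csmt_force=1 enables, otherwise use the default. */
static bool csmt_enabled(const char *driver_name, int csmt_force)
{
    assert(driver_name != NULL);
    if (csmt_force == 0)
        return false;
    if (csmt_force == 1)
        return true;
    return driver_defaults_to_csmt(driver_name);
}
```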

One thing the patchset could improve is stateblocks handling.
The function to apply them to the nine_context can be optimized,
and overhead can be reduced. Most games don't use stateblocks though.

How does this compare to wine CSMT?

I haven't looked much at the details of wine CSMT.
My understanding is that opengl calls are offloaded to
a worker thread. I don't know exactly which optimizations
are done to avoid waiting on the worker thread.

How does this compare to Windows internal multithreading?

The public Direct3D DDI documentation gives some indications
on how the multithreading works. Some tests can also be made to
deduce more information. Basically, most commands are said to be
handed to a worker thread, while buffer locks and checks for query results
are made with reentrant functions in the main thread.
Some tests suggest that MANAGED pool uploads are done in the worker thread,
as this gallium nine patch series implements.
Thus we expect performance to be comparable.

Axel Davy (75):
  st/nine: Introduce nine_context
  st/nine: Move core of device clear to nine_state
  st/nine: Move draw calls to nine_state
  st/nine: Track changed.texture only for stateblocks
  st/nine: Move texture setting to nine_context_*
  st/nine: Back textures into nine_context
  st/nine: Move stream_usage_mask to nine_context
  st/nine: Move vtxbuf to nine_context
  st/nine: Move stream freq data to nine_context
  st/nine: Back vdecl to nine_context
  st/nine: Back vs to nine_context
  st/nine: Back sampler states to nine_context
  st/nine: Back all shader constants to nine_context
  st/nine: Back current index buffer to nine_context
  st/nine: Back RT to nine_context
  st/nine: Back scissor to nine_context
  st/nine: Back viewport to nine_context
  st/nine: Put ff data in a separate structure
  st/nine: Refactor SetLight
  st/nine: Refactor LightEnable
  st/nine: Back all ff states in nine_context
  st/nine: Back ds to nine_context
  st/nine: Back ps to nine_context
  st/nine: Back User Clip Planes to nine_context
  st/nine: Track dirty state groups in nine_context
  st/nine: Use atomics for nine_bind
  st/nine: Move query9 pipe calls to nine_context
  st/nine: Remove NineDevice9_GetCSO
  st/nine: Access pipe_context via NineDevice9_GetPipe
  st/nine: Rename cso in nine_context to cso_shader
  st/nine: Rename pipe to pipe_data in nine_context
  st/nine: Move pipe and cso to nine_context
  st/nine: Integrate nine_pipe_context_clear to nine_context_clear
  st/nine: Move Managed Pool handling out of nine_context
  st/nine: Do not use NineBaseTexture9 in nine_context
  st/nine: Decompose nine_context_set_stream_source
  st/nine: Decompose nine_context_set_indices
  st/nine: Decompose nine_context_set_texture
  st/nine: Reimplement nine_context_apply_stateblock
  st/nine: Change the way nine_shader gets the pipe
  st/nine: Back swvp in nine_context
  st/nine: Create pipe_surfaces on resource creation.
  st/nine: Simplify the logic to bind textures
  st/nine: Track bindings for buffers
  st/nine: Upload Managed buffers just before draw call using them
  st/nine: Add nine_context_get_pipe_acquire/release
  st/nine: Add secondary pipe for device
  st/nine: Implement Fast path for dynamic buffers and csmt
  st/nine: use get_pipe_acquire/release when possible
  st/nine: Simplify ColorFill
  st/nine: Optimize ColorFill
  st/nine: Use nine_context_clear_render_target
  st/nine: Avoid flushing the queue for queries GetData
  st/nine: Simplify ARG_BIND_REF
  st/nine: Fix NineUnknown_Detach
  st/nine: Detach buffers in swapchain dtor.
  st/nine: Comment and simplify iunknown
  st/nine: Do not bind the container if forward is false
  st/nine: Implement nine_context_range_upload
  st/nine: Optimize managed buffer upload
  st/nine: Implement nine_context_gen_mipmap
  st/nine: Use nine_context_gen_mipmap in BaseTexture9
  st/nine: Implement nine_context_box_upload
  st/nine: Use nine_context_box_upload for surfaces
  st/nine: Fix leak with cubetexture dtor
  st/nine: Fix leak with volume dtor
  st/nine: Use nine_context_box_upload for volumes
  st/nine: Bind destination for surface/volume uploads
  st/nine: Idem for nine_context_gen_mipmap
  st/nine: Add arguments to context's blit and copy_region
  st/nine: Do not wait for DEFAULT lock for surfaces when we can
  st/nine: Do not wait for DEFAULT lock for volumes when we can
  st/nine: Allow non-zero resource offset for vertex buffers
  st/nine: Implement new buffer upload path

Patrick Rudolph (9):
  st/nine: Add nine_queue
  st/nine: Add struct nine_clipplane
  st/nine: Pass size of memory to nine_state
  st/nine: Implement gallium nine CSMT
  st/nine: Print threadid in debug log
  st/nine: Add NINE_DEBUG=tid to turn threadid on or off
  st/nine: Use nine_context for blit
  st/nine: Use nine_context for resource_copy_region

 src/gallium/auxiliary/os/os_thread.h               |   11 +
 src/gallium/state_trackers/nine/Makefile.sources   |    5 +
 src/gallium/state_trackers/nine/adapter9.h         |    1 +
 src/gallium/state_trackers/nine/basetexture9.c     |   45 +-
 src/gallium/state_trackers/nine/basetexture9.h     |   23 +-
 src/gallium/state_trackers/nine/buffer9.c          |  155 +-
 src/gallium/state_trackers/nine/buffer9.h          |   56 +-
 src/gallium/state_trackers/nine/cubetexture9.c     |    2 +-
 src/gallium/state_trackers/nine/device9.c          | 1001 +++----
 src/gallium/state_trackers/nine/device9.h          |   17 +-
 src/gallium/state_trackers/nine/device9ex.c        |    2 +-
 src/gallium/state_trackers/nine/indexbuffer9.c     |   10 +-
 src/gallium/state_trackers/nine/indexbuffer9.h     |    2 -
 src/gallium/state_trackers/nine/iunknown.c         |    9 +-
 src/gallium/state_trackers/nine/iunknown.h         |   40 +-
 .../state_trackers/nine/nine_buffer_upload.c       |  288 ++
 .../state_trackers/nine/nine_buffer_upload.h       |   59 +
 src/gallium/state_trackers/nine/nine_csmt_helper.h |  427 +++
 src/gallium/state_trackers/nine/nine_debug.c       |   28 +-
 src/gallium/state_trackers/nine/nine_debug.h       |    1 +
 src/gallium/state_trackers/nine/nine_ff.c          |  280 +-
 src/gallium/state_trackers/nine/nine_ff.h          |   18 +-
 src/gallium/state_trackers/nine/nine_pipe.c        |   22 -
 src/gallium/state_trackers/nine/nine_pipe.h        |    2 -
 src/gallium/state_trackers/nine/nine_queue.c       |  275 ++
 src/gallium/state_trackers/nine/nine_queue.h       |   54 +
 src/gallium/state_trackers/nine/nine_shader.c      |    3 +-
 src/gallium/state_trackers/nine/nine_shader.h      |    4 +-
 src/gallium/state_trackers/nine/nine_state.c       | 2914 ++++++++++++++++----
 src/gallium/state_trackers/nine/nine_state.h       |  446 ++-
 src/gallium/state_trackers/nine/pixelshader9.c     |   20 +-
 src/gallium/state_trackers/nine/pixelshader9.h     |   14 +-
 src/gallium/state_trackers/nine/query9.c           |   32 +-
 src/gallium/state_trackers/nine/query9.h           |    1 +
 src/gallium/state_trackers/nine/stateblock9.c      |  201 +-
 src/gallium/state_trackers/nine/surface9.c         |  130 +-
 src/gallium/state_trackers/nine/surface9.h         |    8 +-
 src/gallium/state_trackers/nine/swapchain9.c       |   25 +-
 src/gallium/state_trackers/nine/swapchain9.h       |    2 -
 src/gallium/state_trackers/nine/vertexbuffer9.c    |    4 +-
 src/gallium/state_trackers/nine/vertexbuffer9.h    |    2 +-
 src/gallium/state_trackers/nine/vertexshader9.c    |   28 +-
 src/gallium/state_trackers/nine/vertexshader9.h    |   12 +-
 src/gallium/state_trackers/nine/volume9.c          |   94 +-
 src/gallium/state_trackers/nine/volume9.h          |    2 +-
 src/gallium/state_trackers/nine/volumetexture9.c   |    2 +-
 src/gallium/targets/d3dadapter9/drm.c              |    6 +
 src/mesa/drivers/dri/common/xmlpool/t_options.h    |    5 +
 48 files changed, 5048 insertions(+), 1740 deletions(-)
 create mode 100644 src/gallium/state_trackers/nine/nine_buffer_upload.c
 create mode 100644 src/gallium/state_trackers/nine/nine_buffer_upload.h
 create mode 100644 src/gallium/state_trackers/nine/nine_csmt_helper.h
 create mode 100644 src/gallium/state_trackers/nine/nine_queue.c
 create mode 100644 src/gallium/state_trackers/nine/nine_queue.h

