[Mesa-dev] [PATCH 00/84] Introduce gallium nine internal multithreading
axel.davy at ens.fr
Wed Dec 7 22:54:33 UTC 2016
This patch adds internal multithreading to gallium nine.
The goal is to offload almost all gallium nine calls (and some other
work) to a worker thread.
The patch series first does a lot of refactoring, and introduces a new
nine_context structure containing all the internal state required
to do the gallium calls.
This structure is used exclusively by the worker thread.
The pipe_context is exclusive to the worker thread, and the main
thread needs special functions to access it, which either wait
for all pending commands to execute, or pause the thread.
A secondary pipe_context is also introduced for operations
that don't need implicit synchronization with rendering (buffer
upload with DISCARD/NOOVERWRITE in particular).
To maximize performance, the commands are queued into preallocated queues,
and the queues are made visible to the worker thread only when a significant
number of commands have been queued. For this to be a performance gain, waits
on the worker thread to finish its job must be very rare.
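As a rough illustration of the queueing scheme described above, here is a
minimal sketch using pthreads. It is not the nine_queue code from the series:
all names (cmd_queue, queue_push, queue_flush, queue_wait_idle, add_one) and
both sizes are hypothetical, and a full batch stands in for "a significant
number of commands". The main thread fills a preallocated batch without
taking any lock, and only publishes it to the worker once it is full or when
a flush is requested.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

#define NUM_BATCHES    4  /* hypothetical: number of preallocated batches */
#define BATCH_CAPACITY 64 /* hypothetical: commands queued before publishing */

typedef void (*cmd_fn)(void *arg);
struct cmd { cmd_fn fn; void *arg; };
struct batch { struct cmd cmds[BATCH_CAPACITY]; unsigned count; bool published; };
static void add_one(void *arg) { ++*(int *)arg; } /* example command */

struct cmd_queue {
    struct batch batches[NUM_BATCHES];
    unsigned head, tail;  /* head: batch being filled; tail: next batch to execute */
    bool stop;
    pthread_mutex_t lock;
    pthread_cond_t published_cond, free_cond;
    pthread_t worker;
};

static void *worker_main(void *data)
{
    struct cmd_queue *q = data;
    for (;;) {
        pthread_mutex_lock(&q->lock);
        while (!q->batches[q->tail].published && !q->stop)
            pthread_cond_wait(&q->published_cond, &q->lock);
        struct batch *b = &q->batches[q->tail];
        bool have_work = b->published;
        pthread_mutex_unlock(&q->lock);
        if (!have_work)
            return NULL; /* stop requested and nothing left to run */
        for (unsigned i = 0; i < b->count; i++) /* executed without the lock */
            b->cmds[i].fn(b->cmds[i].arg);
        pthread_mutex_lock(&q->lock);
        b->count = 0;
        b->published = false;
        q->tail = (q->tail + 1) % NUM_BATCHES;
        pthread_cond_broadcast(&q->free_cond);
        pthread_mutex_unlock(&q->lock);
    }
}

/* Make the batch being filled visible to the worker. */
static void queue_flush(struct cmd_queue *q)
{
    if (q->batches[q->head].count == 0)
        return;
    pthread_mutex_lock(&q->lock);
    q->batches[q->head].published = true;
    q->head = (q->head + 1) % NUM_BATCHES;
    pthread_cond_signal(&q->published_cond);
    while (q->batches[q->head].published) /* blocks only if every batch is in flight */
        pthread_cond_wait(&q->free_cond, &q->lock);
    pthread_mutex_unlock(&q->lock);
}

/* Queue one command; no lock is taken until the batch is full. */
static void queue_push(struct cmd_queue *q, cmd_fn fn, void *arg)
{
    struct batch *b = &q->batches[q->head];
    assert(b->count < BATCH_CAPACITY);
    b->cmds[b->count].fn = fn;
    b->cmds[b->count].arg = arg;
    if (++b->count == BATCH_CAPACITY)
        queue_flush(q);
}

/* The rare full synchronization, e.g. at the end of a frame. */
static void queue_wait_idle(struct cmd_queue *q)
{
    queue_flush(q);
    pthread_mutex_lock(&q->lock);
    while (q->tail != q->head)
        pthread_cond_wait(&q->free_cond, &q->lock);
    pthread_mutex_unlock(&q->lock);
}

static void queue_init(struct cmd_queue *q)
{
    memset(q, 0, sizeof(*q));
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->published_cond, NULL);
    pthread_cond_init(&q->free_cond, NULL);
    pthread_create(&q->worker, NULL, worker_main, q);
}

static void queue_destroy(struct cmd_queue *q)
{
    queue_wait_idle(q);
    pthread_mutex_lock(&q->lock);
    q->stop = true;
    pthread_cond_signal(&q->published_cond);
    pthread_mutex_unlock(&q->lock);
    pthread_join(q->worker, NULL); /* mutex/cond destruction omitted for brevity */
}
```

In this sketch the main thread only touches the mutex once per full batch,
which is why the approach pays off only if calls like queue_wait_idle() stay
rare.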
With all the patches of the series, synchronization basically happens only at:
. Surface/volume destruction if their content is in RAM and was needed for a pending command.
. Buffer lock not using MANAGED pool or DISCARD/NOOVERWRITE
. Surface/volume lock very close to a previous lock.
^^^ These cases usually only happen at the beginning of a scene, when items are initialized, and
do not happen in a normal frame.
. At the end of a frame, when Present() is called.
Thus basically for the great majority of games, the only moment we really wait for the
worker thread to finish its job is when all frame commands have been sent.
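To make the buffer lock rule from the list above concrete, here is a small
sketch of the decision as a predicate. The enum, the flag names and the
function name are hypothetical stand-ins (the real code uses the Direct3D 9
pool and lock-flag definitions and has more conditions than this); only the
rule itself comes from the list: a buffer lock needs to wait for the worker
thread unless the buffer is in the MANAGED pool or the lock uses
DISCARD/NOOVERWRITE.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the Direct3D 9 pool and lock-flag values. */
enum sketch_pool {
    SKETCH_POOL_DEFAULT,
    SKETCH_POOL_MANAGED,
    SKETCH_POOL_SYSTEMMEM
};
#define SKETCH_LOCK_NOOVERWRITE 0x1000u
#define SKETCH_LOCK_DISCARD     0x2000u
#define SKETCH_LOCK_KNOWN (SKETCH_LOCK_NOOVERWRITE | SKETCH_LOCK_DISCARD)

/* Returns true when the main thread would have to wait for the worker
 * thread before handing the buffer memory to the application. */
static bool buffer_lock_needs_sync(enum sketch_pool pool, unsigned flags)
{
    assert((flags & ~SKETCH_LOCK_KNOWN) == 0); /* sketch only models two flags */

    if (pool == SKETCH_POOL_MANAGED)
        return false; /* app writes to a RAM copy; upload happens later */
    if (flags & (SKETCH_LOCK_DISCARD | SKETCH_LOCK_NOOVERWRITE))
        return false; /* app promises not to touch data still in use */
    return true;
}
```

For example, a DEFAULT pool buffer locked with no flags would synchronize in
this sketch, while the same buffer locked with DISCARD would not.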
Because we require driver thread safety (pipe_screen commands can be made in the main
thread while the pipe_context is used in the worker thread), internal multithreading
(dubbed CSMT, in reference to wine ogl internal multithreading mode) is enabled by
default only on r600/radeonsi, but can be forced on/off via a setting (csmt_force=0 or 1).
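The enable logic described above can be sketched as follows. This is only an
illustration: the function name is hypothetical, and both the way the driver
is identified (a plain name string) and the use of -1 for "csmt_force not
set" are assumptions, not necessarily what the patches do.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* csmt_force: 0 forces CSMT off, 1 forces it on.
 * Assumption: any other value (e.g. -1) means the setting was not given. */
static bool csmt_should_enable(const char *driver_name, int csmt_force)
{
    assert(driver_name != NULL);

    if (csmt_force == 0)
        return false;
    if (csmt_force == 1)
        return true;

    /* Default: only drivers known to be thread safe enough. */
    return strcmp(driver_name, "r600") == 0 ||
           strcmp(driver_name, "radeonsi") == 0;
}
```

With this sketch, csmt_should_enable("nouveau", -1) is false, while
csmt_should_enable("nouveau", 1) forces CSMT on regardless of the driver.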
One thing the patchset could improve is stateblock handling.
The function to apply them to the nine_context can be optimized,
and overhead can be reduced. Most games don't use stateblocks though.
How does this compare to wine CSMT?
I haven't looked much at the details of wine CSMT.
My understanding is that opengl calls are offloaded to
a worker thread. I don't know exactly which optimizations
are done to avoid waiting on the worker thread.
How does this compare to Windows internal multithreading?
The public Direct3D DDI documentation gives some indications
of how the multithreading works. Some tests can also be made to
deduce some information. Basically, most commands are said to be
handed to a worker thread, while buffer locks and checks for query results
are made with reentrant functions in the main thread.
Some tests suggest MANAGED pool upload is done in the worker thread,
which is also what this patch series implements.
Thus we expect performance to be comparable.
Axel Davy (75):
st/nine: Introduce nine_context
st/nine: Move core of device clear to nine_state
st/nine: Move draw calls to nine_state
st/nine: Track changed.texture only for stateblocks
st/nine: Move texture setting to nine_context_*
st/nine: Back textures into nine_context
st/nine: Move stream_usage_mask to nine_context
st/nine: Move vtxbuf to nine_context
st/nine: Move stream freq data to nine_context
st/nine: Back vdecl to nine_context
st/nine: Back vs to nine_context
st/nine: Back sampler states to nine_context
st/nine: Back all shader constants to nine_context
st/nine: Back current index buffer to nine_context
st/nine: Back RT to nine_context
st/nine: Back scissor to nine_context
st/nine: Back viewport to nine_context
st/nine: Put ff data in a separate structure
st/nine: Refactor SetLight
st/nine: Refactor LightEnable
st/nine: Back all ff states in nine_context
st/nine: Back ds to nine_context
st/nine: Back ps to nine_context
st/nine: Back User Clip Planes to nine_context
st/nine: Track dirty state groups in nine_context
st/nine: Use atomics for nine_bind
st/nine: Move query9 pipe calls to nine_context
st/nine: Remove NineDevice9_GetCSO
st/nine: Access pipe_context via NineDevice9_GetPipe
st/nine: Rename cso in nine_context to cso_shader
st/nine: Rename pipe to pipe_data in nine_context
st/nine: Move pipe and cso to nine_context
st/nine: Integrate nine_pipe_context_clear to nine_context_clear
st/nine: Move Managed Pool handling out of nine_context
st/nine: Do not use NineBaseTexture9 in nine_context
st/nine: Decompose nine_context_set_stream_source
st/nine: Decompose nine_context_set_indices
st/nine: Decompose nine_context_set_texture
st/nine: Reimplement nine_context_apply_stateblock
st/nine: Change the way nine_shader gets the pipe
st/nine: Back swvp in nine_context
st/nine: Create pipe_surfaces on resource creation.
st/nine: Simplify the logic to bind textures
st/nine: Fix BASETEX_REGISTER_UPDATE
st/nine: Track bindings for buffers
st/nine: Upload Managed buffers just before draw call using them
st/nine: Add nine_context_get_pipe_acquire/release
st/nine: Add secondary pipe for device
st/nine: Implement Fast path for dynamic buffers and csmt
st/nine: use get_pipe_acquire/release when possible
st/nine: Simplify ColorFill
st/nine: Optimize ColorFill
st/nine: Use nine_context_clear_render_target
st/nine: Avoid flushing the queue for queries GetData
st/nine: Simplify ARG_BIND_REF
st/nine: Fix NineUnknown_Detach
st/nine: Detach buffers in swapchain dtor.
st/nine: Comment and simplify iunknown
st/nine: Do not bind the container if forward is false
st/nine: Implement nine_context_range_upload
st/nine: Optimize managed buffer upload
st/nine: Implement nine_context_gen_mipmap
st/nine: Use nine_context_gen_mipmap in BaseTexture9
st/nine: Implement nine_context_box_upload
st/nine: Use nine_context_box_upload for surfaces
st/nine: Fix leak with cubetexture dtor
st/nine: Fix leak with volume dtor
st/nine: Use nine_context_box_upload for volumes
st/nine: Bind destination for surface/volume uploads
st/nine: Idem for nine_context_gen_mipmap
st/nine: Add arguments to context's blit and copy_region
st/nine: Do not wait for DEFAULT lock for surfaces when we can
st/nine: Do not wait for DEFAULT lock for volumes when we can
st/nine: Allow non-zero resource offset for vertex buffers
st/nine: Implement new buffer upload path
Patrick Rudolph (9):
st/nine: Add nine_queue
st/nine: Add struct nine_clipplane
st/nine: Pass size of memory to nine_state
st/nine: Implement gallium nine CSMT
st/nine: Print threadid in debug log
st/nine: Add NINE_DEBUG=tid to turn threadid on or off
st/nine: Use nine_context for blit
st/nine: Use nine_context for resource_copy_region
st/nine: Add CSMT_NO_WAIT_WITH_COUNTER
src/gallium/auxiliary/os/os_thread.h | 11 +
src/gallium/state_trackers/nine/Makefile.sources | 5 +
src/gallium/state_trackers/nine/adapter9.h | 1 +
src/gallium/state_trackers/nine/basetexture9.c | 45 +-
src/gallium/state_trackers/nine/basetexture9.h | 23 +-
src/gallium/state_trackers/nine/buffer9.c | 155 +-
src/gallium/state_trackers/nine/buffer9.h | 56 +-
src/gallium/state_trackers/nine/cubetexture9.c | 2 +-
src/gallium/state_trackers/nine/device9.c | 1001 +++----
src/gallium/state_trackers/nine/device9.h | 17 +-
src/gallium/state_trackers/nine/device9ex.c | 2 +-
src/gallium/state_trackers/nine/indexbuffer9.c | 10 +-
src/gallium/state_trackers/nine/indexbuffer9.h | 2 -
src/gallium/state_trackers/nine/iunknown.c | 9 +-
src/gallium/state_trackers/nine/iunknown.h | 40 +-
.../state_trackers/nine/nine_buffer_upload.c | 288 ++
.../state_trackers/nine/nine_buffer_upload.h | 59 +
src/gallium/state_trackers/nine/nine_csmt_helper.h | 427 +++
src/gallium/state_trackers/nine/nine_debug.c | 28 +-
src/gallium/state_trackers/nine/nine_debug.h | 1 +
src/gallium/state_trackers/nine/nine_ff.c | 280 +-
src/gallium/state_trackers/nine/nine_ff.h | 18 +-
src/gallium/state_trackers/nine/nine_pipe.c | 22 -
src/gallium/state_trackers/nine/nine_pipe.h | 2 -
src/gallium/state_trackers/nine/nine_queue.c | 275 ++
src/gallium/state_trackers/nine/nine_queue.h | 54 +
src/gallium/state_trackers/nine/nine_shader.c | 3 +-
src/gallium/state_trackers/nine/nine_shader.h | 4 +-
src/gallium/state_trackers/nine/nine_state.c | 2914 ++++++++++++++++----
src/gallium/state_trackers/nine/nine_state.h | 446 ++-
src/gallium/state_trackers/nine/pixelshader9.c | 20 +-
src/gallium/state_trackers/nine/pixelshader9.h | 14 +-
src/gallium/state_trackers/nine/query9.c | 32 +-
src/gallium/state_trackers/nine/query9.h | 1 +
src/gallium/state_trackers/nine/stateblock9.c | 201 +-
src/gallium/state_trackers/nine/surface9.c | 130 +-
src/gallium/state_trackers/nine/surface9.h | 8 +-
src/gallium/state_trackers/nine/swapchain9.c | 25 +-
src/gallium/state_trackers/nine/swapchain9.h | 2 -
src/gallium/state_trackers/nine/vertexbuffer9.c | 4 +-
src/gallium/state_trackers/nine/vertexbuffer9.h | 2 +-
src/gallium/state_trackers/nine/vertexshader9.c | 28 +-
src/gallium/state_trackers/nine/vertexshader9.h | 12 +-
src/gallium/state_trackers/nine/volume9.c | 94 +-
src/gallium/state_trackers/nine/volume9.h | 2 +-
src/gallium/state_trackers/nine/volumetexture9.c | 2 +-
src/gallium/targets/d3dadapter9/drm.c | 6 +
src/mesa/drivers/dri/common/xmlpool/t_options.h | 5 +
48 files changed, 5048 insertions(+), 1740 deletions(-)
create mode 100644 src/gallium/state_trackers/nine/nine_buffer_upload.c
create mode 100644 src/gallium/state_trackers/nine/nine_buffer_upload.h
create mode 100644 src/gallium/state_trackers/nine/nine_csmt_helper.h
create mode 100644 src/gallium/state_trackers/nine/nine_queue.c
create mode 100644 src/gallium/state_trackers/nine/nine_queue.h