[Mesa-dev] [PATCH 00/13] Threaded Gallium for RadeonSI

Marek Olšák maraeo at gmail.com
Wed May 10 22:45:33 UTC 2017


This series adds an optional module into gallium/util that wraps
around pipe_context and moves execution of all pipe_context calls into
a separate thread.
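The core idea can be sketched as a bounded producer/consumer queue: the application thread records calls, and a worker thread replays them. This is a minimal illustration assuming POSIX threads, not the actual gallium/util code. `struct call_queue`, `enqueue_call`, `queue_sync` and the other names are hypothetical, and a recorded call is reduced to a function pointer plus one `int` argument.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a recorded pipe_context call; in this sketch
 * a NULL fn is used as the shutdown sentinel. */
typedef void (*call_fn)(void *driver_ctx, int arg);
struct recorded_call { call_fn fn; int arg; };

#define QUEUE_SIZE 64 /* power of two, so unsigned wraparound stays consistent */

struct call_queue {
   struct recorded_call calls[QUEUE_SIZE];
   unsigned tail, head, executed; /* enqueued, dequeued, finished */
   void *driver_ctx;              /* the wrapped "driver" context */
   pthread_mutex_t lock;
   pthread_cond_t not_empty, not_full, drained;
   pthread_t worker;
};

/* Worker thread: runs recorded calls in submission order. */
static void *worker_main(void *p)
{
   struct call_queue *q = p;
   for (;;) {
      pthread_mutex_lock(&q->lock);
      while (q->head == q->tail)
         pthread_cond_wait(&q->not_empty, &q->lock);
      struct recorded_call c = q->calls[q->head++ % QUEUE_SIZE];
      pthread_cond_signal(&q->not_full);
      pthread_mutex_unlock(&q->lock);
      if (!c.fn)
         return NULL; /* shutdown sentinel */
      c.fn(q->driver_ctx, c.arg); /* runs without holding the lock */
      pthread_mutex_lock(&q->lock);
      q->executed++;
      pthread_cond_broadcast(&q->drained);
      pthread_mutex_unlock(&q->lock);
   }
}

void queue_init(struct call_queue *q, void *driver_ctx)
{
   q->tail = q->head = q->executed = 0;
   q->driver_ctx = driver_ctx;
   pthread_mutex_init(&q->lock, NULL);
   pthread_cond_init(&q->not_empty, NULL);
   pthread_cond_init(&q->not_full, NULL);
   pthread_cond_init(&q->drained, NULL);
   pthread_create(&q->worker, NULL, worker_main, q);
}

/* Application thread: record the call and return; blocks only while full. */
void enqueue_call(struct call_queue *q, call_fn fn, int arg)
{
   pthread_mutex_lock(&q->lock);
   while (q->tail - q->head == QUEUE_SIZE)
      pthread_cond_wait(&q->not_full, &q->lock);
   q->calls[q->tail++ % QUEUE_SIZE] = (struct recorded_call){fn, arg};
   pthread_cond_signal(&q->not_empty);
   pthread_mutex_unlock(&q->lock);
}

/* A "sync": block until every recorded call has executed. */
void queue_sync(struct call_queue *q)
{
   pthread_mutex_lock(&q->lock);
   while (q->executed != q->tail)
      pthread_cond_wait(&q->drained, &q->lock);
   pthread_mutex_unlock(&q->lock);
}

void queue_destroy(struct call_queue *q)
{
   enqueue_call(q, NULL, 0); /* calls queued before the sentinel still run */
   pthread_join(q->worker, NULL);
   assert(q->executed + 1 == q->tail); /* every call except the sentinel ran */
   pthread_cond_destroy(&q->drained);
   pthread_cond_destroy(&q->not_full);
   pthread_cond_destroy(&q->not_empty);
   pthread_mutex_destroy(&q->lock);
}

/* Demo "driver call" that accumulates into an int context. */
static void add_call(void *driver_ctx, int arg) { *(int *)driver_ctx += arg; }

int run_demo(void)
{
   int counter = 0;
   struct call_queue q;
   queue_init(&q, &counter);
   for (int i = 1; i <= 100; i++)
      enqueue_call(&q, add_call, i); /* may block when the queue fills */
   queue_sync(&q);
   int result = counter;
   queue_destroy(&q);
   return result;
}
```

`run_demo()` enqueues 100 calls and reads the accumulated result only after a sync. In this sketch the sync is needed solely because the application thread reads state the worker owns; that kind of stall is what the wrapper tries to avoid. Build with `-pthread`.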

It puts a lot of new requirements on the driver, especially on the thread
safety of pipe_context functions, and it even expects different
behavior from pipe_context in some cases, so it may be non-trivial
to enable. All of it is necessary for perfectly scalable threaded
execution. (Any new driver should be built around it from the
beginning.)

The performance improvement isn't very high (it only hides the overhead
of pipe_context), but I have tested a lot of apps with this, and for
the majority of them it really doesn't sync the thread except at
SwapBuffers.

It can do these:
- unsynchronized buffer mappings don't sync
- ordinary buffer mappings are promoted to unsynchronized when it's safe
- full buffer invalidations are implemented as reallocations and don't sync
- partial buffer invalidations are implemented as copy_buffer and don't sync
- get_query_result doesn't sync when the threaded context has seen flush()
  (i.e. get_query_result is contextless in that case)
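The buffer-mapping rules in the list above can be modeled as a decision made on the application thread before anything is enqueued. This is a hypothetical sketch, not the actual implementation: it assumes the wrapper tracks, per buffer, the byte range that may contain valid data (only CPU writes extend it here; a real wrapper would also have to account for GPU writes), and every type and function name is illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of a map request on the application thread. */
enum map_strategy {
   MAP_SYNC,             /* wait for all queued work to finish */
   MAP_UNSYNCHRONIZED,   /* map directly without waiting */
   MAP_REALLOCATE,       /* swap in fresh storage, then map it without waiting */
   MAP_STAGING_AND_COPY, /* write into a staging buffer, then enqueue a buffer copy */
};

/* Hypothetical per-buffer state kept by the wrapper. */
struct buffer {
   unsigned size;
   unsigned valid_start, valid_end; /* bytes that may hold data; empty when equal */
   unsigned storage_generation;     /* incremented on every reallocation */
};

static bool overlaps_valid(const struct buffer *b, unsigned start, unsigned end)
{
   return b->valid_start < b->valid_end && start < b->valid_end && b->valid_start < end;
}

static void extend_valid(struct buffer *b, unsigned start, unsigned end)
{
   if (b->valid_start == b->valid_end) {
      b->valid_start = start;
      b->valid_end = end;
   } else {
      if (start < b->valid_start)
         b->valid_start = start;
      if (end > b->valid_end)
         b->valid_end = end;
   }
}

/* Choose how to map [offset, offset + len). 'discard' means the caller will
 * overwrite the whole mapped range, so its old contents are not needed. */
enum map_strategy choose_map_strategy(struct buffer *b, bool write, bool discard,
                                      unsigned offset, unsigned len)
{
   unsigned end = offset + len;
   enum map_strategy s;

   if (!write)
      return MAP_SYNC; /* reads must observe the results of queued work */

   if (!overlaps_valid(b, offset, end)) {
      s = MAP_UNSYNCHRONIZED; /* no tracked write has touched these bytes */
   } else if (discard && offset == 0 && len == b->size) {
      b->storage_generation++; /* queued work keeps using the old storage */
      b->valid_start = b->valid_end = 0;
      s = MAP_REALLOCATE;
   } else if (discard) {
      s = MAP_STAGING_AND_COPY; /* the copy executes after earlier queued work */
   } else {
      s = MAP_SYNC; /* partial overwrite of data queued work may still use */
   }
   extend_valid(b, offset, end);
   return s;
}
```

For example, with a 1024-byte buffer, a first write to bytes 0..255 maps unsynchronized, an overlapping partial overwrite without discard has to sync, and a discard of all 1024 bytes swaps in new storage instead of waiting.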

Not implemented yet:
- deferred fences (mainly Bioshock Infinite would benefit)
- texture mappings (meaning CPU access) always sync, and texture_subdata
  doesn't sync only for small uploads, but we can make all texture
  uploads asynchronous by simply copying what is done for buffers

Note that it has very low overhead when it's always synchronous
(i.e. not multithreaded), because enqueuing and executing calls is
really fast. The worst-case scenario might be a 3% performance loss
(just a guess).
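Why the synchronous mode is cheap can be illustrated with a batch of compact call records executed through a dispatch table: enqueuing is a bounds check plus a small struct store, and executing is one indexed indirect call. This is a hypothetical sketch (`struct batch_ctx`, `record_call`, `flush_calls` and the call IDs are made up); here `flush_calls` always executes inline, where a threaded wrapper would instead hand the batch to its worker thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical IDs for a few recorded context calls. */
enum call_id { CALL_SET_VALUE, CALL_ADD, CALL_COUNT };

struct call_record {
   enum call_id id;
   int arg;
};

/* Stand-in for the wrapped driver context. */
struct fake_driver {
   int value;
};

static void exec_set_value(struct fake_driver *d, int arg) { d->value = arg; }
static void exec_add(struct fake_driver *d, int arg) { d->value += arg; }

/* Executing a record is a single indexed indirect call. */
static void (*const exec_table[CALL_COUNT])(struct fake_driver *, int) = {
   [CALL_SET_VALUE] = exec_set_value,
   [CALL_ADD] = exec_add,
};

#define BATCH_SIZE 32

struct batch_ctx {
   struct call_record batch[BATCH_SIZE];
   unsigned num_calls;
   bool synchronous;           /* execute every call immediately */
   struct fake_driver *driver;
   unsigned flushes;           /* number of times a batch was executed */
};

/* A threaded wrapper would hand the batch to its worker thread here;
 * this sketch always executes it inline on the calling thread. */
void flush_calls(struct batch_ctx *c)
{
   for (unsigned i = 0; i < c->num_calls; i++)
      exec_table[c->batch[i].id](c->driver, c->batch[i].arg);
   c->num_calls = 0;
   c->flushes++;
}

/* Enqueuing costs a bounds check and a small struct store. */
void record_call(struct batch_ctx *c, enum call_id id, int arg)
{
   if (c->num_calls == BATCH_SIZE)
      flush_calls(c); /* batch full */
   c->batch[c->num_calls++] = (struct call_record){id, arg};
   if (c->synchronous)
      flush_calls(c); /* synchronous mode: run it right away */
}
```

In synchronous mode each `record_call` executes immediately, so the extra cost over a direct call is roughly the record store plus the table lookup.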

All requirements on Gallium drivers and other information can be found
in the header file:

RadeonSI enables threaded Gallium by default for OpenGL Core and
Compatibility profiles and all OpenGL ES variants.

There is a small performance concern for RadeonSI: if non-contiguous
VRAM mappings are not supported (amdgpu with kernel 4.11 and older,
radeon with all kernels), the performance difference might be negative,
because buffer invalidations are done unconditionally, which means that
more VRAM buffers can be live and mapped at the same time. It's
difficult to tell whether any real apps are affected in a measurable way.

Here are performance numbers:

Alien Isolation: +16%
Bioshock Infinite: +13%
Borderlands 2: +12%
Civilization 5: +12%
Civilization 6: +10%
CS:GO: +8%
ET Legacy: +12%
OpenArena: +27%
Talos Principle (high details, 1680x1050 internal resolution): +17%
glmark2: no change in the final score

When games are GPU-bound: no change

Because it doesn't take advantage of deferred fences, Bioshock runs
asynchronously 80% of the time and synchronously 20% of the time.
All other games run asynchronously 100% of the time.

x11perf: Test: 500px PutImage Square: -3%
x11perf: Test: Scrolling 500 x 500 px: +16%
x11perf: Test: Char in 80-char aa line: +13%
x11perf: Test: PutImage XY 500x500 Square: +1%
x11perf: Test: Fill 300 x 300px AA Trapezoid: NO CHANGE
x11perf: Test: 500px Copy From Window To Window: +14%
x11perf: Test: Copy 500x500 From Pixmap To Pixmap: -1%
x11perf: Test: 500px Compositing From Pixmap To Window: +21%
x11perf: Test: 500px Compositing From Window To Window: +18%

gtkperf: GTK Widget: Total Time: -2%
gtkperf: GTK Widget: GtkComboBox: +7%
gtkperf: GTK Widget: GtkCheckButton: -15%
gtkperf: GTK Widget: GtkRadioButton: -13%
gtkperf: GTK Widget: GtkToggleButton: -2%
gtkperf: GTK Widget: GtkComboBoxEntry: -1%
gtkperf: GTK Widget: GtkTextView - Scroll: NO CHANGE
gtkperf: GTK Widget: GtkTextView - Add Text: NO CHANGE
gtkperf: GTK Widget: GtkDrawingArea - Circles: -9%
gtkperf: GTK Widget: GtkDrawingArea - Pixbufs: -3%

Hence the decision to enable it by default.

Please review.

