[Mesa-dev] [PATCH 00/13] Threaded Gallium for RadeonSI
Nicolai Hähnle
nhaehnle at gmail.com
Thu May 11 19:02:21 UTC 2017
On 11.05.2017 00:45, Marek Olšák wrote:
> Hi,
>
> This series adds an optional module into gallium/util that wraps
> around pipe_context and moves execution of all pipe_context calls into
> a separate thread.
>
> It puts a lot of new requirements on the driver, especially on the
> thread safety of pipe_context functions, and even expects different
> behavior from pipe_context in some cases, so it may be non-trivial
> to enable. All of it is necessary to have a perfectly scalable
> threaded execution. (Any new drivers should be built around it from
> the beginning.)
>
> The performance improvement isn't very high (it only hides the overhead
> of pipe_context), but I have tested a lot of apps with this and I can
> tell you that, with the majority of apps, it really doesn't sync the
> thread except for SwapBuffers.
>
> It can do these:
> - unsynchronized buffer mappings don't sync
> - ordinary buffer mappings are promoted to unsynchronized when it's safe
> - full buffer invalidations are implemented as reallocations and don't sync
> - partial buffer invalidations are implemented as copy_buffer and don't sync
> - get_query_result doesn't sync when the threaded context has seen flush()
> (i.e. get_query_result is contextless in that case)
>
> Missing:
> - deferred fences - mainly Bioshock Infinite might benefit
> - texture mappings (meaning CPU access) always sync; texture_subdata
> avoids syncing only for small uploads, but we can make all texture
> uploads asynchronous by simply copying what is done for buffers
>
> Note that it has a very low overhead when it's always synchronous
> (i.e. not multithreaded), because it's really fast to enqueue and
> execute calls. The worst case scenario might be -3% performance (just
> guessing here).
>
> All requirements on Gallium drivers and other information can be found
> in the header file:
> https://cgit.freedesktop.org/~mareko/mesa/tree/src/gallium/auxiliary/util/u_threaded_context.h?h=gallium-threaded2#n26
>
> RadeonSI enables threaded Gallium by default for OpenGL Core and
> Compatibility profiles and all OpenGL ES variants.
>
> There is a small performance concern for RadeonSI: If non-contiguous
> VRAM mappings are not supported (amdgpu - kernel 4.11 and older,
> radeon - all kernels), the performance difference might be negative,
> because buffer invalidations are done unconditionally, meaning that
> there can be more live and mapped VRAM buffers. It's difficult to tell
> whether any real apps are affected in a measurable way.
>
> Here are performance numbers:
>
> APPS: MORE IS BETTER
> Alien Isolation: +16%
> Bioshock Infinite: +13%
> Borderlands 2: +12%
> Civilization 5: +12%
> Civilization 6: +10%
> CS:GO: +8%
> ET Legacy: +12%
> Openarena: +27%
> Talos Principle (high details, 1680x1050 internal resolution): +17%
> glmark2: no change in the final score
>
> When games are GPU-bound: no change
>
> Because it does not take advantage of deferred fences, Bioshock runs
> asynchronously 80% of the time and synchronously 20% of the time.
> All other games run asynchronously 100% of the time.
>
> x11perf: MORE IS BETTER
> x11perf: Test: 500px PutImage Square: -3%
> x11perf: Test: Scrolling 500 x 500 px: +16%
> x11perf: Test: Char in 80-char aa line: +13%
> x11perf: Test: PutImage XY 500x500 Square: +1%
> x11perf: Test: Fill 300 x 300px AA Trapezoid: NO CHANGE
> x11perf: Test: 500px Copy From Window To Window: +14%
> x11perf: Test: Copy 500x500 From Pixmap To Pixmap: -1%
> x11perf: Test: 500px Compositing From Pixmap To Window: +21%
> x11perf: Test: 500px Compositing From Window To Window: +18%
>
> gtkperf: LESS IS BETTER
> gtkperf: GTK Widget: Total Time: -2%
> gtkperf: GTK Widget: GtkComboBox: +7%
> gtkperf: GTK Widget: GtkCheckButton: -15%
> gtkperf: GTK Widget: GtkRadioButton: -13%
> gtkperf: GTK Widget: GtkToggleButton: -2%
> gtkperf: GTK Widget: GtkComboBoxEntry: -1%
> gtkperf: GTK Widget: GtkTextView - Scroll: NO CHANGE
> gtkperf: GTK Widget: GtkTextView - Add Text: NO CHANGE
> gtkperf: GTK Widget: GtkDrawingArea - Circles: -9%
> gtkperf: GTK Widget: GtkDrawingArea - Pixbufs: -3%
>
> Hence the decision to enable it by default.
Those are some pretty impressive numbers! I sent comments / questions on
patches 3 & 9, the rest are:
Reviewed-by: Nicolai Hähnle <nicolai.haehnle at amd.com>
Some general remarks:
Violating the "async" promise on debug callbacks is a problem. This
breaks the OpenGL API in a place where it wasn't broken before, and
that's not okay. I'm not sure what to do about this precisely, but the
spec is very explicit:
"When DEBUG_OUTPUT_SYNCHRONOUS is enabled, the driver guarantees
synchronous calls to the callback routine by the context. When
synchronous callbacks are enabled, all calls to the callback
routine will be made by the thread that owns the current context;
all such calls will be made serially by the current context; and
each call will be made before the GL command that generated the
debug message is allowed to return."
The last part is the strictest and implies that sync-ing becomes mandatory.
Maybe this can be handled without a performance impact by swapping out
pipe_context function pointers when the debug callback changes to !async.
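To make the function-pointer idea a bit more concrete, here is a minimal
sketch in C. Every name in it (sketch_context, sketch_sync, draw_async,
draw_sync, sketch_set_debug_callback) is hypothetical and is not the real
pipe_context or threaded_context API, and the "queue" is only a counter
standing in for the worker thread. It assumes that a synchronous entry point
can be built by enqueuing the call and then waiting for the queue to drain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for pipe_context; only one entry point is shown. */
struct sketch_context {
    void (*draw)(struct sketch_context *ctx, int count);
    int queued;    /* calls waiting for the (imaginary) worker thread */
    int executed;  /* calls the driver has finished executing */
};

/* Stand-in for waiting on the worker thread: it just drains the counter. */
static void sketch_sync(struct sketch_context *ctx)
{
    ctx->executed += ctx->queued;
    ctx->queued = 0;
}

/* Fast path: enqueue and return immediately. */
static void draw_async(struct sketch_context *ctx, int count)
{
    (void)count;
    ctx->queued++;
}

/* Slow path: enqueue, then wait, so any debug message produced by the
 * call can be delivered before the GL command returns. */
static void draw_sync(struct sketch_context *ctx, int count)
{
    draw_async(ctx, count);
    sketch_sync(ctx);
}

/* Swap the entry point when the debug callback mode changes. The async
 * path never has to test a flag, so it pays nothing for this feature. */
static void sketch_set_debug_callback(struct sketch_context *ctx, bool async)
{
    sketch_sync(ctx);  /* drain calls queued under the previous mode */
    ctx->draw = async ? draw_async : draw_sync;
}
```

The point of swapping pointers instead of checking a flag in every call is
that the common asynchronous path stays unchanged; only applications that
request synchronous debug output take the syncing variants.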
I'm also not too happy about ignoring resource_commit errors. Since the
idea of sparse buffers/textures is to potentially allocate lots of
memory, getting out-of-memory notifications there is kind of important.
On the other hand, we handle out-of-memory inconsistently already, and
forcing a sync is too high a price. I think we can live with it for now.
If having more buffers alive due to more invalidations ever becomes a
serious issue, we could consider exposing user fences to the
threaded_context. I see no reasons why that wouldn't work, but of course
it requires some more re-work across the winsys (and wouldn't work with
radeon).
Cheers,
Nicolai
>
> Please review.
>
> Marek
--
Learn how the world really is,
but never forget how it ought to be.