[Mesa-dev] [PATCH 00/13] Threaded Gallium for RadeonSI

Marek Olšák maraeo at gmail.com
Thu May 11 09:43:23 UTC 2017


On May 11, 2017 2:30 AM, "Timothy Arceri" <tarceri at itsqueeze.com> wrote:

On 11/05/17 08:45, Marek Olšák wrote:

> Hi,
>
> This series adds an optional module into gallium/util that wraps
> around pipe_context and moves execution of all pipe_context calls into
> a separate thread.
>
> It puts a lot of new requirements on the driver, especially on the
> thread safety of pipe_context functions, and even expects different
> behavior from pipe_context in some cases, so it may be non-trivial
> to enable. All of it is necessary to have a perfectly scalable
> threaded execution. (Any new drivers should be built around it from
> the beginning.)
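
To make the wrapping idea concrete, here is a minimal sketch of a call queue
that moves execution of driver calls into a separate thread. It is not the
u_threaded_context code; every name in it (tq, tq_enqueue, tq_sync,
tq_example_add, TQ_SLOTS) is hypothetical, and a single mutex with one
condition variable is assumed purely for brevity.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the u_threaded_context API. */
#define TQ_SLOTS 64

typedef void (*tq_func)(void *ctx, int arg);

struct tq_call { tq_func func; int arg; };

struct tq {
   pthread_mutex_t lock;
   pthread_cond_t changed;        /* signalled when head, tail or stop changes */
   pthread_t worker;
   struct tq_call ring[TQ_SLOTS];
   unsigned head, tail;           /* head: next to execute, tail: next free slot */
   bool stop;
   void *ctx;                     /* stands in for the wrapped driver context */
};

/* Example "driver call": adds arg to a long that ctx points to. */
void tq_example_add(void *ctx, int arg) { *(long *)ctx += arg; }

static void *tq_worker(void *data)
{
   struct tq *q = data;

   pthread_mutex_lock(&q->lock);
   for (;;) {
      while (q->head == q->tail && !q->stop)
         pthread_cond_wait(&q->changed, &q->lock);
      if (q->head == q->tail)
         break;                   /* stop requested and the queue is drained */

      struct tq_call call = q->ring[q->head % TQ_SLOTS];
      pthread_mutex_unlock(&q->lock);
      call.func(q->ctx, call.arg);   /* executed off the application thread */
      pthread_mutex_lock(&q->lock);
      q->head++;                  /* retire only after execution */
      pthread_cond_broadcast(&q->changed);
   }
   pthread_mutex_unlock(&q->lock);
   return NULL;
}

int tq_init(struct tq *q, void *ctx)
{
   q->head = q->tail = 0;
   q->stop = false;
   q->ctx = ctx;
   pthread_mutex_init(&q->lock, NULL);
   pthread_cond_init(&q->changed, NULL);
   return pthread_create(&q->worker, NULL, tq_worker, q);
}

/* Called from the application thread; blocks only when the ring is full. */
void tq_enqueue(struct tq *q, tq_func func, int arg)
{
   assert(func != NULL);
   pthread_mutex_lock(&q->lock);
   while (q->tail - q->head == TQ_SLOTS)
      pthread_cond_wait(&q->changed, &q->lock);
   q->ring[q->tail % TQ_SLOTS] = (struct tq_call){ func, arg };
   q->tail++;
   pthread_cond_broadcast(&q->changed);
   pthread_mutex_unlock(&q->lock);
}

/* "Sync the thread": wait until every enqueued call has executed. */
void tq_sync(struct tq *q)
{
   pthread_mutex_lock(&q->lock);
   while (q->head != q->tail)
      pthread_cond_wait(&q->changed, &q->lock);
   pthread_mutex_unlock(&q->lock);
}

void tq_destroy(struct tq *q)
{
   pthread_mutex_lock(&q->lock);
   q->stop = true;
   pthread_cond_broadcast(&q->changed);
   pthread_mutex_unlock(&q->lock);
   pthread_join(q->worker, NULL);
   pthread_cond_destroy(&q->changed);
   pthread_mutex_destroy(&q->lock);
}
```

In this sketch the application thread only pays for tq_enqueue; tq_sync is
the expensive operation, which is why the rest of the series is about
avoiding it.
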
>
> The performance improvement isn't very high (it only hides the overhead
> of pipe_context), but I have tested a lot of apps with this, and with
> the majority of them it really doesn't sync the thread except for
> SwapBuffers.
>
> It can do these:
> - unsynchronized buffer mappings don't sync
> - ordinary buffer mappings are promoted to unsynchronized when it's safe
> - full buffer invalidations are implemented as reallocations and don't sync
> - partial buffer invalidations are implemented as copy_buffer and don't sync
> - get_query_result doesn't sync when the threaded context has seen flush()
>    (i.e. get_query_result is contextless in that case)
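
The buffer-mapping rules in the list above can be read as a small decision
function. The following is only a sketch under stated assumptions: the flag
names, the map_action enum, the order of the checks, and the no_pending_use
parameter (standing in for "when it's safe") are all hypothetical and are not
taken from Mesa.

```c
#include <stdbool.h>

/* Hypothetical flags for illustration; not Mesa's transfer flags. */
enum map_flags {
   MAP_WRITE          = 1 << 0,
   MAP_UNSYNCHRONIZED = 1 << 1,
   MAP_DISCARD_WHOLE  = 1 << 2,   /* full buffer invalidation */
   MAP_DISCARD_RANGE  = 1 << 3,   /* partial buffer invalidation */
};

enum map_action {
   MAP_ACTION_DIRECT,        /* map right away, no sync */
   MAP_ACTION_REALLOCATE,    /* give the buffer fresh storage, no sync */
   MAP_ACTION_STAGING_COPY,  /* write to staging, enqueue copy_buffer, no sync */
   MAP_ACTION_SYNC,          /* wait for the driver thread, then map */
};

/*
 * no_pending_use is an assumed stand-in for "when it's safe": true when no
 * queued call still touches the buffer.
 */
enum map_action choose_map_action(unsigned flags, bool no_pending_use)
{
   bool write = (flags & MAP_WRITE) != 0;

   if (flags & MAP_UNSYNCHRONIZED)
      return MAP_ACTION_DIRECT;

   /* Ordinary mapping promoted to unsynchronized. */
   if (no_pending_use)
      return MAP_ACTION_DIRECT;

   /* Invalidation only makes sense for writes. */
   if (write && (flags & MAP_DISCARD_WHOLE))
      return MAP_ACTION_REALLOCATE;
   if (write && (flags & MAP_DISCARD_RANGE))
      return MAP_ACTION_STAGING_COPY;

   return MAP_ACTION_SYNC;
}
```

Only the last branch stalls the application thread; every other path
corresponds to one of the "don't sync" items.
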
>
> Missing:
> - deferred fences - mainly Bioshock Infinite might benefit
> - texture mappings (meaning CPU access) always sync, texture_subdata
>    doesn't sync for small uploads only, but we can make all texture
>    uploads asynchronous by simply copying what is done for buffers
>
> Note that it has a very low overhead when it's always synchronous
> (i.e. not multithreaded), because it's really fast to enqueue and
> execute calls. The worst case scenario might be -3% performance (just
> guessing here).
>
> All requirements on Gallium drivers and other information can be found
> in the header file:
> https://cgit.freedesktop.org/~mareko/mesa/tree/src/gallium/auxiliary/util/u_threaded_context.h?h=gallium-threaded2#n26
>
> RadeonSI enables threaded Gallium by default for OpenGL Core and
> Compatibility profiles and all OpenGL ES variants.
>
> There is a small performance concern for RadeonSI: If non-contiguous
> VRAM mappings are not supported (amdgpu - kernel 4.11 and older,
> radeon - all kernels), the performance difference might be negative,
> because buffer invalidations are done unconditionally, meaning that
> there can be more live and mapped VRAM buffers. It's difficult to tell
> whether any real apps are affected in a measurable way.
>
> Here are performance numbers:
>
> APPS: MORE IS BETTER
> Alien Isolation: +16%
> Bioshock Infinite: +13%
> Borderlands 2: +12%
> Civilization 5: +12%
> Civilization 6: +10%
> CS:GO: +8%
> ET Legacy: +12%
> Openarena: +27%
> Talos Principle (high details, 1680x1050 internal resolution): +17%
> glmark2: no change in the final score
>
> When games are GPU-bound: no change
>
> Because it doesn't take advantage of deferred fences, Bioshock runs
> asynchronously 80% of the time and synchronously 20% of the time.
> All other games run asynchronously 100% of the time.
>
> x11perf: MORE IS BETTER
> x11perf: Test: 500px PutImage Square: -3%
> x11perf: Test: Scrolling 500 x 500 px: +16%
> x11perf: Test: Char in 80-char aa line: +13%
> x11perf: Test: PutImage XY 500x500 Square: +1%
> x11perf: Test: Fill 300 x 300px AA Trapezoid: NO CHANGE
> x11perf: Test: 500px Copy From Window To Window: +14%
> x11perf: Test: Copy 500x500 From Pixmap To Pixmap: -1%
> x11perf: Test: 500px Compositing From Pixmap To Window: +21%
> x11perf: Test: 500px Compositing From Window To Window: +18%
>
> gtkperf: LESS IS BETTER
> gtkperf: GTK Widget: Total Time: -2%
> gtkperf: GTK Widget: GtkComboBox: +7%
> gtkperf: GTK Widget: GtkCheckButton: -15%
> gtkperf: GTK Widget: GtkRadioButton: -13%
> gtkperf: GTK Widget: GtkToggleButton: -2%
> gtkperf: GTK Widget: GtkComboBoxEntry: -1%
> gtkperf: GTK Widget: GtkTextView - Scroll: NO CHANGE
> gtkperf: GTK Widget: GtkTextView - Add Text: NO CHANGE
> gtkperf: GTK Widget: GtkDrawingArea - Circles: -9%
> gtkperf: GTK Widget: GtkDrawingArea - Pixbufs: -3%
>
> Hence the decision to enable it by default.
>

Hi Marek,

Are you able to provide details of the system (CPU/GPU) used for testing?
Should we not try to get more results from different combinations before
enabling it by default? Or maybe at least add an environment variable to
allow disabling it, so that community members can easily provide results
over the 17.2 development process?


Core i5 3570, Radeon Fury.

There is an environment variable to disable it.

The goal was to enable it by default from the beginning. Since it never
syncs the thread with most apps, which can be observed easily via the
exposed counters for HUD, it's a no-brainer.

Marek





> Please review.
>
> Marek
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>