[Mesa-dev] [PATCH] st/mesa: don't do L3 thread pinning for Blender
Edmondo Tommasina
edmondo.tommasina at gmail.com
Tue Nov 6 21:50:18 UTC 2018
Hi Marek
It would be nice to have the driconf part of this patch committed in
master to make it easy to test with and without the L3 pinning, so this
patch is:
Reviewed-by: Edmondo Tommasina <edmondo.tommasina at gmail.com>
Now with this patch in place I'm starting to collect some numbers with
and without the CCX affinity on my setup:
CPU: AMD Ryzen 5 2600 Six-Core Processor
GFX: AMD Radeon (TM) RX 470 Graphics (POLARIS10, DRM 3.27.0, 4.19.0-rc4, LLVM 8.0.0)
RAM: G.Skill Flare X 3200 CL14
drawoverhead
------------
As expected, drawoverhead shows great numbers with pinning. For example:
With L3 thread pinning:
29: DrawElements ( 1 VBO, 8 UBO, 8 Tex) w/ sample mask enable change: 6.91 million (99.5%)
Without:
29: DrawElements ( 1 VBO, 8 UBO, 8 Tex) w/ sample mask enable change: 5.55 million (89.0%)
Hitman Benchmark
----------------
Here we have a performance loss.
With L3 thread pinning:
5765 frames
50.21fps Average
10.16fps Min
137.31fps Max
19.92ms Average
7.28ms Min
98.42ms Max
Without L3 thread pinning:
6024 frames
52.45fps Average
10.28fps Min
129.85fps Max
19.07ms Average
7.70ms Min
97.24ms Max
With thread pinning I lose about 2 FPS on average.
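To put both results on the same scale, the relative change can be worked out from the figures quoted above. A minimal sketch in C; the function names are made up for illustration and are not part of Mesa or of either benchmark:

```c
#include <assert.h>
#include <stdio.h>

/* Relative change of `with` versus `without`, in percent. */
double relative_change_pct(double without, double with)
{
    assert(without != 0.0); /* guard against division by zero */
    return (with - without) / without * 100.0;
}

/* Print the two headline comparisons using the numbers quoted above. */
void print_pinning_deltas(void)
{
    /* drawoverhead test 29, million draws per second */
    printf("drawoverhead #29: %+.1f%%\n", relative_change_pct(5.55, 6.91));
    /* Hitman benchmark, average fps */
    printf("Hitman avg fps:   %+.1f%%\n", relative_change_pct(52.45, 50.21));
}
```

With these inputs the pinned run comes out roughly 24.5% ahead in drawoverhead test 29 and roughly 4.3% behind in the Hitman average.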
Looking at the CPU load of the Hitman benchmark:
With thread pinning we see, as expected, the first 3 cores (SMT active) working
and the cores on the other CCX doing nothing:
09:46:50 PM  CPU   %usr  %nice  %sys  %iowait  %irq  %soft  %steal  %guest  %gnice  %idle
09:46:53 PM  all  33.43   0.00  1.85     0.00  0.00   0.03    0.00    0.00    0.00  64.70
09:46:53 PM    0  68.79   0.00  4.03     0.00  0.00   0.00    0.00    0.00    0.00  27.18
09:46:53 PM    1  64.63   0.00  3.40     0.00  0.00   0.00    0.00    0.00    0.00  31.97
09:46:53 PM    2  68.46   0.00  3.69     0.00  0.00   0.00    0.00    0.00    0.00  27.85
09:46:53 PM    3  66.67   0.00  2.69     0.00  0.00   0.00    0.00    0.00    0.00  30.64
09:46:53 PM    4  66.89   0.00  3.04     0.00  0.00   0.00    0.00    0.00    0.00  30.07
09:46:53 PM    5  64.07   0.00  3.73     0.00  0.00   0.00    0.00    0.00    0.00  32.20
09:46:53 PM    6   0.67   0.00  0.34     0.00  0.00   0.00    0.00    0.00    0.00  98.99
09:46:53 PM    7   0.66   0.00  0.00     0.00  0.00   0.33    0.00    0.00    0.00  99.01
09:46:53 PM    8   0.33   0.00  0.00     0.00  0.00   0.00    0.00    0.00    0.00  99.67
09:46:53 PM    9   1.33   0.00  1.00     0.00  0.00   0.00    0.00    0.00    0.00  97.67
09:46:53 PM   10   0.33   0.00  0.00     0.00  0.00   0.00    0.00    0.00    0.00  99.67
09:46:53 PM   11   0.33   0.00  0.33     0.00  0.00   0.00    0.00    0.00    0.00  99.33
Without pinning all cores are working:
09:32:07 PM  CPU   %usr  %nice  %sys  %iowait  %irq  %soft  %steal  %guest  %gnice  %idle
09:32:10 PM  all  42.77   0.00  3.48     0.03  0.00   0.03    0.00    0.00    0.00  53.70
09:32:10 PM    0  48.14   0.00  4.41     0.00  0.00   0.34    0.00    0.00    0.00  47.12
09:32:10 PM    1  37.71   0.00  3.37     0.00  0.00   0.00    0.00    0.00    0.00  58.92
09:32:10 PM    2  42.81   0.00  3.77     0.00  0.00   0.00    0.00    0.00    0.00  53.42
09:32:10 PM    3  44.63   0.00  3.02     0.00  0.00   0.00    0.00    0.00    0.00  52.35
09:32:10 PM    4  44.44   0.00  2.69     0.00  0.00   0.00    0.00    0.00    0.00  52.86
09:32:10 PM    5  43.48   0.00  3.34     0.00  0.00   0.00    0.00    0.00    0.00  53.18
09:32:10 PM    6  45.30   0.00  3.69     0.00  0.00   0.00    0.00    0.00    0.00  51.01
09:32:10 PM    7  46.31   0.00  3.02     0.00  0.00   0.00    0.00    0.00    0.00  50.67
09:32:10 PM    8  38.46   0.00  4.35     0.00  0.00   0.00    0.00    0.00    0.00  57.19
09:32:10 PM    9  35.35   0.00  4.04     0.34  0.00   0.00    0.00    0.00    0.00  60.27
09:32:10 PM   10  43.81   0.00  3.34     0.00  0.00   0.00    0.00    0.00    0.00  52.84
09:32:10 PM   11  42.81   0.00  2.68     0.00  0.00   0.00    0.00    0.00    0.00  54.52
So it may be that when an application already makes good use of many cores,
L3 thread pinning negates the benefits inside Mesa, at least on my CPU.
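For reference, the kind of pinning discussed here can be sketched with the Linux affinity API. This is only an illustration, not the code Mesa uses: the function names are hypothetical, and it assumes the logical CPUs of one CCX are numbered contiguously (this matches the mpstat output above, where CPUs 0-5 are busy and 6-11 idle, but it is not guaranteed on other systems).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Fill `set` with the logical CPUs assumed to share one L3 cache (one CCX).
 * Assumption: the CPUs of a CCX are numbered contiguously. */
void build_l3_cpuset(cpu_set_t *set, unsigned l3_index, unsigned cpus_per_l3)
{
    /* The fixed-size cpu_set_t only covers CPU_SETSIZE logical CPUs. */
    assert((l3_index + 1) * cpus_per_l3 <= CPU_SETSIZE);

    CPU_ZERO(set);
    for (unsigned i = 0; i < cpus_per_l3; i++)
        CPU_SET(l3_index * cpus_per_l3 + i, set);
}

/* Restrict the calling thread to one L3 group.
 * Returns 0 on success, -1 on failure with errno set. */
int pin_calling_thread_to_l3(unsigned l3_index, unsigned cpus_per_l3)
{
    cpu_set_t set;

    build_l3_cpuset(&set, l3_index, cpus_per_l3);
    return sched_setaffinity(0, sizeof(set), &set);
}
```

Under that numbering assumption, `pin_calling_thread_to_l3(0, 6)` would restrict the caller to CPUs 0-5. A thread limited this way cannot spill over to the second CCX, which is consistent with the idle CPUs 6-11 in the pinned run.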
I'll try to collect more numbers, but right now I have the feeling that it
would be good to commit the driconf option, so that testing with and
without thread pinning becomes easier across different games and setups.
Regards
edmondo
On Tue, Oct 30, 2018 at 11:39 PM Marek Olšák <maraeo at gmail.com> wrote:
> From: Marek Olšák <marek.olsak at amd.com>
>
> so that all Blender threads are not forced to be on 1 CCX.
>
> Fixes: 8d473f555a0
> ---
> src/gallium/auxiliary/pipe-loader/driinfo_gallium.h | 1 +
> src/gallium/include/state_tracker/st_api.h | 1 +
> src/gallium/state_trackers/dri/dri_screen.c | 2 ++
> src/mesa/state_tracker/st_context.c | 1 +
> src/mesa/state_tracker/st_context.h | 1 +
> src/mesa/state_tracker/st_manager.c | 8 +++++---
> src/util/00-mesa-defaults.conf | 4 ++++
> src/util/xmlpool/t_options.h | 5 +++++
> 8 files changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/src/gallium/auxiliary/pipe-loader/driinfo_gallium.h b/src/gallium/auxiliary/pipe-loader/driinfo_gallium.h
> index 9db0dc01117..daa7ce7f6cc 100644
> --- a/src/gallium/auxiliary/pipe-loader/driinfo_gallium.h
> +++ b/src/gallium/auxiliary/pipe-loader/driinfo_gallium.h
> @@ -24,17 +24,18 @@ DRI_CONF_SECTION_DEBUG
> DRI_CONF_ALLOW_GLSL_EXTENSION_DIRECTIVE_MIDSHADER("false")
> DRI_CONF_ALLOW_GLSL_BUILTIN_CONST_EXPRESSION("false")
> DRI_CONF_ALLOW_GLSL_RELAXED_ES("false")
> DRI_CONF_ALLOW_GLSL_BUILTIN_VARIABLE_REDECLARATION("false")
> DRI_CONF_ALLOW_GLSL_CROSS_STAGE_INTERPOLATION_MISMATCH("false")
> DRI_CONF_ALLOW_HIGHER_COMPAT_VERSION("false")
> DRI_CONF_FORCE_GLSL_ABS_SQRT("false")
> DRI_CONF_GLSL_CORRECT_DERIVATIVES_AFTER_DISCARD("false")
> DRI_CONF_ALLOW_GLSL_LAYOUT_QUALIFIER_ON_FUNCTION_PARAMETERS("false")
> DRI_CONF_FORCE_COMPAT_PROFILE("false")
> + DRI_CONF_DISABLE_L3_THREAD_PINNING("false")
> DRI_CONF_SECTION_END
>
> DRI_CONF_SECTION_MISCELLANEOUS
> DRI_CONF_ALWAYS_HAVE_DEPTH_BUFFER("false")
> DRI_CONF_GLSL_ZERO_INIT("false")
> DRI_CONF_ALLOW_RGB10_CONFIGS("true")
> DRI_CONF_SECTION_END
> diff --git a/src/gallium/include/state_tracker/st_api.h b/src/gallium/include/state_tracker/st_api.h
> index 2b63b8a3d2a..26b52f8dc51 100644
> --- a/src/gallium/include/state_tracker/st_api.h
> +++ b/src/gallium/include/state_tracker/st_api.h
> @@ -224,20 +224,21 @@ struct st_config_options
> unsigned force_glsl_version;
> boolean allow_glsl_extension_directive_midshader;
> boolean allow_glsl_builtin_const_expression;
> boolean allow_glsl_relaxed_es;
> boolean allow_glsl_builtin_variable_redeclaration;
> boolean allow_higher_compat_version;
> boolean glsl_zero_init;
> boolean force_glsl_abs_sqrt;
> boolean allow_glsl_cross_stage_interpolation_mismatch;
> boolean allow_glsl_layout_qualifier_on_function_parameters;
> + boolean disable_L3_thread_pinning;
> unsigned char config_options_sha1[20];
> };
>
> /**
> * Represent the attributes of a context.
> */
> struct st_context_attribs
> {
> /**
> * The profile and minimal version to support.
> diff --git a/src/gallium/state_trackers/dri/dri_screen.c b/src/gallium/state_trackers/dri/dri_screen.c
> index 82a0988a634..b8bd92475cb 100644
> --- a/src/gallium/state_trackers/dri/dri_screen.c
> +++ b/src/gallium/state_trackers/dri/dri_screen.c
> @@ -80,20 +80,22 @@ dri_fill_st_options(struct dri_screen *screen)
> driQueryOptionb(optionCache, "allow_glsl_builtin_variable_redeclaration");
> options->allow_higher_compat_version =
> driQueryOptionb(optionCache, "allow_higher_compat_version");
> options->glsl_zero_init = driQueryOptionb(optionCache, "glsl_zero_init");
> options->force_glsl_abs_sqrt =
> driQueryOptionb(optionCache, "force_glsl_abs_sqrt");
> options->allow_glsl_cross_stage_interpolation_mismatch =
> driQueryOptionb(optionCache, "allow_glsl_cross_stage_interpolation_mismatch");
> options->allow_glsl_layout_qualifier_on_function_parameters =
> driQueryOptionb(optionCache, "allow_glsl_layout_qualifier_on_function_parameters");
> + options->disable_L3_thread_pinning =
> + driQueryOptionb(optionCache, "disable_L3_thread_pinning");
>
> driComputeOptionsSha1(optionCache, options->config_options_sha1);
> }
>
> static unsigned
> dri_loader_get_cap(struct dri_screen *screen, enum dri_loader_cap cap)
> {
> const __DRIdri2LoaderExtension *dri2_loader = screen->sPriv->dri2.loader;
> const __DRIimageLoaderExtension *image_loader = screen->sPriv->image.loader;
>
> diff --git a/src/mesa/state_tracker/st_context.c b/src/mesa/state_tracker/st_context.c
> index 354876746f4..4b19b140bcd 100644
> --- a/src/mesa/state_tracker/st_context.c
> +++ b/src/mesa/state_tracker/st_context.c
> @@ -460,20 +460,21 @@ st_create_context_priv(struct gl_context *ctx, struct pipe_context *pipe,
> screen->get_param(screen, PIPE_CAP_QUERY_TIME_ELAPSED);
> st->has_half_float_packing =
> screen->get_param(screen, PIPE_CAP_TGSI_PACK_HALF_FLOAT);
> st->has_multi_draw_indirect =
> screen->get_param(screen, PIPE_CAP_MULTI_DRAW_INDIRECT);
>
> st->has_hw_atomics =
> screen->get_shader_param(screen, PIPE_SHADER_FRAGMENT,
> PIPE_SHADER_CAP_MAX_HW_ATOMIC_COUNTERS)
> ? true : false;
> + st->disable_L3_thread_pinning = options->disable_L3_thread_pinning;
>
> util_throttle_init(&st->throttle,
> screen->get_param(screen,
> PIPE_CAP_MAX_TEXTURE_UPLOAD_MEMORY_BUDGET));
>
> /* GL limits and extensions */
> st_init_limits(pipe->screen, &ctx->Const, &ctx->Extensions, ctx->API);
> st_init_extensions(pipe->screen, &ctx->Const,
> &ctx->Extensions, &st->options, ctx->API);
>
> diff --git a/src/mesa/state_tracker/st_context.h b/src/mesa/state_tracker/st_context.h
> index 14b9b018809..e57873dafe8 100644
> --- a/src/mesa/state_tracker/st_context.h
> +++ b/src/mesa/state_tracker/st_context.h
> @@ -121,20 +121,21 @@ struct st_context
> boolean has_shader_model3;
> boolean has_etc1;
> boolean has_etc2;
> boolean has_astc_2d_ldr;
> boolean prefer_blit_based_texture_transfer;
> boolean force_persample_in_shader;
> boolean has_shareable_shaders;
> boolean has_half_float_packing;
> boolean has_multi_draw_indirect;
> boolean can_bind_const_buffer_as_vertex;
> + boolean disable_L3_thread_pinning;
>
> /**
> * If a shader can be created when we get its source.
> * This means it has only 1 variant, not counting glBitmap and
> * glDrawPixels.
> */
> boolean shader_has_one_variant[MESA_SHADER_STAGES];
>
> boolean needs_texcoord_semantic;
> boolean apply_texture_swizzle_to_border_color;
> diff --git a/src/mesa/state_tracker/st_manager.c b/src/mesa/state_tracker/st_manager.c
> index ceb48dd4903..eb0b88ef473 100644
> --- a/src/mesa/state_tracker/st_manager.c
> +++ b/src/mesa/state_tracker/st_manager.c
> @@ -1067,24 +1067,26 @@ st_api_make_current(struct st_api *stapi, struct st_context_iface *stctxi,
>
> /* Purge the context's winsys_buffers list in case any
> * of the referenced drawables no longer exist.
> */
> st_framebuffers_purge(st);
>
> /* Notify the driver that the context thread may have been changed.
> * This should pin all driver threads to a specific L3 cache for optimal
> * performance on AMD Zen CPUs.
> */
> - struct glthread_state *glthread = st->ctx->GLThread;
> - thrd_t *upper_thread = glthread ? &glthread->queue.threads[0] : NULL;
> + if (!st->disable_L3_thread_pinning) {
> + struct glthread_state *glthread = st->ctx->GLThread;
> + thrd_t *upper_thread = glthread ? &glthread->queue.threads[0] : NULL;
>
> - util_context_thread_changed(st->pipe, upper_thread);
> + util_context_thread_changed(st->pipe, upper_thread);
> + }
> }
> else {
> ret = _mesa_make_current(NULL, NULL, NULL);
> }
>
> return ret;
> }
>
>
> static void
> diff --git a/src/util/00-mesa-defaults.conf b/src/util/00-mesa-defaults.conf
> index a937c46d052..e9a6b817d9a 100644
> --- a/src/util/00-mesa-defaults.conf
> +++ b/src/util/00-mesa-defaults.conf
> @@ -199,20 +199,24 @@ TODO: document the other workarounds.
> </application>
>
> <application name="Wolfenstein The Old Blood" executable="WolfOldBlood_x64.exe">
> <option name="force_compat_profile" value="true" />
> </application>
>
> <application name="ARMA 3" executable="arma3.x86_64">
> <option name="glsl_correct_derivatives_after_discard" value="true"/>
> </application>
>
> + <application name="Blender" executable="blender">
> + <option name="disable_L3_thread_pinning" value="true"/>
> + </application>
> +
> <!-- The GL thread whitelist is below, workarounds are above.
> Keep it that way. -->
>
> <application name="Alien Isolation" executable="AlienIsolation">
> <option name="mesa_glthread" value="true"/>
> </application>
>
> <application name="BioShock Infinite" executable="bioshock.i386">
> <option name="mesa_glthread" value="true"/>
> </application>
> diff --git a/src/util/xmlpool/t_options.h b/src/util/xmlpool/t_options.h
> index e0a30f5fd1d..5d916519794 100644
> --- a/src/util/xmlpool/t_options.h
> +++ b/src/util/xmlpool/t_options.h
> @@ -138,20 +138,25 @@ DRI_CONF_OPT_END
> #define DRI_CONF_ALLOW_GLSL_LAYOUT_QUALIFIER_ON_FUNCTION_PARAMETERS(def) \
> DRI_CONF_OPT_BEGIN_B(allow_glsl_layout_qualifier_on_function_parameters, def) \
> DRI_CONF_DESC(en,gettext("Allow layout qualifiers on function parameters.")) \
> DRI_CONF_OPT_END
>
> #define DRI_CONF_FORCE_COMPAT_PROFILE(def) \
> DRI_CONF_OPT_BEGIN_B(force_compat_profile, def) \
> DRI_CONF_DESC(en,gettext("Force an OpenGL compatibility context")) \
> DRI_CONF_OPT_END
>
> +#define DRI_CONF_DISABLE_L3_THREAD_PINNING(def) \
> +DRI_CONF_OPT_BEGIN_B(disable_L3_thread_pinning, def) \
> + DRI_CONF_DESC(en,gettext("Disable L3 thread pinning.")) \
> +DRI_CONF_OPT_END
> +
> /**
> * \brief Image quality-related options
> */
> #define DRI_CONF_SECTION_QUALITY \
> DRI_CONF_SECTION_BEGIN \
> DRI_CONF_DESC(en,gettext("Image Quality"))
>
> #define DRI_CONF_PRECISE_TRIG(def) \
> DRI_CONF_OPT_BEGIN_B(precise_trig, def) \
> DRI_CONF_DESC(en,gettext("Prefer accuracy over performance in trig functions")) \
> --
> 2.17.1
>