<div dir="ltr"><div dir="ltr"><div>Hi Marek</div><div><br></div><div>It would be nice to have the driconf part of this patch committed in</div><div>master to make it easy to test with and without the L3 pinning, so this</div><div>patch is:</div><div><br></div><div>Reviewed-by: Edmondo Tommasina <<a href="mailto:edmondo.tommasina@gmail.com">edmondo.tommasina@gmail.com</a>></div><div><br></div><div>Now with this patch in place I'm starting to collect some numbers with</div><div>and without the CCX affinity on my setup:</div><div><br></div><div>CPU: AMD Ryzen 5 2600 Six-Core Processor</div><div>GFX: AMD Radeon (TM) RX 470 Graphics (POLARIS10, DRM 3.27.0, 4.19.0-rc4, LLVM 8.0.0)</div><div>RAM: G.Skill Flare X 3200 CL14</div><div><br></div><div>drawoverhead</div><div>------------</div><div>As expected, great numbers with drawoverhead. For example:</div><div><br></div><div>With L3 thread pinning:</div><div> 29: DrawElements ( 1 VBO, 8 UBO, 8 Tex) w/ sample mask enable change: 6.91 million (99.5%)</div><div><br></div><div>Without:</div><div> 29: DrawElements ( 1 VBO, 8 UBO, 8 Tex) w/ sample mask enable change: 5.55 million (89.0%)</div><div><br></div><div><br></div><div>Hitman Benchmark</div><div>----------------</div><div>Here we have a performance loss.</div><div><br></div><div>With L3 thread pinning:</div><div><br></div><div>5765 frames</div><div> 50.21fps Average</div><div> 10.16fps Min</div><div>137.31fps Max</div><div> 19.92ms Average</div><div> 7.28ms Min</div><div> 98.42ms Max</div><div><br></div><div><br></div><div>Without L3 thread pinning:</div><div><br></div><div>6024 frames</div><div> 52.45fps Average</div><div> 10.28fps Min</div><div>129.85fps Max</div><div> 19.07ms Average</div><div> 7.70ms Min</div><div> 97.24ms Max</div><div><br></div><div>With thread pinning I lose about 2.2 FPS (roughly 4%) on average.</div><div><br></div><div>Looking at the CPU load of Hitman Benchmark:</div><div><br></div><div>With thread pinning we see, as expected, the first 3 cores (SMT active)
working</div><div>and the cores on the other CCX doing nothing:</div><div><br></div><div>09:46:50 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle</div><div>09:46:53 PM all 33.43 0.00 1.85 0.00 0.00 0.03 0.00 0.00 0.00 64.70</div><div>09:46:53 PM 0 68.79 0.00 4.03 0.00 0.00 0.00 0.00 0.00 0.00 27.18</div><div>09:46:53 PM 1 64.63 0.00 3.40 0.00 0.00 0.00 0.00 0.00 0.00 31.97</div><div>09:46:53 PM 2 68.46 0.00 3.69 0.00 0.00 0.00 0.00 0.00 0.00 27.85</div><div>09:46:53 PM 3 66.67 0.00 2.69 0.00 0.00 0.00 0.00 0.00 0.00 30.64</div><div>09:46:53 PM 4 66.89 0.00 3.04 0.00 0.00 0.00 0.00 0.00 0.00 30.07</div><div>09:46:53 PM 5 64.07 0.00 3.73 0.00 0.00 0.00 0.00 0.00 0.00 32.20</div><div>09:46:53 PM 6 0.67 0.00 0.34 0.00 0.00 0.00 0.00 0.00 0.00 98.99</div><div>09:46:53 PM 7 0.66 0.00 0.00 0.00 0.00 0.33 0.00 0.00 0.00 99.01</div><div>09:46:53 PM 8 0.33 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.67</div><div>09:46:53 PM 9 1.33 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 97.67</div><div>09:46:53 PM 10 0.33 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.67</div><div>09:46:53 PM 11 0.33 0.00 0.33 0.00 0.00 0.00 0.00 0.00 0.00 99.33</div><div><br></div><div>Without pinning all cores are working:</div><div><br></div><div>09:32:07 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle</div><div>09:32:10 PM all 42.77 0.00 3.48 0.03 0.00 0.03 0.00 0.00 0.00 53.70</div><div>09:32:10 PM 0 48.14 0.00 4.41 0.00 0.00 0.34 0.00 0.00 0.00 47.12</div><div>09:32:10 PM 1 37.71 0.00 3.37 0.00 0.00 0.00 0.00 0.00 0.00 58.92</div><div>09:32:10 PM 2 42.81 0.00 3.77 0.00 0.00 0.00 0.00 0.00 0.00 53.42</div><div>09:32:10 PM 3 44.63 0.00 3.02 0.00 0.00 0.00 0.00 0.00 0.00 52.35</div><div>09:32:10 PM 4 44.44 0.00 2.69 0.00 0.00 0.00 0.00 0.00 0.00 52.86</div><div>09:32:10 PM 5 43.48 0.00 3.34 0.00 0.00 0.00 0.00 0.00 0.00 53.18</div><div>09:32:10 PM 6 45.30 0.00 3.69 0.00 0.00 0.00 0.00 0.00 0.00 51.01</div><div>09:32:10 PM 7 46.31 0.00 3.02 0.00 0.00 0.00 0.00 0.00 0.00 
50.67</div><div>09:32:10 PM 8 38.46 0.00 4.35 0.00 0.00 0.00 0.00 0.00 0.00 57.19</div><div>09:32:10 PM 9 35.35 0.00 4.04 0.34 0.00 0.00 0.00 0.00 0.00 60.27</div><div>09:32:10 PM 10 43.81 0.00 3.34 0.00 0.00 0.00 0.00 0.00 0.00 52.84</div><div>09:32:10 PM 11 42.81 0.00 2.68 0.00 0.00 0.00 0.00 0.00 0.00 54.52</div><div><br></div><div>So it could be that, if an application takes advantage of many cores, the</div><div>L3 thread pinning negates the internal Mesa benefits on my CPU.</div><div><br></div><div>I'll try to collect more numbers, but right now I have the feeling that it</div><div>would be good to commit the driconf option, which would make testing with and</div><div>without thread pinning easier across different games and setups.</div><div><br></div><div>Regards</div><div>edmondo</div><div><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr">On Tue, Oct 30, 2018 at 11:39 PM Marek Olšák <<a href="mailto:maraeo@gmail.com">maraeo@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">From: Marek Olšák <<a href="mailto:marek.olsak@amd.com" target="_blank">marek.olsak@amd.com</a>><br>
<br>
so that all Blender threads are not forced to be on 1 CCX.<br>
<br>
Fixes: 8d473f555a0<br>
---<br>
src/gallium/auxiliary/pipe-loader/driinfo_gallium.h | 1 +<br>
src/gallium/include/state_tracker/st_api.h | 1 +<br>
src/gallium/state_trackers/dri/dri_screen.c | 2 ++<br>
src/mesa/state_tracker/st_context.c | 1 +<br>
src/mesa/state_tracker/st_context.h | 1 +<br>
src/mesa/state_tracker/st_manager.c | 8 +++++---<br>
src/util/00-mesa-defaults.conf | 4 ++++<br>
src/util/xmlpool/t_options.h | 5 +++++<br>
8 files changed, 20 insertions(+), 3 deletions(-)<br>
<br>
diff --git a/src/gallium/auxiliary/pipe-loader/driinfo_gallium.h b/src/gallium/auxiliary/pipe-loader/driinfo_gallium.h<br>
index 9db0dc01117..daa7ce7f6cc 100644<br>
--- a/src/gallium/auxiliary/pipe-loader/driinfo_gallium.h<br>
+++ b/src/gallium/auxiliary/pipe-loader/driinfo_gallium.h<br>
@@ -24,17 +24,18 @@ DRI_CONF_SECTION_DEBUG<br>
DRI_CONF_ALLOW_GLSL_EXTENSION_DIRECTIVE_MIDSHADER("false")<br>
DRI_CONF_ALLOW_GLSL_BUILTIN_CONST_EXPRESSION("false")<br>
DRI_CONF_ALLOW_GLSL_RELAXED_ES("false")<br>
DRI_CONF_ALLOW_GLSL_BUILTIN_VARIABLE_REDECLARATION("false")<br>
DRI_CONF_ALLOW_GLSL_CROSS_STAGE_INTERPOLATION_MISMATCH("false")<br>
DRI_CONF_ALLOW_HIGHER_COMPAT_VERSION("false")<br>
DRI_CONF_FORCE_GLSL_ABS_SQRT("false")<br>
DRI_CONF_GLSL_CORRECT_DERIVATIVES_AFTER_DISCARD("false")<br>
DRI_CONF_ALLOW_GLSL_LAYOUT_QUALIFIER_ON_FUNCTION_PARAMETERS("false")<br>
DRI_CONF_FORCE_COMPAT_PROFILE("false")<br>
+ DRI_CONF_DISABLE_L3_THREAD_PINNING("false")<br>
DRI_CONF_SECTION_END<br>
<br>
DRI_CONF_SECTION_MISCELLANEOUS<br>
DRI_CONF_ALWAYS_HAVE_DEPTH_BUFFER("false")<br>
DRI_CONF_GLSL_ZERO_INIT("false")<br>
DRI_CONF_ALLOW_RGB10_CONFIGS("true")<br>
DRI_CONF_SECTION_END<br>
diff --git a/src/gallium/include/state_tracker/st_api.h b/src/gallium/include/state_tracker/st_api.h<br>
index 2b63b8a3d2a..26b52f8dc51 100644<br>
--- a/src/gallium/include/state_tracker/st_api.h<br>
+++ b/src/gallium/include/state_tracker/st_api.h<br>
@@ -224,20 +224,21 @@ struct st_config_options<br>
unsigned force_glsl_version;<br>
boolean allow_glsl_extension_directive_midshader;<br>
boolean allow_glsl_builtin_const_expression;<br>
boolean allow_glsl_relaxed_es;<br>
boolean allow_glsl_builtin_variable_redeclaration;<br>
boolean allow_higher_compat_version;<br>
boolean glsl_zero_init;<br>
boolean force_glsl_abs_sqrt;<br>
boolean allow_glsl_cross_stage_interpolation_mismatch;<br>
boolean allow_glsl_layout_qualifier_on_function_parameters;<br>
+ boolean disable_L3_thread_pinning;<br>
unsigned char config_options_sha1[20];<br>
};<br>
<br>
/**<br>
* Represent the attributes of a context.<br>
*/<br>
struct st_context_attribs<br>
{<br>
/**<br>
* The profile and minimal version to support.<br>
diff --git a/src/gallium/state_trackers/dri/dri_screen.c b/src/gallium/state_trackers/dri/dri_screen.c<br>
index 82a0988a634..b8bd92475cb 100644<br>
--- a/src/gallium/state_trackers/dri/dri_screen.c<br>
+++ b/src/gallium/state_trackers/dri/dri_screen.c<br>
@@ -80,20 +80,22 @@ dri_fill_st_options(struct dri_screen *screen)<br>
driQueryOptionb(optionCache, "allow_glsl_builtin_variable_redeclaration");<br>
options->allow_higher_compat_version =<br>
driQueryOptionb(optionCache, "allow_higher_compat_version");<br>
options->glsl_zero_init = driQueryOptionb(optionCache, "glsl_zero_init");<br>
options->force_glsl_abs_sqrt =<br>
driQueryOptionb(optionCache, "force_glsl_abs_sqrt");<br>
options->allow_glsl_cross_stage_interpolation_mismatch =<br>
driQueryOptionb(optionCache, "allow_glsl_cross_stage_interpolation_mismatch");<br>
options->allow_glsl_layout_qualifier_on_function_parameters =<br>
driQueryOptionb(optionCache, "allow_glsl_layout_qualifier_on_function_parameters");<br>
+ options->disable_L3_thread_pinning =<br>
+ driQueryOptionb(optionCache, "disable_L3_thread_pinning");<br>
<br>
driComputeOptionsSha1(optionCache, options->config_options_sha1);<br>
}<br>
<br>
static unsigned<br>
dri_loader_get_cap(struct dri_screen *screen, enum dri_loader_cap cap)<br>
{<br>
const __DRIdri2LoaderExtension *dri2_loader = screen->sPriv->dri2.loader;<br>
const __DRIimageLoaderExtension *image_loader = screen->sPriv->image.loader;<br>
<br>
diff --git a/src/mesa/state_tracker/st_context.c b/src/mesa/state_tracker/st_context.c<br>
index 354876746f4..4b19b140bcd 100644<br>
--- a/src/mesa/state_tracker/st_context.c<br>
+++ b/src/mesa/state_tracker/st_context.c<br>
@@ -460,20 +460,21 @@ st_create_context_priv(struct gl_context *ctx, struct pipe_context *pipe,<br>
screen->get_param(screen, PIPE_CAP_QUERY_TIME_ELAPSED);<br>
st->has_half_float_packing =<br>
screen->get_param(screen, PIPE_CAP_TGSI_PACK_HALF_FLOAT);<br>
st->has_multi_draw_indirect =<br>
screen->get_param(screen, PIPE_CAP_MULTI_DRAW_INDIRECT);<br>
<br>
st->has_hw_atomics =<br>
screen->get_shader_param(screen, PIPE_SHADER_FRAGMENT,<br>
PIPE_SHADER_CAP_MAX_HW_ATOMIC_COUNTERS)<br>
? true : false;<br>
+ st->disable_L3_thread_pinning = options->disable_L3_thread_pinning;<br>
<br>
util_throttle_init(&st->throttle,<br>
screen->get_param(screen,<br>
PIPE_CAP_MAX_TEXTURE_UPLOAD_MEMORY_BUDGET));<br>
<br>
/* GL limits and extensions */<br>
st_init_limits(pipe->screen, &ctx->Const, &ctx->Extensions, ctx->API);<br>
st_init_extensions(pipe->screen, &ctx->Const,<br>
&ctx->Extensions, &st->options, ctx->API);<br>
<br>
diff --git a/src/mesa/state_tracker/st_context.h b/src/mesa/state_tracker/st_context.h<br>
index 14b9b018809..e57873dafe8 100644<br>
--- a/src/mesa/state_tracker/st_context.h<br>
+++ b/src/mesa/state_tracker/st_context.h<br>
@@ -121,20 +121,21 @@ struct st_context<br>
boolean has_shader_model3;<br>
boolean has_etc1;<br>
boolean has_etc2;<br>
boolean has_astc_2d_ldr;<br>
boolean prefer_blit_based_texture_transfer;<br>
boolean force_persample_in_shader;<br>
boolean has_shareable_shaders;<br>
boolean has_half_float_packing;<br>
boolean has_multi_draw_indirect;<br>
boolean can_bind_const_buffer_as_vertex;<br>
+ boolean disable_L3_thread_pinning;<br>
<br>
/**<br>
* If a shader can be created when we get its source.<br>
* This means it has only 1 variant, not counting glBitmap and<br>
* glDrawPixels.<br>
*/<br>
boolean shader_has_one_variant[MESA_SHADER_STAGES];<br>
<br>
boolean needs_texcoord_semantic;<br>
boolean apply_texture_swizzle_to_border_color;<br>
diff --git a/src/mesa/state_tracker/st_manager.c b/src/mesa/state_tracker/st_manager.c<br>
index ceb48dd4903..eb0b88ef473 100644<br>
--- a/src/mesa/state_tracker/st_manager.c<br>
+++ b/src/mesa/state_tracker/st_manager.c<br>
@@ -1067,24 +1067,26 @@ st_api_make_current(struct st_api *stapi, struct st_context_iface *stctxi,<br>
<br>
/* Purge the context's winsys_buffers list in case any<br>
* of the referenced drawables no longer exist.<br>
*/<br>
st_framebuffers_purge(st);<br>
<br>
/* Notify the driver that the context thread may have been changed.<br>
* This should pin all driver threads to a specific L3 cache for optimal<br>
* performance on AMD Zen CPUs.<br>
*/<br>
- struct glthread_state *glthread = st->ctx->GLThread;<br>
- thrd_t *upper_thread = glthread ? &glthread->queue.threads[0] : NULL;<br>
+ if (!st->disable_L3_thread_pinning) {<br>
+ struct glthread_state *glthread = st->ctx->GLThread;<br>
+ thrd_t *upper_thread = glthread ? &glthread->queue.threads[0] : NULL;<br>
<br>
- util_context_thread_changed(st->pipe, upper_thread);<br>
+ util_context_thread_changed(st->pipe, upper_thread);<br>
+ }<br>
}<br>
else {<br>
ret = _mesa_make_current(NULL, NULL, NULL);<br>
}<br>
<br>
return ret;<br>
}<br>
<br>
<br>
static void<br>
diff --git a/src/util/00-mesa-defaults.conf b/src/util/00-mesa-defaults.conf<br>
index a937c46d052..e9a6b817d9a 100644<br>
--- a/src/util/00-mesa-defaults.conf<br>
+++ b/src/util/00-mesa-defaults.conf<br>
@@ -199,20 +199,24 @@ TODO: document the other workarounds.<br>
</application><br>
<br>
<application name="Wolfenstein The Old Blood" executable="WolfOldBlood_x64.exe"><br>
<option name="force_compat_profile" value="true" /><br>
</application><br>
<br>
<application name="ARMA 3" executable="arma3.x86_64"><br>
<option name="glsl_correct_derivatives_after_discard" value="true"/><br>
</application><br>
<br>
+ <application name="Blender" executable="blender"><br>
+ <option name="disable_L3_thread_pinning" value="true"/><br>
+ </application><br>
+<br>
<!-- The GL thread whitelist is below, workarounds are above.<br>
Keep it that way. --><br>
<br>
<application name="Alien Isolation" executable="AlienIsolation"><br>
<option name="mesa_glthread" value="true"/><br>
</application><br>
<br>
<application name="BioShock Infinite" executable="bioshock.i386"><br>
<option name="mesa_glthread" value="true"/><br>
</application><br>
diff --git a/src/util/xmlpool/t_options.h b/src/util/xmlpool/t_options.h<br>
index e0a30f5fd1d..5d916519794 100644<br>
--- a/src/util/xmlpool/t_options.h<br>
+++ b/src/util/xmlpool/t_options.h<br>
@@ -138,20 +138,25 @@ DRI_CONF_OPT_END<br>
#define DRI_CONF_ALLOW_GLSL_LAYOUT_QUALIFIER_ON_FUNCTION_PARAMETERS(def) \<br>
DRI_CONF_OPT_BEGIN_B(allow_glsl_layout_qualifier_on_function_parameters, def) \<br>
DRI_CONF_DESC(en,gettext("Allow layout qualifiers on function parameters.")) \<br>
DRI_CONF_OPT_END<br>
<br>
#define DRI_CONF_FORCE_COMPAT_PROFILE(def) \<br>
DRI_CONF_OPT_BEGIN_B(force_compat_profile, def) \<br>
DRI_CONF_DESC(en,gettext("Force an OpenGL compatibility context")) \<br>
DRI_CONF_OPT_END<br>
<br>
+#define DRI_CONF_DISABLE_L3_THREAD_PINNING(def) \<br>
+DRI_CONF_OPT_BEGIN_B(disable_L3_thread_pinning, def) \<br>
+ DRI_CONF_DESC(en,gettext("Disable L3 thread pinning.")) \<br>
+DRI_CONF_OPT_END<br>
+<br>
/**<br>
* \brief Image quality-related options<br>
*/<br>
#define DRI_CONF_SECTION_QUALITY \<br>
DRI_CONF_SECTION_BEGIN \<br>
DRI_CONF_DESC(en,gettext("Image Quality"))<br>
<br>
#define DRI_CONF_PRECISE_TRIG(def) \<br>
DRI_CONF_OPT_BEGIN_B(precise_trig, def) \<br>
DRI_CONF_DESC(en,gettext("Prefer accuracy over performance in trig functions")) \<br>
-- <br>
2.17.1<br>
<br>
_______________________________________________<br>
mesa-dev mailing list<br>
<a href="mailto:mesa-dev@lists.freedesktop.org" target="_blank">mesa-dev@lists.freedesktop.org</a><br>
<a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/mailman/listinfo/mesa-dev</a><br>
</blockquote></div></div>
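<div>For readers unfamiliar with the CCX terminology in the thread above, here is a minimal C sketch of the idea behind pinning to one L3 domain. It is not Mesa's implementation. The helper names <code>ccx_of_cpu</code> and <code>ccx_affinity_mask</code> are hypothetical, and the sketch assumes that logical CPUs are numbered contiguously per CCX, which is what the mpstat tables suggest for this Ryzen 5 2600 (CPUs 0-5 busy, CPUs 6-11 idle when pinned). Other systems may number CPUs differently.</div>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a Mesa function: index of the CCX (L3 domain)
 * that a logical CPU belongs to. Assumes logical CPUs are numbered
 * contiguously per CCX, e.g. 0-5 on CCX 0 and 6-11 on CCX 1. */
static unsigned ccx_of_cpu(unsigned cpu, unsigned cpus_per_ccx)
{
    assert(cpus_per_ccx > 0 && cpus_per_ccx <= 64);
    return cpu / cpus_per_ccx;
}

/* Hypothetical helper: bitmask of the logical CPUs that share an L3 cache
 * with `cpu` (bit i set means logical CPU i). Covers at most 64 CPUs. */
static uint64_t ccx_affinity_mask(unsigned cpu, unsigned cpus_per_ccx)
{
    unsigned first = ccx_of_cpu(cpu, cpus_per_ccx) * cpus_per_ccx;
    uint64_t mask = 0;

    for (unsigned i = 0; i < cpus_per_ccx && first + i < 64; i++)
        mask |= UINT64_C(1) << (first + i);

    return mask;
}
```

<div>With 6 logical CPUs per CCX, <code>ccx_affinity_mask(2, 6)</code> yields <code>0x3F</code> (CPUs 0-5) and <code>ccx_affinity_mask(9, 6)</code> yields <code>0xFC0</code> (CPUs 6-11). A driver that restricts all of its threads to one such mask keeps them on a shared L3, but it also leaves the other CCX unused by those threads, which is consistent with the idle CPUs 6-11 in the pinned Hitman run.</div>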