[Mesa-dev] [PATCH] i965: Split shader_time entries into separate cachelines.
Kenneth Graunke
kenneth at whitecape.org
Wed Mar 13 20:09:01 PDT 2013
On 03/13/2013 10:45 AM, Eric Anholt wrote:
> This avoids some snooping overhead between EUs processing separate shaders
> (so VS versus FS).
Plausible!
> Improves performance of a minecraft trace with shader_time by 28.9% +/-
> 18.3% (n=7), and performance of my old GLSL demo by 93.7% +/- 0.8% (n=4).
"+/- 18.3%"...lol. Still, nice improvement. This should make the tool
much nicer to use.
Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
In case I forgot, all shader_time patches you've sent so far get a R-b.
I remember reading them, just not whether I acked them.
> v2: Add a define for the stride with a comment explaining its units and
> why.
> ---
> src/mesa/drivers/dri/i965/brw_context.h | 8 ++++++++
> src/mesa/drivers/dri/i965/brw_fs.cpp | 2 +-
> src/mesa/drivers/dri/i965/brw_program.c | 5 +++--
> src/mesa/drivers/dri/i965/brw_vec4.cpp | 2 +-
> 4 files changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
> index c34d6b1..d042dd6 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -571,6 +571,14 @@ struct brw_vs_prog_data {
> #define SURF_INDEX_SOL_BINDING(t) ((t))
> #define BRW_MAX_GS_SURFACES SURF_INDEX_SOL_BINDING(BRW_MAX_SOL_BINDINGS)
>
> +/**
> + * Stride in bytes between shader_time entries.
> + *
> + * We separate entries by a cacheline to reduce traffic between EUs writing to
> + * different entries.
> + */
> +#define SHADER_TIME_STRIDE 64
> +
> enum brw_cache_id {
> BRW_BLEND_STATE,
> BRW_DEPTH_STENCIL_STATE,
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 8ce3954..8476bb5 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -621,7 +621,7 @@ fs_visitor::emit_shader_time_write(enum shader_time_shader_type type,
>
> fs_reg offset_mrf = fs_reg(MRF, base_mrf);
> offset_mrf.type = BRW_REGISTER_TYPE_UD;
> - emit(MOV(offset_mrf, fs_reg(shader_time_index * 4)));
> + emit(MOV(offset_mrf, fs_reg(shader_time_index * SHADER_TIME_STRIDE)));
>
> fs_reg time_mrf = fs_reg(MRF, base_mrf + 1);
> time_mrf.type = BRW_REGISTER_TYPE_UD;
> diff --git a/src/mesa/drivers/dri/i965/brw_program.c b/src/mesa/drivers/dri/i965/brw_program.c
> index 75eb6bc..62954d3 100644
> --- a/src/mesa/drivers/dri/i965/brw_program.c
> +++ b/src/mesa/drivers/dri/i965/brw_program.c
> @@ -228,7 +228,8 @@ brw_init_shader_time(struct brw_context *brw)
>
> const int max_entries = 4096;
> brw->shader_time.bo = drm_intel_bo_alloc(intel->bufmgr, "shader time",
> - max_entries * 4, 4096);
> + max_entries * SHADER_TIME_STRIDE,
> + 4096);
> brw->shader_time.programs = rzalloc_array(brw, struct gl_shader_program *,
> max_entries);
> brw->shader_time.types = rzalloc_array(brw, enum shader_time_shader_type,
> @@ -409,7 +410,7 @@ brw_collect_shader_time(struct brw_context *brw)
> uint32_t *times = brw->shader_time.bo->virtual;
>
> for (int i = 0; i < brw->shader_time.num_entries; i++) {
> - brw->shader_time.cumulative[i] += times[i];
> + brw->shader_time.cumulative[i] += times[i * SHADER_TIME_STRIDE / 4];
> }
>
> /* Zero the BO out to clear it out for our next collection.
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> index f319f32..d759710 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> @@ -1225,7 +1225,7 @@ vec4_visitor::emit_shader_time_write(enum shader_time_shader_type type,
>
> dst_reg offset_mrf = dst_reg(MRF, base_mrf);
> offset_mrf.type = BRW_REGISTER_TYPE_UD;
> - emit(MOV(offset_mrf, src_reg(shader_time_index * 4)));
> + emit(MOV(offset_mrf, src_reg(shader_time_index * SHADER_TIME_STRIDE)));
>
> dst_reg time_mrf = dst_reg(MRF, base_mrf + 1);
> time_mrf.type = BRW_REGISTER_TYPE_UD;
More information about the mesa-dev
mailing list