[PATCH] drm/i915: Add special execution time measurements for pipe evasion section

Jani Nikula jani.nikula at intel.com
Tue Aug 8 13:03:31 UTC 2023


On Tue, 08 Aug 2023, "Lisovskiy, Stanislav" <stanislav.lisovskiy at intel.com> wrote:
> On Mon, Aug 07, 2023 at 01:30:06PM +0300, Jani Nikula wrote:
>> On Mon, 07 Aug 2023, Stanislav Lisovskiy <stanislav.lisovskiy at intel.com> wrote:
>> > We are getting "pipe evasion exceeded 250 us" on constant basis for different
>> > platforms for many years. Current tracing instrumentation, shows us only
>> > aggregate execution times for the functions in evasion section, however we are
>> > only and particularly interested in those _only_ once this timing is exceeded,
>> > average timing measurements or those which happen when we don't exceed that
>> > 250 us threshold are not informative at all.
>> > This patch attemmpts to add a special debug print instrumentation, that prints
>> > all the execution time measurements of functions participating in vblank evasion
>> > only once it is exceeded and remain silent otherwise.
>> > It shouldn't also add much overhead because all it does is just calling ktime_get
>> > before and after function call.
>> 
>> I see that this has been sent for trybot, but just so there's no
>> confusion: This is fine for debugging, but not for merging upstream.
>> 
>> BR,
>> Jani.
>
> Hi Jani,
>
> Do you see any value in adding such debug prints or is it about implementation?
> My point is that it could be beneficial for us to investigate such problems, to 
> have some kind of debug prints when we get execution times for all the functions
> exactly when debug evasion time is exceeded, this could speed up a lot our investigation.
> I'm ready to fix whatever needed, if it is just about implementation(also could be
> that may be you know some better tooling or easier way to implement this?).

I'd think ftrace is the tooling to use.

Of course with that you'd have to set it up and try to reproduce the
situation, doesn't work for catching random issues in CI for example.

The debug prints might be useful in some cases, but it would have to be
*much* less intrusive in code and implementation. Can't reduce the
readability of the code this much.

Perhaps a better pattern to follow would be the with_intel_runtime_pm()
model:

	with_time_something(some_context) {
		/* stuff you want to measure */
	}

But I'm still hesitant to encourage you to work on this. We have so many
things in i915 that should not be i915 specific, and should rather be
bog standard tools and utilities in general use.

Some inline notes below.

>
> Stan
>
>> 
>> >
>> > Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy at intel.com>
>> > ---
>> >  .../gpu/drm/i915/display/intel_atomic_plane.c | 22 +++++---
>> >  drivers/gpu/drm/i915/display/intel_crtc.c     |  5 ++
>> >  drivers/gpu/drm/i915/display/intel_display.c  | 51 +++++++++++++++----
>> >  drivers/gpu/drm/i915/display/intel_display.h  | 49 ++++++++++++++++++
>> >  4 files changed, 111 insertions(+), 16 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/display/intel_atomic_plane.c b/drivers/gpu/drm/i915/display/intel_atomic_plane.c
>> > index 60a492e186ab8..72abca8321a62 100644
>> > --- a/drivers/gpu/drm/i915/display/intel_atomic_plane.c
>> > +++ b/drivers/gpu/drm/i915/display/intel_atomic_plane.c
>> > @@ -841,19 +841,27 @@ static void skl_crtc_planes_update_arm(struct intel_atomic_state *state,
>> >  	memcpy(ddb_y, old_crtc_state->wm.skl.plane_ddb_y,
>> >  	       sizeof(old_crtc_state->wm.skl.plane_ddb_y));
>> >  
>> > -	while ((plane = skl_next_plane_to_commit(state, crtc, ddb, ddb_y, &update_mask))) {
>> > -		struct intel_plane_state *new_plane_state =
>> > -			intel_atomic_get_new_plane_state(state, plane);
>> > +	while (1) {
>> > +		struct intel_plane_state *new_plane_state;
>> > +
>> > +		TIME_MEASURE_FUNC_WRAP(SKL_NEXT_PLANE_TO_COMMIT, pipe_evasion,
>> > +				       plane = skl_next_plane_to_commit(state, crtc, ddb, ddb_y, &update_mask));

This is what I mean. If you're not familiar with the highly specific
macro, you are thrown off, and need to look it up to know what's going
on. Assignment hidden in the macro argument in particular is bad for
readability. Automated tools such as coccinelle are likely to stumble on
this.

Side note, isn't it very unlikely for skl_next_plane_to_commit() to take
any significant amount of time?

>> > +
>> > +		if (!plane)
>> > +			break;
>> > +
>> > +		new_plane_state = intel_atomic_get_new_plane_state(state, plane);
>> >  
>> >  		/*
>> >  		 * TODO: for mailbox updates intel_plane_update_noarm()
>> >  		 * would have to be called here as well.
>> >  		 */
>> >  		if (new_plane_state->uapi.visible ||
>> > -		    new_plane_state->planar_slave)
>> > -			intel_plane_update_arm(plane, new_crtc_state, new_plane_state);
>> > -		else
>> > -			intel_plane_disable_arm(plane, new_crtc_state);
>> > +		    new_plane_state->planar_slave) {
>> > +			TIME_MEASURE_FUNC_WRAP(INTEL_UPDATE_PLANE, pipe_evasion, intel_plane_update_arm(plane, new_crtc_state, new_plane_state));
>> > +		} else {
>> > +			TIME_MEASURE_FUNC_WRAP(INTEL_DISABLE_PLANE, pipe_evasion, intel_plane_disable_arm(plane, new_crtc_state));
>> > +		}
>> >  	}
>> >  }
>> >  
>> > diff --git a/drivers/gpu/drm/i915/display/intel_crtc.c b/drivers/gpu/drm/i915/display/intel_crtc.c
>> > index 182c6dd64f47c..7bad7c8544b7e 100644
>> > --- a/drivers/gpu/drm/i915/display/intel_crtc.c
>> > +++ b/drivers/gpu/drm/i915/display/intel_crtc.c
>> > @@ -617,11 +617,16 @@ static void dbg_vblank_evade(struct intel_crtc *crtc, ktime_t end)
>> >  		crtc->debug.vbl.max = delta;
>> >  
>> >  	if (delta > 1000 * VBLANK_EVASION_TIME_US) {
>> > +		struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
>> > +
>> >  		drm_dbg_kms(crtc->base.dev,
>> >  			    "Atomic update on pipe (%c) took %lld us, max time under evasion is %u us\n",
>> >  			    pipe_name(crtc->pipe),
>> >  			    div_u64(delta, 1000),
>> >  			    VBLANK_EVASION_TIME_US);
>> > +
>> > +		print_time_measure_table(dev_priv, pipe_evasion, PIPE_EVASION_ENUM_MAX);
>> > +
>> >  		crtc->debug.vbl.over++;
>> >  	}
>> >  }
>> > diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
>> > index 43cba98f77531..b35dc3704725d 100644
>> > --- a/drivers/gpu/drm/i915/display/intel_display.c
>> > +++ b/drivers/gpu/drm/i915/display/intel_display.c
>> > @@ -6486,6 +6486,30 @@ static void intel_pipe_fastset(const struct intel_crtc_state *old_crtc_state,
>> >  					       &new_crtc_state->dp_m_n);
>> >  }
>> >  
>> > +TIME_MEASURE_FUNC_TABLE_START(pipe_evasion)
>> > +	TIME_MEASURE_FUNC(INTEL_COLOR_COMMIT_ARM, "intel_color_commit_arm"),
>> > +	TIME_MEASURE_FUNC(BDW_SET_PIPE_MISC, "bdw_set_pipe_misc"),
>> > +	TIME_MEASURE_FUNC(INTEL_PIPE_FASTSET, "intel_pipe_fastset"),
>> > +	TIME_MEASURE_FUNC(INTEL_PSR_PROGRAM_TRANS_MAN_TRK_CTL, "intel_psr_program_trans_maxn_trk_ctl"),
>> > +	TIME_MEASURE_FUNC(INTEL_ATOMIC_UPDATE_WATERMARKS, "intel_atomic_update_watermarks"),
>> > +	TIME_MEASURE_FUNC(SKL_DETACH_SCALERS, "skl_detach_scalers"),
>> > +	TIME_MEASURE_FUNC(SKL_NEXT_PLANE_TO_COMMIT, "skl_next_plane_to_commit"),
>> > +	TIME_MEASURE_FUNC(INTEL_UPDATE_PLANE, "intel_update_plane"),
>> > +	TIME_MEASURE_FUNC(INTEL_DISABLE_PLANE, "intel_disable_plane"),

Many of the above just read/write registers. But I wonder how long debug
logging the watermark changes might take, worst case, for instance.

>> > +TIME_MEASURE_FUNC_TABLE_END

I don't understand why the start/end would need dedicated macros.

Anyway, the table is shared and trampled over by all devices. Unless
you're careful, it's also racy and/or inaccurate in a single device
setting too.

For example, in the while (1) loop in skl_crtc_planes_update_arm(), what
if the first iteration of the loop is slow and the second is fast? This
just overwrites the data for the slow iteration.

>> > +
>> > +void print_time_measure_table(struct drm_i915_private *dev_priv,
>> > +			      struct time_measure_func *table, int num_entries)
>> > +{
>> > +	int i;
>> > +	for (i = 0; i < num_entries; i++)
>> > +	{
>> > +		drm_dbg_kms(&dev_priv->drm,
>> > +			    "Function %s took %lld us\n",
>> > +			    table[i].name, ktime_us_delta(table[i].end, table[i].start));
>> > +	}
>> > +}
>> > +
>> >  static void commit_pipe_pre_planes(struct intel_atomic_state *state,
>> >  				   struct intel_crtc *crtc)
>> >  {
>> > @@ -6501,19 +6525,27 @@ static void commit_pipe_pre_planes(struct intel_atomic_state *state,
>> >  	 * CRTC was enabled.
>> >  	 */
>> >  	if (!modeset) {
>> > -		if (intel_crtc_needs_color_update(new_crtc_state))
>> > -			intel_color_commit_arm(new_crtc_state);
>> > +		if (intel_crtc_needs_color_update(new_crtc_state)) {
>> > +			TIME_MEASURE_FUNC_WRAP(INTEL_COLOR_COMMIT_ARM,
>> > +				pipe_evasion, intel_color_commit_arm(new_crtc_state));
>> > +		}
>> >  
>> > -		if (DISPLAY_VER(dev_priv) >= 9 || IS_BROADWELL(dev_priv))
>> > -			bdw_set_pipe_misc(new_crtc_state);
>> > +		if (DISPLAY_VER(dev_priv) >= 9 || IS_BROADWELL(dev_priv)) {
>> > +			TIME_MEASURE_FUNC_WRAP(BDW_SET_PIPE_MISC,
>> > +				pipe_evasion, bdw_set_pipe_misc(new_crtc_state));
>> > +		}
>> >  
>> > -		if (intel_crtc_needs_fastset(new_crtc_state))
>> > -			intel_pipe_fastset(old_crtc_state, new_crtc_state);
>> > +		if (intel_crtc_needs_fastset(new_crtc_state)) {
>> > +			TIME_MEASURE_FUNC_WRAP(INTEL_PIPE_FASTSET,
>> > +				pipe_evasion, intel_pipe_fastset(old_crtc_state, new_crtc_state));
>> > +		}
>> >  	}
>> >  
>> > -	intel_psr2_program_trans_man_trk_ctl(new_crtc_state);
>> > +	TIME_MEASURE_FUNC_WRAP(INTEL_PSR_PROGRAM_TRANS_MAN_TRK_CTL,
>> > +		pipe_evasion, intel_psr2_program_trans_man_trk_ctl(new_crtc_state));
>> >  
>> > -	intel_atomic_update_watermarks(state, crtc);
>> > +	TIME_MEASURE_FUNC_WRAP(INTEL_ATOMIC_UPDATE_WATERMARKS,
>> > +		pipe_evasion, intel_atomic_update_watermarks(state, crtc));
>> >  }
>> >  
>> >  static void commit_pipe_post_planes(struct intel_atomic_state *state,
>> > @@ -6530,7 +6562,8 @@ static void commit_pipe_post_planes(struct intel_atomic_state *state,
>> >  	 */
>> >  	if (DISPLAY_VER(dev_priv) >= 9 &&
>> >  	    !intel_crtc_needs_modeset(new_crtc_state))
>> > -		skl_detach_scalers(new_crtc_state);
>> > +		TIME_MEASURE_FUNC_WRAP(SKL_DETACH_SCALERS,
>> > +			pipe_evasion, skl_detach_scalers(new_crtc_state));
>> >  }
>> >  
>> >  static void intel_enable_crtc(struct intel_atomic_state *state,
>> > diff --git a/drivers/gpu/drm/i915/display/intel_display.h b/drivers/gpu/drm/i915/display/intel_display.h
>> > index 49ac8473b988b..31bd34e2f253c 100644
>> > --- a/drivers/gpu/drm/i915/display/intel_display.h
>> > +++ b/drivers/gpu/drm/i915/display/intel_display.h
>> > @@ -531,8 +531,57 @@ enum drm_mode_status intel_mode_valid(struct drm_device *dev,
>> >  int intel_atomic_commit(struct drm_device *dev, struct drm_atomic_state *_state,
>> >  			bool nonblock);
>> >  
>> > +
>> > +struct time_measure_func
>> > +{
>> > +	int id;
>> > +	const char *name;
>> > +	ktime_t start;
>> > +	ktime_t end;
>> > +};
>> > +
>> > +#define TIME_MEASURE_ENUM_START(enum_name) \
>> > +			enum enum_name {
>> > +#define TIME_MEASURE_ENUM_END };
>> > +
>> > +#define TIME_MEASURE_FUNC_TABLE_START(table_name) \
>> > +			struct time_measure_func table_name[] = {
>> > +
>> > +#define TIME_MEASURE_FUNC_TABLE_DECLARE(table_name) \
>> > +			extern struct time_measure_func table_name[];
>> > +
>> > +#define TIME_MEASURE_FUNC(_id, _name) \
>> > +			{ .id = _id, .name = _name }
>> > +
>> > +#define TIME_MEASURE_FUNC_TABLE_END };
>> > +
>> > +#define TIME_MEASURE_FUNC_WRAP(id, table_name, func_call) \
>> > +	{ \
>> > +		table_name[id].start = ktime_get(); \
>> > +		func_call; \
>> > +		table_name[id].end = ktime_get(); \
>> > +	}
>> > +
>> > +TIME_MEASURE_ENUM_START(pipe_evasion_enum)
>> > +	INTEL_COLOR_COMMIT_ARM,
>> > +	BDW_SET_PIPE_MISC,
>> > +	INTEL_PIPE_FASTSET,
>> > +	INTEL_PSR_PROGRAM_TRANS_MAN_TRK_CTL,
>> > +	INTEL_ATOMIC_UPDATE_WATERMARKS,
>> > +	SKL_DETACH_SCALERS,
>> > +	SKL_NEXT_PLANE_TO_COMMIT,
>> > +	INTEL_UPDATE_PLANE,
>> > +	INTEL_DISABLE_PLANE,
>> > +	PIPE_EVASION_ENUM_MAX
>> > +TIME_MEASURE_ENUM_END

It's odd to have macros defining the start/end of an enum.

>> > +
>> > +TIME_MEASURE_FUNC_TABLE_DECLARE(pipe_evasion);
>> > +
>> >  void intel_hpd_poll_fini(struct drm_i915_private *i915);
>> >  
>> > +void print_time_measure_table(struct drm_i915_private *i915,
>> > +			      struct time_measure_func *table, int num_entries);
>> > +
>> >  /* modesetting asserts */
>> >  void assert_transcoder(struct drm_i915_private *dev_priv,
>> >  		       enum transcoder cpu_transcoder, bool state);
>> 
>> -- 
>> Jani Nikula, Intel Open Source Graphics Center

-- 
Jani Nikula, Intel Open Source Graphics Center


More information about the Intel-gfx-trybot mailing list