[Intel-gfx] [RESEND] drm/i915: Interactive RPS mode
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Fri Jul 20 12:49:09 UTC 2018
On 12/07/2018 18:38, Chris Wilson wrote:
> RPS provides a feedback loop where we use the load during the previous
> evaluation interval to decide whether to up or down clock the GPU
> frequency. Our responsiveness is split into 3 regimes, a high and low
> plateau with the intent to keep the gpu clocked high to cover occasional
> stalls under high load, and low despite occasional glitches under steady
> low load, and inbetween. However, we run into situations like kodi where
> we want to stay at low power (video decoding is done efficiently
> inside the fixed function HW and doesn't need high clocks even for high
> bitrate streams), but just occasionally the pipeline is more complex
> than a video decode and we need a smidgen of extra GPU power to present
> on time. In the high power regime, we sample at sub-frame intervals with
> a bias to upclocking, and conversely at low power we sample over a few
> frames' worth to provide what we consider to be the right level of
> responsiveness for each. At low power, we more or less expect to be
> kicked out to high power at the start of a busy sequence by waitboosting.
>
> Prior to commit e9af4ea2b9e7 ("drm/i915: Avoid waitboosting on the active
> request") whenever we missed the frame or stalled, we would immediately go
> full throttle and upclock the GPU to max. But in commit e9af4ea2b9e7, we
> relaxed the waitboosting to only apply if the pipeline was deep to avoid
> over-committing resources for a near miss. Sadly though, a near miss is
> still a miss, and perceptible as jitter in the frame delivery.
>
> To try and prevent the near miss before having to resort to boosting
> after the fact, we use the pageflip queue as an indication that we are
> in an "interactive" regime and so should sample the load more frequently
> to provide power before the frame misses its vblank. This will make us
> more inclined to provide a small power increase (one or two bins) as
> required rather than going all the way to maximum and then having to
> work back down again. (We still keep the waitboosting mechanism around
> just in case a dramatic change in system load requires urgent upclocking,
> faster than we can provide in a few evaluation intervals.)
>
> v2: Reduce rps_set_interactive to a boolean parameter to avoid the
> confusion of what to do if a new power mode were wanted after pinning
> to a different mode (which one to choose?)
> v3: Only reprogram RPS while the GT is awake; it will be set when we
> wake the GT, and while off it warns about being used outside of rpm.
> v4: Fix deferred application of interactive mode
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107111
> Fixes: e9af4ea2b9e7 ("drm/i915: Avoid waitboosting on the active request")
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Radoslaw Szwichtenberg <radoslaw.szwichtenberg at intel.com>
> ---
> drivers/gpu/drm/i915/i915_debugfs.c | 1 +
> drivers/gpu/drm/i915/i915_drv.h | 4 ++
> drivers/gpu/drm/i915/intel_display.c | 20 ++++++
> drivers/gpu/drm/i915/intel_drv.h | 2 +
> drivers/gpu/drm/i915/intel_pm.c | 91 +++++++++++++++++++---------
> 5 files changed, 89 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 099f97ef2303..ac019bb927d0 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -2218,6 +2218,7 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
> seq_printf(m, "CPU waiting? %d\n", count_irq_waiters(dev_priv));
> seq_printf(m, "Boosts outstanding? %d\n",
> atomic_read(&rps->num_waiters));
> + seq_printf(m, "Interactive? %d\n", READ_ONCE(rps->interactive));
> seq_printf(m, "Frequency requested %d\n",
> intel_gpu_freq(dev_priv, rps->cur_freq));
> seq_printf(m, " min hard:%d, soft:%d; max soft:%d, hard:%d\n",
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 01dd29837233..f02fbeee553f 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -784,6 +784,8 @@ struct intel_rps {
>
> int last_adj;
> enum { LOW_POWER, BETWEEN, HIGH_POWER } power;
> + unsigned int interactive;
> + struct mutex power_lock;
>
> bool enabled;
> atomic_t num_waiters;
> @@ -3429,6 +3431,8 @@ extern void i915_redisable_vga_power_on(struct drm_i915_private *dev_priv);
> extern bool ironlake_set_drps(struct drm_i915_private *dev_priv, u8 val);
> extern void intel_init_pch_refclk(struct drm_i915_private *dev_priv);
> extern int intel_set_rps(struct drm_i915_private *dev_priv, u8 val);
> +extern void intel_rps_set_interactive(struct drm_i915_private *dev_priv,
> + bool state);
> extern bool intel_set_memory_cxsr(struct drm_i915_private *dev_priv,
> bool enable);
>
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 7998e70a3174..5809366ff9f0 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -13104,6 +13104,19 @@ intel_prepare_plane_fb(struct drm_plane *plane,
> add_rps_boost_after_vblank(new_state->crtc, new_state->fence);
> }
>
> + /*
> + * We declare pageflips to be interactive and so merit a small bias
> + * towards upclocking to deliver the frame on time. By only changing
> + * the RPS thresholds to sample more regularly and aim for higher
> + * clocks we can hopefully deliver low power workloads (like kodi)
> + * that are not quite steady state without resorting to forcing
> + * maximum clocks following a vblank miss (see do_rps_boost()).
> + */
> + if (!intel_state->rps_interactive) {
> + intel_rps_set_interactive(dev_priv, true);
> + intel_state->rps_interactive = true;
> + }
> +
> return 0;
> }
>
> @@ -13120,8 +13133,15 @@ void
> intel_cleanup_plane_fb(struct drm_plane *plane,
> struct drm_plane_state *old_state)
> {
> + struct intel_atomic_state *intel_state =
> + to_intel_atomic_state(old_state->state);
> struct drm_i915_private *dev_priv = to_i915(plane->dev);
>
> + if (intel_state->rps_interactive) {
> + intel_rps_set_interactive(dev_priv, false);
> + intel_state->rps_interactive = false;
> + }
Why is the rps_interactive flag needed in the atomic state? I wanted to
assume the prepare & cleanup hooks are fully symmetric, but this flag
makes me unsure. A reviewer with more display knowledge will be needed
here, I think.
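
If they really were symmetric, I would naively expect the flag to be
droppable and the hooks to simply pair up, something like the untested
sketch below (probably missing a subtlety - failed prepare? several
planes sharing one atomic state?):

static int
intel_prepare_plane_fb(struct drm_plane *plane,
		       struct drm_plane_state *new_state)
{
	...
	/* Unconditionally mark the pending flip as interactive. */
	intel_rps_set_interactive(dev_priv, true);

	return 0;
}

void
intel_cleanup_plane_fb(struct drm_plane *plane,
		       struct drm_plane_state *old_state)
{
	struct drm_i915_private *dev_priv = to_i915(plane->dev);

	/* ...and drop the hint again when the old fb is cleaned up. */
	intel_rps_set_interactive(dev_priv, false);
	...
}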
> +
> /* Should only be called after a successful intel_prepare_plane_fb()! */
> mutex_lock(&dev_priv->drm.struct_mutex);
> intel_plane_unpin_fb(to_intel_plane_state(old_state));
> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index 61e715ddd0d5..544812488821 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -482,6 +482,8 @@ struct intel_atomic_state {
> */
> bool skip_intermediate_wm;
>
> + bool rps_interactive;
Should the name at this level be something more tied to the display
subsystem, rather than implying wider relationships? Something like
page_flip_pending, or fb_prepared?
> +
> /* Gen9+ only */
> struct skl_ddb_values wm_results;
>
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 53aaaa3e6886..01475bf3196a 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -6264,41 +6264,14 @@ static u32 intel_rps_limits(struct drm_i915_private *dev_priv, u8 val)
> return limits;
> }
>
> -static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
> +static void rps_set_power(struct drm_i915_private *dev_priv, int new_power)
> {
> struct intel_rps *rps = &dev_priv->gt_pm.rps;
> - int new_power;
> u32 threshold_up = 0, threshold_down = 0; /* in % */
> u32 ei_up = 0, ei_down = 0;
>
> - new_power = rps->power;
> - switch (rps->power) {
> - case LOW_POWER:
> - if (val > rps->efficient_freq + 1 &&
> - val > rps->cur_freq)
> - new_power = BETWEEN;
> - break;
> + lockdep_assert_held(&rps->power_lock);
>
> - case BETWEEN:
> - if (val <= rps->efficient_freq &&
> - val < rps->cur_freq)
> - new_power = LOW_POWER;
> - else if (val >= rps->rp0_freq &&
> - val > rps->cur_freq)
> - new_power = HIGH_POWER;
> - break;
> -
> - case HIGH_POWER:
> - if (val < (rps->rp1_freq + rps->rp0_freq) >> 1 &&
> - val < rps->cur_freq)
> - new_power = BETWEEN;
> - break;
> - }
> - /* Max/min bins are special */
> - if (val <= rps->min_freq_softlimit)
> - new_power = LOW_POWER;
> - if (val >= rps->max_freq_softlimit)
> - new_power = HIGH_POWER;
> if (new_power == rps->power)
> return;
>
> @@ -6365,9 +6338,68 @@ static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
> rps->power = new_power;
> rps->up_threshold = threshold_up;
> rps->down_threshold = threshold_down;
> +}
> +
> +static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
> +{
> + struct intel_rps *rps = &dev_priv->gt_pm.rps;
> + int new_power;
> +
> + new_power = rps->power;
> + switch (rps->power) {
> + case LOW_POWER:
> + if (val > rps->efficient_freq + 1 &&
> + val > rps->cur_freq)
> + new_power = BETWEEN;
> + break;
> +
> + case BETWEEN:
> + if (val <= rps->efficient_freq &&
> + val < rps->cur_freq)
> + new_power = LOW_POWER;
> + else if (val >= rps->rp0_freq &&
> + val > rps->cur_freq)
> + new_power = HIGH_POWER;
> + break;
> +
> + case HIGH_POWER:
> + if (val < (rps->rp1_freq + rps->rp0_freq) >> 1 &&
> + val < rps->cur_freq)
> + new_power = BETWEEN;
> + break;
> + }
> + /* Max/min bins are special */
> + if (val <= rps->min_freq_softlimit)
> + new_power = LOW_POWER;
> + if (val >= rps->max_freq_softlimit)
> + new_power = HIGH_POWER;
> +
> + mutex_lock(&rps->power_lock);
Is it worth avoiding the atomic here when GEN < 6?
> + if (rps->interactive)
> + new_power = HIGH_POWER;
> + rps_set_power(dev_priv, new_power);
> + mutex_unlock(&rps->power_lock);
>
> rps->last_adj = 0;
This block should move up to the beginning of the function, since when
rps->interactive is set all the preceding calculations are for nothing,
and we could bail out immediately in that case.
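
Something along these lines perhaps (untested sketch, just to
illustrate):

static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
{
	struct intel_rps *rps = &dev_priv->gt_pm.rps;
	int new_power;

	mutex_lock(&rps->power_lock);

	if (rps->interactive) {
		/* Interactive overrides whatever the load would pick. */
		rps_set_power(dev_priv, HIGH_POWER);
		goto out;
	}

	new_power = rps->power;
	switch (rps->power) {
	/* ... existing regime selection unchanged ... */
	}

	/* Max/min bins are special */
	if (val <= rps->min_freq_softlimit)
		new_power = LOW_POWER;
	if (val >= rps->max_freq_softlimit)
		new_power = HIGH_POWER;

	rps_set_power(dev_priv, new_power);
out:
	mutex_unlock(&rps->power_lock);

	rps->last_adj = 0;
}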
> }
>
> +void intel_rps_set_interactive(struct drm_i915_private *dev_priv, bool state)
s/state/interactive/ for a more self-documenting function body?
And not s/dev_priv/i915/ ?!? :)
> +{
> + struct intel_rps *rps = &dev_priv->gt_pm.rps;
> +
> + if (INTEL_GEN(dev_priv) < 6)
> + return;
> +
> + mutex_lock(&rps->power_lock);
> + if (state) {
> + if (!rps->interactive++ && READ_ONCE(dev_priv->gt.awake))
> + rps_set_power(dev_priv, HIGH_POWER);
> + } else {
> + GEM_BUG_ON(!rps->interactive);
> + rps->interactive--;
Hm, maybe protect this in production code as well, so that a missed
code flow in the atomic paths does not cause an underflow and hence
permanent interactive mode.
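
For instance (sketch only):

	} else {
		/*
		 * Keep the assert for CI/debug builds, but don't let a
		 * missed decrement lock us into interactive mode forever
		 * in production.
		 */
		GEM_BUG_ON(!rps->interactive);
		if (rps->interactive)
			rps->interactive--;
	}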
> + }
> + mutex_unlock(&rps->power_lock);
> +}
> +
> static u32 gen6_rps_pm_mask(struct drm_i915_private *dev_priv, u8 val)
> {
> struct intel_rps *rps = &dev_priv->gt_pm.rps;
> @@ -9604,6 +9636,7 @@ int intel_freq_opcode(struct drm_i915_private *dev_priv, int val)
> void intel_pm_setup(struct drm_i915_private *dev_priv)
> {
> mutex_init(&dev_priv->pcu_lock);
> + mutex_init(&dev_priv->gt_pm.rps.power_lock);
>
> atomic_set(&dev_priv->gt_pm.rps.num_waiters, 0);
>
>
Regards,
Tvrtko