[Intel-gfx] [PATCH] drm/i915: Avoid tweaking evaluation thresholds on Baytrail v2
Chris Wilson
chris at chris-wilson.co.uk
Wed Feb 15 12:52:26 UTC 2017
On Wed, Feb 15, 2017 at 02:37:50PM +0200, Mika Kuoppala wrote:
> Certain Baytrails, namely the 4-core CPU variants, have been
> plagued by spurious system hangs, mostly occurring under light loads.
>
> Multiple bisects by various people point to a commit which changes the
> reclocking strategy for Baytrail to follow its bigger brethren:
> commit 8fb55197e64d ("drm/i915: Agressive downclocking on Baytrail")
>
> There is also a review comment on this commit from Deepak S
> about avoiding punit access on Cherryview, which is why Cherryview was
> excluded from the common reclocking path. Taking the same approach,
> i.e. omitting the punit access by not tweaking the thresholds when the
> hardware has been asked to move to a different frequency, has given
> considerable gains in stability.
>
> With a J1900 box, a light render/video load would usually end up in a
> system hang in less than 12 hours. With this patch applied, the cumulative
> uptime has now been 34 days without issues. To provoke the system hang,
> light loads on both the render and bsd engines in parallel have been used:
> glxgears >/dev/null 2>/dev/null &
> mpv --vo=vaapi --hwdec=vaapi --loop=inf vid.mp4
>
> So far, the author has not witnessed a system hang with the above load
> and this patch applied. Reports from the tenacious people at
> kernel bugzilla are also promising.
>
> Since this patch considerably reduces the frequency of punit accesses,
> it is possible that it merely pushes the still unknown root cause
> past its triggering point on most loads.
>
> But as we can now reliably reproduce the hang independently,
> we can reduce the pain users are experiencing by using
> static thresholds until a root cause is found.
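For readers following the thread, the evaluation-interval (EI) check that these static thresholds feed can be modelled roughly as below. This is a simplified, hypothetical sketch: struct ei_sample, the *_sketch helpers and the abstract tick units are assumptions, and up and down are evaluated over one shared interval here, whereas the driver tracks separate up and down intervals and its vlv_c0_above() works on hardware counter samples. Only the two threshold values (90 and 70) come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Static thresholds from the patch, in percent of the evaluation interval. */
#define VLV_RP_UP_EI_THRESHOLD   90
#define VLV_RP_DOWN_EI_THRESHOLD 70

/* Hypothetical counter snapshot taken at the end of an EI.
 * Units are abstract ticks, not real register values. */
struct ei_sample {
	uint64_t clock;     /* free-running timestamp */
	uint64_t render_c0; /* render engine busy (C0) time */
	uint64_t media_c0;  /* media engine busy (C0) time */
};

/* True if combined render+media busy time over [old, now] is at
 * least threshold_pct percent of the elapsed time. */
static bool c0_above_sketch(const struct ei_sample *old,
			    const struct ei_sample *now,
			    unsigned int threshold_pct)
{
	assert(threshold_pct <= 100);

	if (old->clock == 0 || now->clock <= old->clock)
		return false; /* no valid previous sample yet */

	uint64_t elapsed = now->clock - old->clock;
	uint64_t busy = (now->render_c0 - old->render_c0) +
			(now->media_c0 - old->media_c0);

	/* busy / elapsed >= threshold / 100, without division */
	return busy * 100 >= elapsed * threshold_pct;
}

enum rps_event { RPS_NONE = 0, RPS_UP = 1, RPS_DOWN = 2 };

/* Same shape as the patched vlv_wa_c0_ei(): request an up-clock when
 * busy time reaches the up threshold, a down-clock when it stays
 * below the down threshold, and nothing in between. */
static unsigned int evaluate_ei_sketch(const struct ei_sample *old,
				       const struct ei_sample *now)
{
	unsigned int events = RPS_NONE;

	if (c0_above_sketch(old, now, VLV_RP_UP_EI_THRESHOLD))
		events |= RPS_UP;
	if (!c0_above_sketch(old, now, VLV_RP_DOWN_EI_THRESHOLD))
		events |= RPS_DOWN;
	return events;
}
```

With fixed 90/70 values the band between them acts as hysteresis: a load that is 80% busy over an interval raises no event at all, so nothing has to be retuned through the punit as the frequency moves.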
>
> References: https://bugzilla.kernel.org/show_bug.cgi?id=109051
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Ville Syrjälä <ville.syrjala at linux.intel.com>
> Cc: Len Brown <len.brown at intel.com>
> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> Cc: Jani Nikula <jani.nikula at intel.com>
> Cc: fritsch at xbmc.org
> Cc: miku at iki.fi
> Cc: Ezequiel Garcia <ezequiel at vanguardiasur.com.ar>
> CC: Michal Feix <michal at feix.cz>
> Cc: Hans de Goede <hdegoede at redhat.com>
> Cc: Deepak S <deepak.s at linux.intel.com>
> Cc: Jarkko Nikula <jarkko.nikula at linux.intel.com>
> Cc: <stable at vger.kernel.org> # v4.2+
> Acked-by: Daniel Vetter <daniel.vetter at ffwll.ch>
> Signed-off-by: Mika Kuoppala <mika.kuoppala at intel.com>
> ---
> drivers/gpu/drm/i915/i915_irq.c | 4 ++--
> drivers/gpu/drm/i915/i915_reg.h | 2 ++
> drivers/gpu/drm/i915/intel_pm.c | 6 +++++-
> 3 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index a887aef..319c02d 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1095,7 +1095,7 @@ static u32 vlv_wa_c0_ei(struct drm_i915_private *dev_priv, u32 pm_iir)
> if (pm_iir & GEN6_PM_RP_DOWN_EI_EXPIRED) {
> if (!vlv_c0_above(dev_priv,
> &dev_priv->rps.down_ei, &now,
> - dev_priv->rps.down_threshold))
> + VLV_RP_DOWN_EI_THRESHOLD))
> events |= GEN6_PM_RP_DOWN_THRESHOLD;
> dev_priv->rps.down_ei = now;
> }
> @@ -1103,7 +1103,7 @@ static u32 vlv_wa_c0_ei(struct drm_i915_private *dev_priv, u32 pm_iir)
> if (pm_iir & GEN6_PM_RP_UP_EI_EXPIRED) {
> if (vlv_c0_above(dev_priv,
> &dev_priv->rps.up_ei, &now,
> - dev_priv->rps.up_threshold))
> + VLV_RP_UP_EI_THRESHOLD))
Better would be a patch that sets them as the default values when we
enable rps, so that you don't break the debug interfaces.
> events |= GEN6_PM_RP_UP_THRESHOLD;
> dev_priv->rps.up_ei = now;
> }
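One possible reading of the suggestion above, sketched under assumptions: the static values are written into the RPS state once at enable time, and the interrupt path keeps reading the state fields, so anything that reports those fields (such as debug output) shows the thresholds actually in use. struct rps_state and both helper functions are hypothetical stand-ins, not the driver's types or functions.

```c
#include <assert.h>
#include <stdbool.h>

#define VLV_RP_UP_EI_THRESHOLD   90
#define VLV_RP_DOWN_EI_THRESHOLD 70

/* Hypothetical, cut-down stand-in for the driver's RPS bookkeeping. */
struct rps_state {
	unsigned int up_threshold;   /* percent busy over an EI */
	unsigned int down_threshold; /* percent busy over an EI */
};

/* Write the static Baytrail defaults once, at RPS enable time, instead
 * of hardcoding the constants in the interrupt handler. Other
 * platforms keep whatever their dynamic tuning has already set. */
static void rps_enable_sketch(struct rps_state *rps, bool is_valleyview)
{
	if (is_valleyview) {
		rps->up_threshold = VLV_RP_UP_EI_THRESHOLD;
		rps->down_threshold = VLV_RP_DOWN_EI_THRESHOLD;
		/* Otherwise one interval could raise both up and down events. */
		assert(rps->down_threshold < rps->up_threshold);
	}
}

/* The value the interrupt path would hand to its C0 comparison: the
 * stored field, so the reported and the effective threshold agree. */
static unsigned int rps_ei_threshold_sketch(const struct rps_state *rps,
					    bool up)
{
	return up ? rps->up_threshold : rps->down_threshold;
}
```

Under this arrangement the hunk above could keep passing dev_priv->rps.up_threshold and dev_priv->rps.down_threshold, and only the enable path plus the valleyview_set_rps() change would need to be Baytrail specific.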
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 141a5c1..1297f6a 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -1135,6 +1135,8 @@ enum skl_disp_power_wells {
> #define CHV_BIAS_CPU_50_SOC_50 (3 << 2)
>
> #define VLV_CZ_CLOCK_TO_MILLI_SEC 100000
> +#define VLV_RP_UP_EI_THRESHOLD 90
> +#define VLV_RP_DOWN_EI_THRESHOLD 70
>
> /* vlv2 north clock has */
> #define CCK_FUSE_REG 0x8
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 3d311e1..bce6aae 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -4971,7 +4971,11 @@ static int valleyview_set_rps(struct drm_i915_private *dev_priv, u8 val)
> if (err)
> return err;
>
> - gen6_set_rps_thresholds(dev_priv, val);
> + /* When byt can survive without system hang with dynamic
> + * sw freq adjustments, this restriction can be lifted.
> + */
> + if (!IS_VALLEYVIEW(dev_priv))
Are all vlv affected?
-Chris
--
Chris Wilson, Intel Open Source Technology Centre