[Intel-gfx] [PATCH] drm/i915/byt: Avoid tweaking evaluation thresholds

Chris Wilson chris at chris-wilson.co.uk
Wed Jan 25 13:17:20 UTC 2017


On Wed, Jan 25, 2017 at 03:09:04PM +0200, Mika Kuoppala wrote:
> Chris Wilson <chris at chris-wilson.co.uk> writes:
> 
> > On Wed, Jan 25, 2017 at 02:31:08PM +0200, Mika Kuoppala wrote:
> >> Certain Baytrails, namely the 4 cpu core variants, have been
> >> plagued by spurious system hangs, mostly occurring with light loads.
> >> 
> >> Multiple bisects by various people point to a commit which changes the
> >> reclocking strategy for Baytrail to follow its bigger brethren:
> >> commit 8fb55197e64d ("drm/i915: Agressive downclocking on Baytrail")
> >> 
> >> There is also a review comment attached to this commit from Deepak S
> >> on avoiding punit access on Cherryview, and thus it is excluded from
> >> the common reclocking path. By taking the same approach and omitting
> >> the punit access, that is, by not tweaking the thresholds when the
> >> hardware has been asked to move to a different frequency, considerable
> >> gains in stability have been observed.
> >> 
> >> With a J1900 box, a light render/video load would usually end in a
> >> system hang in less than 12 hours. With this patch applied, the
> >> cumulative uptime has now been 34 days without issues. To provoke a
> >> system hang, light loads on both the render and bsd engines have been
> >> run in parallel:
> >> glxgears >/dev/null 2>/dev/null &
> >> mpv --vo=vaapi --hwdec=vaapi --loop=inf vid.mp4
> >> 
> >> So far, the author has not witnessed a system hang with the above load
> >> and this patch applied. Reports from the tenacious people at
> >> kernel bugzilla are also promising.
> >> 
> >> Considering that the punit access frequency with this patch is
> >> considerably lower, there is a possibility that this will push
> >> the still unknown root cause past the triggering point on most loads.
> >> Further work on investigating the punit accesses on byt is welcome.
> >
> > Please find the underlying problem rather than disabling rps for all
> > vlv because of a GT specific problem.
> 
> This is not disabling rps.

You are disabling the key ingredients of the algorithm, making it less
generic in order to work around a problem elsewhere. You are tackling the
symptoms and not the cause.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

