[Intel-gfx] i915: WARN_ON(val > dev_priv->rps.max_freq_softlimit)

O'Rourke, Tom Tom.O'Rourke at intel.com
Tue Feb 10 22:26:02 PST 2015


On Thu, Jan 29, 2015 at 08:56:06PM -0500, Michael Auchter wrote:
> On Thu, Jan 29, 2015 at 06:12:31PM +0100, Daniel Vetter wrote:
> > On Wed, Jan 28, 2015 at 10:36:02PM -0800, O'Rourke, Tom wrote:
> > > On Wed, Jan 28, 2015 at 01:28:58PM +0200, Ville Syrjälä wrote:
> > > > On Wed, Jan 28, 2015 at 09:58:15AM +0000, Chris Wilson wrote:
> > > > > On Wed, Jan 28, 2015 at 12:43:21AM -0500, Michael Auchter wrote:
> > > > > > Testing out 3.19-rc6 on my 2014 Thinkpad X1 Carbon (Haswell) resulted in
> > > > > > this WARN at boot (and pretty frequently afterwards):
> > > > > > 
> > > > > > WARNING: CPU: 0 PID: 989 at drivers/gpu/drm/i915/intel_pm.c:4377 gen6_set_rps+0x371/0x3c0()
> > > > > > WARN_ON(val > dev_priv->rps.max_freq_softlimit)
> > > > > 
> > > > > [snip]
> > > > >  
> > > > > > I'm not at all familiar with this hardware, but I took a quick look into
> > > > > > what changed with this commit for my laptop. Before the commit,
> > > > > > rps.min_freq_softlimit is 4 (from rps.min_freq) and
> > > > > > rps.max_freq_softlimit is 22.
> > > > > > 
> > > > > > After the commit, rps.min_freq_softlimit is set to the
> > > > > > rps.efficient_freq value read from pcode, which is 34 on my laptop.
> > > > > > So later when gen6_set_rps() is called with rps.min_freq_softlimit that
> > > > > > warning is hit.
> > > > > > 
> > > > > > Any thoughts? It certainly seems fishy that this commit causes
> > > > > > rps.min_freq_softlimit to be greater than rps.max_freq_softlimit.
> > > > > 
> > > > > Very fishy indeed. Moral of this story, never trust hw.
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > > > > index 3e630feb18e4..bbedd2901c54 100644
> > > > > --- a/drivers/gpu/drm/i915/intel_pm.c
> > > > > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > > > > @@ -4007,7 +4007,10 @@ static void gen6_init_rps_frequencies(struct drm_device *dev)
> > > > >                                         &ddcc_status);
> > > > >                 if (0 == ret)
> > > > >                         dev_priv->rps.efficient_freq =
> > > > > -                               (ddcc_status >> 8) & 0xff;
> > > > > +                               clamp_t(u8,
> > > > > +                                       (ddcc_status >> 8) & 0xff,
> > > > > +                                       dev_priv->rps.min_freq,
> > > > > +                                       dev_priv->rps.max_freq);
> > > > 
> > > > Maybe better to fall back to rp1_freq if this is bogus?
> > > >
> > > [TOR:] Michael, Thank you for bringing this problem to our attention.
> > > 
> > > Yes, this function needs some range checking to maintain
> > > RPn <= RPe <= RP0.
> > > 
> > > A value of 34 seems too high for RPe.  
> > > What values does the Carbon X1 (Haswell) have for RPn and RP0?
> > 
> > 4 & 22, already in Micheal's original bug report.
> > 
> > Tom, can you pls polish the clamping into a proper patch with m-l
> > references?
> > 
> > Micheal, can you please test the first hunk from Chris (the one that adds
> > the clamp) to make sure it does indeed address the WARN_ON you're seeing?
> 
> The clamp suggested by Chris does indeed fix the WARN_ON.
> 
> In the case where RPe is greater than RP0, RPe will now be clamped to
> RP0. Is this likely to result in increased power consumption?
> 
> At a quick glance on my laptop it does not (idling around 5W before and
> after) but Ville had suggested earlier to fall back to RP1, which would
> be more consistent with previous kernels.
> 
> Thanks again for the quick responses,
>  Michael

[TOR:] Michael,  I discussed this report with a pcode architect here.

The RPe value is clamped to the [RPn, RP0] range by pcode before
returning the value to the driver on Broadwell but not on Haswell.

On Haswell, an efficient frequency value above RP0 is not a garbage
value and could result from a relatively flat efficiency curve.  In
this situation, leakage power would dominate the efficiency curve
such that running at lower frequencies may not save power overall.
Higher leakage power may result from a higher package temperature.

Running at RP0 may actually save power compared to RP1 by allowing
more time in RC6.  

Thanks,
Tom O'Rourke


More information about the Intel-gfx mailing list