[Intel-gfx] [PATCH] drm/i915: run intel_uncore_early_sanitize earlier on resume on non-VLV

Wed Oct 22 13:20:49 CEST 2014

On Tue, 2014-10-21 at 19:05 +0200, Daniel Vetter wrote:
> On Mon, Oct 20, 2014 at 01:20:50PM +0300, Imre Deak wrote:
> > On Fri, 2014-10-17 at 16:01 -0300, Paulo Zanoni wrote:
> > > From: Paulo Zanoni <paulo.r.zanoni at intel.com>
> > > 
> > > As far as I understand, intel_uncore_early_sanitize() was supposed to
> > > be run before any register access, but currently
> > > intel_resume_prepare() is run earlier, and it does register
> > > access. I don't think it is safe to call
> > > I915_{READ,WRITE} without calling intel_uncore_early_sanitize() first.
> > > 
> > > One of the problems we currently have is that when we suspend/resume
> > > BDW, the FPGA_DBG_RM_NOCLAIM bit becomes 1, so we end up printing an
> > > "unclaimed register" message on resume, but this message doesn't
> > > really seem to have been triggered by our driver or user space, since
> > > the bit was not there before suspending, and gets there just after
> > > resuming, before any of our own register accesses. So calling
> > > intel_uncore_early_sanitize() as a first thing will allow us to stop
> > > printing the error message, fixing the "bug".
> > > 
> > > v2: VLV is an exception to the early_sanitize() rule: it needs to do
> > > stuff before calling early_sanitize(), so instead of calling it
> > > earlier for every platform, we call it earlier for non-VLV by adding
> > > the early_sanitize() call inside intel_resume_prepare(). This doesn't
> > > look like the most-beautiful-solution-ever, but, well, at least it
> > > fixes the bug. (Imre)
> > > 
> > > Cc: Chris Wilson <chris at chris-wilson.co.uk>
> > > Cc: Imre Deak <imre.deak at intel.com>
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83094
> > > Signed-off-by: Paulo Zanoni <paulo.r.zanoni at intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/i915_drv.c | 9 ++++++++-
> > >  1 file changed, 8 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> > > index a05a1d0..f6d28f2 100644
> > > --- a/drivers/gpu/drm/i915/i915_drv.c
> > > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > > @@ -669,7 +669,6 @@ static int i915_drm_thaw_early(struct drm_device *dev)
> > >  	if (ret)
> > >  		DRM_ERROR("Resume prepare failed: %d,Continuing resume\n", ret);
> > >  
> > > -	intel_uncore_early_sanitize(dev, true);
> > >  	intel_uncore_sanitize(dev);
> > >  	intel_power_domains_init_hw(dev_priv);
> > >  
> > > @@ -1049,6 +1048,8 @@ static int snb_resume_prepare(struct drm_i915_private *dev_priv,
> > >  
> > >  	if (rpm_resume)
> > >  		intel_init_pch_refclk(dev);
> > > +	else
> > > +		intel_uncore_early_sanitize(dev, true);
> > >  
> > >  	return 0;
> > >  }
> > > @@ -1056,6 +1057,9 @@ static int snb_resume_prepare(struct drm_i915_private *dev_priv,
> > >  static int hsw_resume_prepare(struct drm_i915_private *dev_priv,
> > >  				bool rpm_resume)
> > >  {
> > > +	if (!rpm_resume)
> > > +		intel_uncore_early_sanitize(dev_priv->dev, true);
> > > +
> > >  	hsw_disable_pc8(dev_priv);
> > >  
> > >  	return 0;
> > > @@ -1421,6 +1425,9 @@ static int vlv_resume_prepare(struct drm_i915_private *dev_priv,
> > >  		i915_gem_restore_fences(dev);
> > >  	}
> > >  
> > > +	if (!rpm_resume)
> > > +		intel_uncore_early_sanitize(dev, true);
> > > +
> > >  	return ret;
> > >  }
> > >  
> > 
> > You also need to call intel_uncore_early_sanitize() from
> > intel_resume_prepare() for the rest of the platforms. With that fixed:
> > Reviewed-by: Imre Deak <imre.deak at intel.com>
> > 
> > Looking at the result, I agree it's not the nicest, so yet another way
> > to reduce the clutter would be to have the following instead in
> > i915_drm_thaw_early():
> > 
> > intel_resume_early_prepare()
> > intel_uncore_early_sanitize()
> > intel_resume_prepare()
> > 
> > and do the early steps for VLV in intel_resume_early_prepare(). I'm ok
> > with both solutions.
> 
> This honestly starts to smell like a giant maintenance nightmare. We kinda
> started off in the wrong direction with vlv rpm and it seems to get
> worse by the day. And it looks like the situation is messy enough that we
> can't even lock down the ordering with copious amounts of warnings ...
> 
> But I also don't see any real solution, so just ranting for now. I'd
> appreciate though if the revised version comes with a bunch of comments
> attached in the code.

I blame it on the HW people. :) Seriously, the VLV PM code differs from
the rest of the PM code in that we save/restore some HW state instead of
reinitializing it. That's where the above special casing of the ordering
stems from. I agree that it's not ideal, but I think having started with
that solution and moving towards the ideal was not that bad. In fact
s0ix doesn't yet work in the upstream kernel for reasons independent of
i915 (or at least I couldn't make it work), but we would need it to
fully validate all the suspend/resume paths.
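
A minimal sketch of the alternative ordering proposed above for
i915_drm_thaw_early(), assuming the VLV-only steps move into a separate
early hook. This is a standalone model, not driver code: struct fake_i915,
record() and the function bodies are hypothetical stand-ins that only log
the call order, and the real functions take different arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the driver's private state (not the real struct). */
struct fake_i915 {
	bool is_vlv;
	char log[128];	/* comma-separated record of the steps, in call order */
};

/* Append one step name to the log so the ordering can be checked later. */
static void record(struct fake_i915 *p, const char *step)
{
	assert(strlen(p->log) + strlen(step) + 2 <= sizeof(p->log));
	if (p->log[0])
		strcat(p->log, ",");
	strcat(p->log, step);
}

/* Assumed hook: platform steps that must run before the uncore sanitize. */
static int resume_early_prepare(struct fake_i915 *p, bool rpm_resume)
{
	(void)rpm_resume;
	if (p->is_vlv)
		record(p, "vlv_restore");	/* VLV restores saved HW state first */
	return 0;
}

/* Models intel_uncore_early_sanitize(): same position for every platform. */
static void uncore_early_sanitize(struct fake_i915 *p)
{
	record(p, "early_sanitize");
}

/* Models intel_resume_prepare(): everything that may touch registers. */
static int resume_prepare(struct fake_i915 *p, bool rpm_resume)
{
	(void)rpm_resume;
	record(p, "resume_prepare");
	return 0;
}

/* Models the three-step sequence suggested for i915_drm_thaw_early(). */
static int thaw_early(struct fake_i915 *p)
{
	int ret = resume_early_prepare(p, false);

	if (ret)
		return ret;
	uncore_early_sanitize(p);
	return resume_prepare(p, false);
}
```

With this shape the sanitize call stays in one place for all platforms, and
only the early hook needs to know about VLV.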

--Imre