[Intel-gfx] [PATCH] drm/i915: run intel_uncore_early_sanitize earlier on resume on non-VLV

Wed Oct 22 21:01:54 CEST 2014

2014-10-22 9:20 GMT-02:00 Imre Deak <imre.deak at intel.com>:
> On Tue, 2014-10-21 at 19:05 +0200, Daniel Vetter wrote:
>> On Mon, Oct 20, 2014 at 01:20:50PM +0300, Imre Deak wrote:
>> > On Fri, 2014-10-17 at 16:01 -0300, Paulo Zanoni wrote:
>> > > From: Paulo Zanoni <paulo.r.zanoni at intel.com>
>> > >
>> > > As far as I understand, intel_uncore_early_sanitize() was supposed to
>> > > be run before any register access, but currently
>> > > intel_resume_prepare() is run earlier, and it does register
>> > > access. I don't think it should be safe to be calling
>> > > I915_{READ,WRITE} without calling intel_uncore_early_sanitize() first.
>> > >
>> > > One of the problems we currently have is that when we suspend/resume
>> > > BDW, the FPGA_DBG_RM_NOCLAIM bit becomes 1, so we end up printing an
>> > > "unclaimed register" message on resume, but this message doesn't
>> > > really seem to have been triggered by our driver or user space, since
>> > > the bit was not there before suspending, and gets there just after
>> > > resuming, before any of our own register accesses. So calling
>> > > intel_uncore_early_sanitize() as a first thing will allow us to stop
>> > > printing the error message, fixing the "bug".
>> > >
>> > > v2: VLV is an exception to the early_sanitize() rule: it needs to do
>> > > stuff before calling early_sanitize(), so instead of calling it
>> > > earlier for every platform, we call it earlier for non-VLV by adding
>> > > the early_sanitize() call inside intel_resume_prepare(). This doesn't
>> > > look like the most-beautiful-solution-ever, but, well, at least it
>> > > fixes the bug. (Imre)
>> > >
>> > > Cc: Chris Wilson <chris at chris-wilson.co.uk>
>> > > Cc: Imre Deak <imre.deak at intel.com>
>> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83094
>> > > Signed-off-by: Paulo Zanoni <paulo.r.zanoni at intel.com>
>> > > ---
>> > >  drivers/gpu/drm/i915/i915_drv.c | 9 ++++++++-
>> > >  1 file changed, 8 insertions(+), 1 deletion(-)
>> > >
>> > > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
>> > > index a05a1d0..f6d28f2 100644
>> > > --- a/drivers/gpu/drm/i915/i915_drv.c
>> > > +++ b/drivers/gpu/drm/i915/i915_drv.c
>> > > @@ -669,7 +669,6 @@ static int i915_drm_thaw_early(struct drm_device *dev)
>> > >   if (ret)
>> > >           DRM_ERROR("Resume prepare failed: %d,Continuing resume\n", ret);
>> > >
>> > > - intel_uncore_early_sanitize(dev, true);
>> > >   intel_uncore_sanitize(dev);
>> > >   intel_power_domains_init_hw(dev_priv);
>> > >
>> > > @@ -1049,6 +1048,8 @@ static int snb_resume_prepare(struct drm_i915_private *dev_priv,
>> > >
>> > >   if (rpm_resume)
>> > >           intel_init_pch_refclk(dev);
>> > > + else
>> > > +         intel_uncore_early_sanitize(dev, true);
>> > >
>> > >   return 0;
>> > >  }
>> > > @@ -1056,6 +1057,9 @@ static int snb_resume_prepare(struct drm_i915_private *dev_priv,
>> > >  static int hsw_resume_prepare(struct drm_i915_private *dev_priv,
>> > >                           bool rpm_resume)
>> > >  {
>> > > + if (!rpm_resume)
>> > > +         intel_uncore_early_sanitize(dev_priv->dev, true);
>> > > +
>> > >   hsw_disable_pc8(dev_priv);
>> > >
>> > >   return 0;
>> > > @@ -1421,6 +1425,9 @@ static int vlv_resume_prepare(struct drm_i915_private *dev_priv,
>> > >           i915_gem_restore_fences(dev);
>> > >   }
>> > >
>> > > + if (!rpm_resume)
>> > > +         intel_uncore_early_sanitize(dev, true);
>> > > +
>> > >   return ret;
>> > >  }
>> > >
>> >
>> > You also need to call intel_uncore_early_sanitize() from
>> > intel_resume_prepare() for the rest of the platforms. With that fixed:
>> > Reviewed-by: Imre Deak <imre.deak at intel.com>
>> >
>> > Looking at the result, I agree it's not the nicest, so yet another way
>> > to reduce the clutter would be to have the following instead in
>> > i915_drm_thaw_early():
>> >
>> > intel_resume_early_prepare()
>> > intel_uncore_early_sanitize()
>> > intel_resume_prepare()
>> >
>> > and do the early steps for VLV in intel_resume_early_prepare(). I'm ok
>> > with both solutions.
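
For reference, the three-step ordering suggested above can be sketched as
a small standalone C model. This is only an illustration under stated
assumptions: every function here is a hypothetical stub that records the
call order, none of it is real i915 code, and the "vlv_early_steps" name
is made up to stand for whatever VLV has to do before the sanitize.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins: these stubs only record the call order.
 * They do not touch hardware and are not the real i915 functions. */
#define MAX_CALLS 8
static const char *call_log[MAX_CALLS];
static int call_count;

static void record(const char *name)
{
	assert(call_count < MAX_CALLS);
	call_log[call_count++] = name;
}

/* Steps that must run before the sanitize (VLV only, per the thread). */
static void resume_early_prepare(bool is_vlv)
{
	if (is_vlv)
		record("vlv_early_steps");
}

static void uncore_early_sanitize(void)
{
	record("uncore_early_sanitize");
}

static void resume_prepare(void)
{
	record("resume_prepare");
}

/* Model of the suggested ordering for the thaw_early path. */
void thaw_early(bool is_vlv)
{
	call_count = 0;
	resume_early_prepare(is_vlv);
	uncore_early_sanitize();
	resume_prepare();
}

/* Return the position of a step in the log, or -1 if it never ran. */
int step_index(const char *name)
{
	for (int i = 0; i < call_count; i++)
		if (strcmp(call_log[i], name) == 0)
			return i;
	return -1;
}
```

Calling thaw_early(false) logs the sanitize first and the prepare step
second; thaw_early(true) puts the VLV-only early steps in front of both.
The point of the split is that the platform check lives in one place
instead of in every *_resume_prepare() function.
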
>>
>> This honestly starts to smell like a giant maintenance nightmare. We kinda
>> started off into the wrong direction with vlv rpm and it seems to get
>> worse by the day. And it looks like the situation is messy enough that we
>> can't even lock down the ordering with copious amounts of warnings ...
>>
>> But I also don't see any real solution, so just ranting for now. I'd
>> appreciate though if the revised version comes with a bunch of comments
>> attached in the code.
>
> I blame it on the HW people. :) Seriously, the VLV PM code differs from
> the rest of PM code in that we save/restore some HW state instead of
> reinitializing it. That's where the above special casing of the ordering
> stems from. I agree that it's not ideal, but I think having started with
> that solution and moving towards the ideal was not that bad. In fact
> s0ix doesn't yet work in the upstream kernel for reasons independent of
> i915 (or at least I couldn't make it work), but we would need it to
> fully validate all the suspend/resume paths.

On a side note, even igt/pm_rpm/rte (the basic subtest) seems to have been
broken on BYT since forever (at least according to QA, bug #82939), so
do we even want RPM enabled on BYT?

>
> --Imre
>

-- 
Paulo Zanoni