[Intel-gfx] [PATCH i-g-t] pm_rps: Extended testcases with checking PMINTRMSK register value
Szwichtenberg, Radoslaw
radoslaw.szwichtenberg at intel.com
Tue Aug 22 11:56:00 UTC 2017
On Tue, 2017-08-22 at 01:31 +0300, Arkadiusz Hiler wrote:
> On Mon, Aug 21, 2017 at 09:39:24PM +0200, Daniel Vetter wrote:
> > On Mon, Aug 21, 2017 at 11:21:49AM +0100, Chris Wilson wrote:
> > > Quoting Chris Wilson (2017-08-21 10:53:36)
> > > > Quoting Arkadiusz Hiler (2017-08-21 10:42:25)
> > > > > On Mon, Aug 21, 2017 at 08:05:58AM +0000, Dec, Katarzyna wrote:
> > > > > > I understand we do not want to check registers in IGT tests. What
> > > > > > about reading interrupt masks from debugfs (i915_frequency_info).
> > > > >
> > > > > Hey Kasia
> > > > >
> > > > > It would be pretty much the same thing, but instead of us reading the
> > > > > PMINTRMASK directly we would ask the kernel to do that on our behalf.
> > > > >
> > > > > That would just hide register read, not get rid of it.
> > > > >
> > > > >
> > > > > I think you are missing the point. The idea is that we do not want to
> > > > > test details of in-kernel implementation, not ban the register reads
> > > > > completely.
> > > > >
> > > > > Reading register directly, especially just to make sure that the
> > > > > kernel
> > > > > set something correctly is a good indicator that we are trying to do
> > > > > just that - test the internal details.
> > > > >
> > > > > > Would that be better approach? You guys suggested to get interested
> > > > > > in
> > > > > > kselftests for having such checks, but I am afraid that it could be
> > > > > > too much job and we have too few hands to work.
> > > > >
> > > > > How much of an effort would the kselftest be, since it seems that you
> > > > > did some
> > > > > investigation already?
> > > >
> > > > It doesn't even require a whole selftest, just something like
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/intel_pm.c
> > > > b/drivers/gpu/drm/i915/intel_pm.c
> > > > index 448e71af4772..e83b67fe0354 100644
> > > > --- a/drivers/gpu/drm/i915/intel_pm.c
> > > > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > > > @@ -7733,7 +7733,8 @@ void intel_suspend_gt_powersave(struct
> > > > drm_i915_private *dev_priv)
> > > > if (cancel_delayed_work_sync(&dev_priv->rps.autoenable_work))
> > > > intel_runtime_pm_put(dev_priv);
> > > >
> > > > - /* gen6_rps_idle() will be called later to disable interrupts */
> > > > + WARN_ON(I915_READ(GEN6_PMINTRMSK) !=
> > > > + gen6_sanitize_rps_pm_mask(dev_priv, ~0));
> > > > }
> > >
> > > Wrong spot. We actually need a call from
> > > intel_runtime_pm_disable_interrupts.
> >
> > Yeah, for consistency checks which are very closely tied to the
> > implementation we tend to sprinkle WARN_ON all over the place. In some
> > cases those checks are too expensive for production, then we add a
> > compile-time-option to hide them (e.g. GEM_BUG_ON).
> >
> > I chatted with Radek, and if I understand things correctly, the main value
> > you derive from these is making sure a frankenstein port to an older
> > kernel doesn't miss or miss-apply any critical patches. In-kernel
> > consistency checks unfortunately don't really help with that, but we
> > heavily rely on these for validation.
>
> Having that stated on the mailing list from the beginning (e.g. in the
> commit message or as one of the first replies) would help directing the
> whole discussion on the right track and make us understand your needs
> better.
>
> I agree with Daniel's earlier statement that we should be very
> (over)verbose about the changes we are making and purpose they are
> serving.
>
> > There's also examples of this (and again, they're very important) outside
> > of i915, like kasan, lockdep (and maybe we'll even get kmemleak somehow
> > integrated into CI/igt eventually).
> >
> > So still no idea what would be the best suggestion here for your team.
>
> Kasia and Radek, can you elaborate a little more on the "frankenstein
> port" and your use cases for such tests?
>
> How is that comparable to backports to stable/LTS kernel branches?
>
This test proposed by Kasia not only was used to find various regressions
(including performance ones) that were later fixed on upstream (and example
would be patch from Sagar: https://patchwork.kernel.org/patch/9527335/).
Additionally we do sometimes use older releases of kernel for which we do
backport some of the latest changes (on need basis). As this test verifies
whether masking is done as we expect it to be done it allows us to ensure that
during forklift/cherrypicking or any other process any of the required patches
were not missed.
I believe that proposed changes are not introducing any overhead or information
that is not really usefull for developers/test runners. Also providing support
for all previous and future platforms should not be an issue in this case.
Information about masking proved multiple times to be usefull when looking for
root cause of issues with rps we were facing. Fixes for all these issues (if
were still applicable to upstream code) were sent upstream (I think mostly by
Sagar). I think that there is no other way to verify whether rps interrupts are
masked or not in specific situations and using a register value is the only way
to do it.
CC'ed Sagar - maybe you would like to add something to this discussion?
Cheers,
Radek
> Hopefully we can work out how to approach similar cases, because it
> looks like it's might be a recurring theme in the future
>
More information about the Intel-gfx
mailing list