[Intel-gfx] [PATCH i-g-t] pm_rps: Extended testcases with checking PMINTRMSK register value

Dec, Katarzyna katarzyna.dec at intel.com
Mon Aug 28 10:34:30 UTC 2017


On Wed, 2017-08-23 at 17:04 +0200, Daniel Vetter wrote:
> On Tue, Aug 22, 2017 at 01:14:19PM +0000, Szwichtenberg, Radoslaw
> wrote:
> > On Tue, 2017-08-22 at 13:33 +0100, Chris Wilson wrote:
> > > Quoting Szwichtenberg, Radoslaw (2017-08-22 12:56:00)
> > > > On Tue, 2017-08-22 at 01:31 +0300, Arkadiusz Hiler wrote:
> > > > > On Mon, Aug 21, 2017 at 09:39:24PM +0200, Daniel Vetter
> > > > > wrote:
> > > > > > On Mon, Aug 21, 2017 at 11:21:49AM +0100, Chris Wilson
> > > > > > wrote:
> > > > > > > Quoting Chris Wilson (2017-08-21 10:53:36)
> > > > > > > > Quoting Arkadiusz Hiler (2017-08-21 10:42:25)
> > > > > > > > > On Mon, Aug 21, 2017 at 08:05:58AM +0000, Dec,
> > > > > > > > > Katarzyna wrote:
> > > > > > > > > > I understand we do not want to check registers in
> > > > > > > > > > IGT tests.
> > > > > > > > > > What
> > > > > > > > > > about reading interrupt masks from debugfs
> > > > > > > > > > (i915_frequency_info).
> > > > > > > > > 
> > > > > > > > > Hey Kasia
> > > > > > > > > 
> > > > > > > > > It would be pretty much the same thing, but instead
> > > > > > > > > of us reading
> > > > > > > > > the
> > > > > > > > > PMINTRMASK directly we would ask the kernel to do
> > > > > > > > > that on our
> > > > > > > > > behalf.
> > > > > > > > > 
> > > > > > > > > That would just hide register read, not get rid of
> > > > > > > > > it.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > I think you are missing the point. The idea is that
> > > > > > > > > we do not want
> > > > > > > > > to
> > > > > > > > > test details of in-kernel implementation, not ban the
> > > > > > > > > register
> > > > > > > > > reads
> > > > > > > > > completely.
> > > > > > > > > 
> > > > > > > > > Reading register directly, especially just to make
> > > > > > > > > sure that the
> > > > > > > > > kernel
> > > > > > > > > set something correctly is a good indicator that we
> > > > > > > > > are trying to
> > > > > > > > > do
> > > > > > > > > just that - test the internal details.
> > > > > > > > > 
> > > > > > > > > > Would that be better approach? You guys suggested
> > > > > > > > > > to get
> > > > > > > > > > interested
> > > > > > > > > > in
> > > > > > > > > > kselftests for having such checks, but I am afraid
> > > > > > > > > > that it could
> > > > > > > > > > be
> > > > > > > > > > too much job and we have too few hands to work.
> > > > > > > > > 
> > > > > > > > > How much of an effort would the kselftest be, since
> > > > > > > > > it seems that
> > > > > > > > > you
> > > > > > > > > did some
> > > > > > > > > investigation already?
> > > > > > > > 
> > > > > > > > It doesn't even require a whole selftest, just
> > > > > > > > something like
> > > > > > > > 
> > > > > > > > diff --git a/drivers/gpu/drm/i915/intel_pm.c
> > > > > > > > b/drivers/gpu/drm/i915/intel_pm.c
> > > > > > > > index 448e71af4772..e83b67fe0354 100644
> > > > > > > > --- a/drivers/gpu/drm/i915/intel_pm.c
> > > > > > > > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > > > > > > > @@ -7733,7 +7733,8 @@ void
> > > > > > > > intel_suspend_gt_powersave(struct
> > > > > > > > drm_i915_private *dev_priv)
> > > > > > > >         if (cancel_delayed_work_sync(&dev_priv-
> > > > > > > > > rps.autoenable_work))
> > > > > > > > 
> > > > > > > >                 intel_runtime_pm_put(dev_priv);
> > > > > > > >  
> > > > > > > > -       /* gen6_rps_idle() will be called later to
> > > > > > > > disable
> > > > > > > > interrupts */
> > > > > > > > +       WARN_ON(I915_READ(GEN6_PMINTRMSK) !=
> > > > > > > > +               gen6_sanitize_rps_pm_mask(dev_priv,
> > > > > > > > ~0));
> > > > > > > >  }
> > > > > > > 
> > > > > > > Wrong spot. We actually need a call from
> > > > > > > intel_runtime_pm_disable_interrupts.
> > > > > > 
> > > > > > Yeah, for consistency checks which are very closely tied to
> > > > > > the
> > > > > > implementation we tend to sprinkle WARN_ON all over the
> > > > > > place. In some
> > > > > > cases those checks are too expensive for production, then
> > > > > > we add a
> > > > > > compile-time-option to hide them (e.g. GEM_BUG_ON).
> > > > > > 
> > > > > > I chatted with Radek, and if I understand things correctly,
> > > > > > the main
> > > > > > value
> > > > > > you derive from these is making sure a frankenstein port to
> > > > > > an older
> > > > > > kernel doesn't miss or miss-apply any critical patches. In-
> > > > > > kernel
> > > > > > consistency checks unfortunately don't really help with
> > > > > > that, but we
> > > > > > heavily rely on these for validation.
> > > > > 
> > > > > Having that stated on the mailing list from the beginning
> > > > > (e.g. in the
> > > > > commit message or as one of the first replies) would help
> > > > > directing the
> > > > > whole discussion on the right track and make us understand
> > > > > your needs
> > > > > better.
> > > > > 
> > > > > I agree with Daniel's earlier statement that we should be
> > > > > very
> > > > > (over)verbose about the changes we are making and purpose
> > > > > they are
> > > > > serving.
> > > > > 
> > > > > > There's also examples of this (and again, they're very
> > > > > > important)
> > > > > > outside
> > > > > > of i915, like kasan, lockdep (and maybe we'll even get
> > > > > > kmemleak somehow
> > > > > > integrated into CI/igt eventually).
> > > > > > 
> > > > > > So still no idea what would be the best suggestion here for
> > > > > > your team.
> > > > > 
> > > > > Kasia and Radek, can you elaborate a little more on the
> > > > > "frankenstein
> > > > > port" and your use cases for such tests?
> > > > > 
> > > > > How is that comparable to backports to stable/LTS kernel
> > > > > branches?
> > > > > 
> > > > 
> > > > This test proposed by Kasia not only was used to find various
> > > > regressions
> > > > (including performance ones) that were later fixed on upstream
> > > > (and example
> > > > would be patch from Sagar: https://patchwork.kernel.org/patch/9
> > > > 527335/).
> > > 
> > > It is a terrible test for that. If you're goal is to validate
> > > performance,
> > > do that. For example, the presumption there is that RPS continues
> > > to
> > > respond after a sequence of events (delivering the expected
> > > performance). You can either use the gpu clocks as reported by
> > > the
> > > kernel, but you forgo that trust by doing a known workload and
> > > count cycles. The result should be that you get a test that
> > > matches
> > > typical patterns and corner cases of userspace, and that you are
> > > delivering the expected *userspace* behaviour. All in a way that
> > > makes
> > > no assumptions how the kernel/hw responds, just that it will.
> > > -Chris
> > 
> > I do agree with your point - I just wanted give a example of where
> > this test
> > turned out helpfull (not saying this test was aimed to catch such
> > behaviour even
> > though it was helpfull in rootcausing the provlem). I hope Sagar
> > will be able to
> > provide better examples.
> > 
> > The most important part of this discussion probably is a question
> > whether IGT
> > should be backwards compatible and help testing previous stable
> > kernel releases.
> > I know that saying that some user of IGTs uses it in certain ways
> > to test older
> > kernels (we do) is not enough to justify upstreaming these. We do
> > maintain some
> > older kernels (stable ones) with some changes forklifted from
> > upstream/newer
> > releases and this test does help finding out if all required
> > changes were
> > cherrypicked correctly or not.
> 
> One thing we could do on that front is run igt patches against older-
> ish
> kernels. Well, since full igt runs are already at the limit of
> machine
> time we have, that would probably be something that each release
> engineering team that cares about a specific kernel/hw combo would
> need to
> do.
> 
> But patchwork supports that, you can add arbitrary amounts of CI
> farms to
> it that post results.
> 
> If we only rely on review at least I don't think igt is going to be
> backward compatible enough for serious backporting efforts. There's
> still
> going to be issues, simply because we sometimes need to change the
> uabi
> (in a way that only breaks igt, not any real apps ofc).
> -Daniel
How about adding check to kernel and having such test in IGT? We could
have double check in case of "frankenstein" kernel builds. This is
really the special case when we want to check registers value. I think
there is no other way to check such things using IGT.

About check in kernel code:
Is this something like that you have in mind?
diff --git a/drivers/gpu/drm/i915/i915_irq.c
b/drivers/gpu/drm/i915/i915_irq.c
index 196caa463edf..b4030d8fa4c9 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -4326,6 +4326,7 @@ void intel_runtime_pm_disable_interrupts(struct
drm_i915_private *dev_priv)
        dev_priv->drm.driver->irq_uninstall(&dev_priv->drm);
        dev_priv->pm.irqs_enabled = false;
        synchronize_irq(dev_priv->drm.irq);
+       WARN_ON(I915_READ(GEN6_PMINTRMSK)!=gen6_sanitize_rps_pm_mask(de
v_priv, ~0));
 }

Regards,
Kasia


More information about the Intel-gfx mailing list