[Intel-gfx] [PATCH i-g-t] pm_rps: Extended testcases with checking PMINTRMSK register value

Szwichtenberg, Radoslaw radoslaw.szwichtenberg at intel.com
Tue Aug 22 13:14:19 UTC 2017


On Tue, 2017-08-22 at 13:33 +0100, Chris Wilson wrote:
> Quoting Szwichtenberg, Radoslaw (2017-08-22 12:56:00)
> > On Tue, 2017-08-22 at 01:31 +0300, Arkadiusz Hiler wrote:
> > > On Mon, Aug 21, 2017 at 09:39:24PM +0200, Daniel Vetter wrote:
> > > > On Mon, Aug 21, 2017 at 11:21:49AM +0100, Chris Wilson wrote:
> > > > > Quoting Chris Wilson (2017-08-21 10:53:36)
> > > > > > Quoting Arkadiusz Hiler (2017-08-21 10:42:25)
> > > > > > > On Mon, Aug 21, 2017 at 08:05:58AM +0000, Dec, Katarzyna wrote:
> > > > > > > > I understand we do not want to check registers in IGT tests.
> > > > > > > > What
> > > > > > > > about reading interrupt masks from debugfs
> > > > > > > > (i915_frequency_info).
> > > > > > > 
> > > > > > > Hey Kasia
> > > > > > > 
> > > > > > > It would be pretty much the same thing, but instead of us reading
> > > > > > > the
> > > > > > > PMINTRMASK directly we would ask the kernel to do that on our
> > > > > > > behalf.
> > > > > > > 
> > > > > > > That would just hide register read, not get rid of it.
> > > > > > > 
> > > > > > > 
> > > > > > > I think you are missing the point. The idea is that we do not want
> > > > > > > to
> > > > > > > test details of in-kernel implementation, not ban the register
> > > > > > > reads
> > > > > > > completely.
> > > > > > > 
> > > > > > > Reading register directly, especially just to make sure that the
> > > > > > > kernel
> > > > > > > set something correctly is a good indicator that we are trying to
> > > > > > > do
> > > > > > > just that - test the internal details.
> > > > > > > 
> > > > > > > > Would that be better approach? You guys suggested to get
> > > > > > > > interested
> > > > > > > > in
> > > > > > > > kselftests for having such checks, but I am afraid that it could
> > > > > > > > be
> > > > > > > > too much job and we have too few hands to work.
> > > > > > > 
> > > > > > > How much of an effort would the kselftest be, since it seems that
> > > > > > > you
> > > > > > > did some
> > > > > > > investigation already?
> > > > > > 
> > > > > > It doesn't even require a whole selftest, just something like
> > > > > > 
> > > > > > diff --git a/drivers/gpu/drm/i915/intel_pm.c
> > > > > > b/drivers/gpu/drm/i915/intel_pm.c
> > > > > > index 448e71af4772..e83b67fe0354 100644
> > > > > > --- a/drivers/gpu/drm/i915/intel_pm.c
> > > > > > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > > > > > @@ -7733,7 +7733,8 @@ void intel_suspend_gt_powersave(struct
> > > > > > drm_i915_private *dev_priv)
> > > > > >         if (cancel_delayed_work_sync(&dev_priv-
> > > > > > >rps.autoenable_work))
> > > > > >                 intel_runtime_pm_put(dev_priv);
> > > > > >  
> > > > > > -       /* gen6_rps_idle() will be called later to disable
> > > > > > interrupts */
> > > > > > +       WARN_ON(I915_READ(GEN6_PMINTRMSK) !=
> > > > > > +               gen6_sanitize_rps_pm_mask(dev_priv, ~0));
> > > > > >  }
> > > > > 
> > > > > Wrong spot. We actually need a call from
> > > > > intel_runtime_pm_disable_interrupts.
> > > > 
> > > > Yeah, for consistency checks which are very closely tied to the
> > > > implementation we tend to sprinkle WARN_ON all over the place. In some
> > > > cases those checks are too expensive for production, then we add a
> > > > compile-time-option to hide them (e.g. GEM_BUG_ON).
> > > > 
> > > > I chatted with Radek, and if I understand things correctly, the main
> > > > value
> > > > you derive from these is making sure a frankenstein port to an older
> > > > kernel doesn't miss or miss-apply any critical patches. In-kernel
> > > > consistency checks unfortunately don't really help with that, but we
> > > > heavily rely on these for validation.
> > > 
> > > Having that stated on the mailing list from the beginning (e.g. in the
> > > commit message or as one of the first replies) would help directing the
> > > whole discussion on the right track and make us understand your needs
> > > better.
> > > 
> > > I agree with Daniel's earlier statement that we should be very
> > > (over)verbose about the changes we are making and purpose they are
> > > serving.
> > > 
> > > > There's also examples of this (and again, they're very important)
> > > > outside
> > > > of i915, like kasan, lockdep (and maybe we'll even get kmemleak somehow
> > > > integrated into CI/igt eventually).
> > > > 
> > > > So still no idea what would be the best suggestion here for your team.
> > > 
> > > Kasia and Radek, can you elaborate a little more on the "frankenstein
> > > port" and your use cases for such tests?
> > > 
> > > How is that comparable to backports to stable/LTS kernel branches?
> > > 
> > 
> > This test proposed by Kasia not only was used to find various regressions
> > (including performance ones) that were later fixed on upstream (and example
> > would be patch from Sagar: https://patchwork.kernel.org/patch/9527335/).
> 
> It is a terrible test for that. If you're goal is to validate performance,
> do that. For example, the presumption there is that RPS continues to
> respond after a sequence of events (delivering the expected
> performance). You can either use the gpu clocks as reported by the
> kernel, but you forgo that trust by doing a known workload and
> count cycles. The result should be that you get a test that matches
> typical patterns and corner cases of userspace, and that you are
> delivering the expected *userspace* behaviour. All in a way that makes
> no assumptions how the kernel/hw responds, just that it will.
> -Chris

I do agree with your point - I just wanted give a example of where this test
turned out helpfull (not saying this test was aimed to catch such behaviour even
though it was helpfull in rootcausing the provlem). I hope Sagar will be able to
provide better examples.

The most important part of this discussion probably is a question whether IGT
should be backwards compatible and help testing previous stable kernel releases.
I know that saying that some user of IGTs uses it in certain ways to test older
kernels (we do) is not enough to justify upstreaming these. We do maintain some
older kernels (stable ones) with some changes forklifted from upstream/newer
releases and this test does help finding out if all required changes were
cherrypicked correctly or not.

Radek


More information about the Intel-gfx mailing list