[Intel-gfx] [PATCH i-g-t] pm_rps: Extended testcases with checking PMINTRMSK register value

Wed Aug 23 15:04:08 UTC 2017

On Tue, Aug 22, 2017 at 01:14:19PM +0000, Szwichtenberg, Radoslaw wrote:
> On Tue, 2017-08-22 at 13:33 +0100, Chris Wilson wrote:
> > Quoting Szwichtenberg, Radoslaw (2017-08-22 12:56:00)
> > > On Tue, 2017-08-22 at 01:31 +0300, Arkadiusz Hiler wrote:
> > > > On Mon, Aug 21, 2017 at 09:39:24PM +0200, Daniel Vetter wrote:
> > > > > On Mon, Aug 21, 2017 at 11:21:49AM +0100, Chris Wilson wrote:
> > > > > > Quoting Chris Wilson (2017-08-21 10:53:36)
> > > > > > > Quoting Arkadiusz Hiler (2017-08-21 10:42:25)
> > > > > > > > On Mon, Aug 21, 2017 at 08:05:58AM +0000, Dec, Katarzyna wrote:
> > > > > > > > > I understand we do not want to check registers in IGT tests. What
> > > > > > > > > about reading interrupt masks from debugfs (i915_frequency_info)?
> > > > > > > > 
> > > > > > > > Hey Kasia
> > > > > > > > 
> > > > > > > > It would be pretty much the same thing, but instead of us reading the
> > > > > > > > PMINTRMSK directly we would ask the kernel to do that on our behalf.
> > > > > > > > 
> > > > > > > > That would just hide the register read, not get rid of it.
> > > > > > > > 
> > > > > > > > 
> > > > > > > > I think you are missing the point. The idea is that we do not want to
> > > > > > > > test details of the in-kernel implementation, not to ban register
> > > > > > > > reads completely.
> > > > > > > > 
> > > > > > > > Reading a register directly, especially just to make sure that the
> > > > > > > > kernel set something correctly, is a good indicator that we are
> > > > > > > > trying to do just that - test the internal details.
> > > > > > > > 
> > > > > > > > > Would that be a better approach? You guys suggested looking into
> > > > > > > > > kselftests for such checks, but I am afraid that it could be too
> > > > > > > > > much work and we have too few hands.
> > > > > > > > 
> > > > > > > > How much of an effort would the kselftest be, since it seems that you
> > > > > > > > did some investigation already?
> > > > > > > 
> > > > > > > It doesn't even require a whole selftest, just something like
> > > > > > > 
> > > > > > > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > > > > > > index 448e71af4772..e83b67fe0354 100644
> > > > > > > --- a/drivers/gpu/drm/i915/intel_pm.c
> > > > > > > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > > > > > > @@ -7733,7 +7733,8 @@ void intel_suspend_gt_powersave(struct drm_i915_private *dev_priv)
> > > > > > >         if (cancel_delayed_work_sync(&dev_priv->rps.autoenable_work))
> > > > > > >                 intel_runtime_pm_put(dev_priv);
> > > > > > >  
> > > > > > > -       /* gen6_rps_idle() will be called later to disable interrupts */
> > > > > > > +       WARN_ON(I915_READ(GEN6_PMINTRMSK) !=
> > > > > > > +               gen6_sanitize_rps_pm_mask(dev_priv, ~0));
> > > > > > >  }
> > > > > > 
> > > > > > Wrong spot. We actually need a call from
> > > > > > intel_runtime_pm_disable_interrupts.
> > > > > 
> > > > > Yeah, for consistency checks which are very closely tied to the
> > > > > implementation we tend to sprinkle WARN_ON all over the place. In some
> > > > > cases those checks are too expensive for production, then we add a
> > > > > compile-time-option to hide them (e.g. GEM_BUG_ON).
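
The compile-time switch described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: `SKETCH_DEBUG_CHECKS`, `SKETCH_CHECK_ON`, `sketch_read_mask()` and `sketch_verify_mask()` are hypothetical names made up for this sketch, and it is not the i915 definition of GEM_BUG_ON, just the general pattern of a check that costs nothing when the option is off.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical build switch: compile with -DSKETCH_DEBUG_CHECKS=1 to enable. */
#ifndef SKETCH_DEBUG_CHECKS
#define SKETCH_DEBUG_CHECKS 0
#endif

#if SKETCH_DEBUG_CHECKS
/* Debug build: evaluate the condition and abort loudly if it is true. */
#define SKETCH_CHECK_ON(cond)                                         \
    do {                                                              \
        if (cond) {                                                   \
            fprintf(stderr, "check failed: %s (%s:%d)\n",             \
                    #cond, __FILE__, __LINE__);                       \
            abort();                                                  \
        }                                                             \
    } while (0)
#else
/* Production build: sizeof keeps the expression compiled and
 * type-checked, but its operand is never evaluated at run time. */
#define SKETCH_CHECK_ON(cond) ((void)sizeof((long)(cond)))
#endif

/* Hypothetical stand-in for an expensive read-back; counts its calls. */
static unsigned int sketch_reads;

static unsigned int sketch_read_mask(void)
{
    sketch_reads++;
    return 0xffffffffu;
}

/* Runs the check and returns how many read-backs it actually performed. */
static unsigned int sketch_verify_mask(unsigned int expected)
{
    sketch_reads = 0;
    SKETCH_CHECK_ON(sketch_read_mask() != expected);
    return sketch_reads;
}
```

With the switch off, `sketch_verify_mask()` performs no read-back at all; with it on, the read happens once and a mismatch aborts the program.
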
> > > > > 
> > > > > I chatted with Radek, and if I understand things correctly, the main
> > > > > value you derive from these is making sure a frankenstein port to an
> > > > > older kernel doesn't miss or misapply any critical patches. In-kernel
> > > > > consistency checks unfortunately don't really help with that, but we
> > > > > heavily rely on these for validation.
> > > > 
> > > > Having that stated on the mailing list from the beginning (e.g. in the
> > > > commit message or as one of the first replies) would help steer the
> > > > whole discussion onto the right track and help us understand your needs
> > > > better.
> > > > 
> > > > I agree with Daniel's earlier statement that we should be very
> > > > (over)verbose about the changes we are making and the purpose they
> > > > serve.
> > > > 
> > > > > There are also examples of this (and again, they're very important)
> > > > > outside of i915, like kasan and lockdep (and maybe we'll even get
> > > > > kmemleak somehow integrated into CI/igt eventually).
> > > > > 
> > > > > So still no idea what would be the best suggestion here for your team.
> > > > 
> > > > Kasia and Radek, can you elaborate a little more on the "frankenstein
> > > > port" and your use cases for such tests?
> > > > 
> > > > How is that comparable to backports to stable/LTS kernel branches?
> > > > 
> > > 
> > > This test proposed by Kasia was used to find various regressions
> > > (including performance ones) that were later fixed upstream (an example
> > > would be the patch from Sagar: https://patchwork.kernel.org/patch/9527335/).
> > 
> > It is a terrible test for that. If your goal is to validate performance,
> > do that. For example, the presumption there is that RPS continues to
> > respond after a sequence of events (delivering the expected
> > performance). You can either use the GPU clocks as reported by the
> > kernel, or forgo that trust by running a known workload and counting
> > cycles. The result should be that you get a test that matches
> > typical patterns and corner cases of userspace, and that you are
> > delivering the expected *userspace* behaviour. All in a way that makes
> > no assumptions about how the kernel/hw responds, just that it will.
> > -Chris
> 
> I do agree with your point - I just wanted to give an example of where this
> test turned out helpful (not saying this test was aimed at catching such
> behaviour, even though it was helpful in root-causing the problem). I hope
> Sagar will be able to provide better examples.
> 
> The most important part of this discussion is probably the question of whether
> IGT should be backwards compatible and help test previous stable kernel
> releases. I know that saying that some users of IGT use it in certain ways to
> test older kernels (we do) is not enough to justify upstreaming these. We do
> maintain some older kernels (stable ones) with some changes forklifted from
> upstream/newer releases, and this test does help find out whether all required
> changes were cherry-picked correctly.

One thing we could do on that front is run igt patches against older-ish
kernels. Well, since full igt runs are already at the limit of machine
time we have, that would probably be something that each release
engineering team that cares about a specific kernel/hw combo would need to
do.

But patchwork supports that: you can add an arbitrary number of CI farms to
it that post results.

If we only rely on review, I at least don't think igt is going to be
backward compatible enough for serious backporting efforts. There are still
going to be issues, simply because we sometimes need to change the uabi
(in a way that only breaks igt, not any real apps ofc).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch