[Intel-gfx] [PATCH i-g-t] pm_rps: Extended testcases with checking PMINTRMSK register value

Tue Aug 22 12:33:39 UTC 2017

Quoting Szwichtenberg, Radoslaw (2017-08-22 12:56:00)
> On Tue, 2017-08-22 at 01:31 +0300, Arkadiusz Hiler wrote:
> > On Mon, Aug 21, 2017 at 09:39:24PM +0200, Daniel Vetter wrote:
> > > On Mon, Aug 21, 2017 at 11:21:49AM +0100, Chris Wilson wrote:
> > > > Quoting Chris Wilson (2017-08-21 10:53:36)
> > > > > Quoting Arkadiusz Hiler (2017-08-21 10:42:25)
> > > > > > On Mon, Aug 21, 2017 at 08:05:58AM +0000, Dec, Katarzyna wrote:
> > > > > > > I understand we do not want to check registers in IGT tests. What
> > > > > > > about reading interrupt masks from debugfs (i915_frequency_info).
> > > > > > 
> > > > > > Hey Kasia
> > > > > > 
> > > > > > It would be pretty much the same thing, but instead of us reading the
> > > > > > PMINTRMASK directly we would ask the kernel to do that on our behalf.
> > > > > > 
> > > > > > That would just hide register read, not get rid of it.
> > > > > > 
> > > > > > 
> > > > > > I think you are missing the point. The idea is that we do not want to
> > > > > > test details of in-kernel implementation, not ban the register reads
> > > > > > completely.
> > > > > > 
> > > > > > Reading register directly, especially just to make sure that the
> > > > > > kernel
> > > > > > set something correctly is a good indicator that we are trying to do
> > > > > > just that - test the internal details.
> > > > > > 
> > > > > > > Would that be better approach? You guys suggested to get interested
> > > > > > > in
> > > > > > > kselftests for having such checks, but I am afraid that it could be
> > > > > > > too much job and we have too few hands to work.
> > > > > > 
> > > > > > How much of an effort would the kselftest be, since it seems that you
> > > > > > did some
> > > > > > investigation already?
> > > > > 
> > > > > It doesn't even require a whole selftest, just something like
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/i915/intel_pm.c
> > > > > b/drivers/gpu/drm/i915/intel_pm.c
> > > > > index 448e71af4772..e83b67fe0354 100644
> > > > > --- a/drivers/gpu/drm/i915/intel_pm.c
> > > > > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > > > > @@ -7733,7 +7733,8 @@ void intel_suspend_gt_powersave(struct
> > > > > drm_i915_private *dev_priv)
> > > > >         if (cancel_delayed_work_sync(&dev_priv->rps.autoenable_work))
> > > > >                 intel_runtime_pm_put(dev_priv);
> > > > >  
> > > > > -       /* gen6_rps_idle() will be called later to disable interrupts */
> > > > > +       WARN_ON(I915_READ(GEN6_PMINTRMSK) !=
> > > > > +               gen6_sanitize_rps_pm_mask(dev_priv, ~0));
> > > > >  }
> > > > 
> > > > Wrong spot. We actually need a call from
> > > > intel_runtime_pm_disable_interrupts.
> > > 
> > > Yeah, for consistency checks which are very closely tied to the
> > > implementation we tend to sprinkle WARN_ON all over the place. In some
> > > cases those checks are too expensive for production, then we add a
> > > compile-time-option to hide them (e.g. GEM_BUG_ON).
> > > 
> > > I chatted with Radek, and if I understand things correctly, the main value
> > > you derive from these is making sure a frankenstein port to an older
> > > kernel doesn't miss or miss-apply any critical patches. In-kernel
> > > consistency checks unfortunately don't really help with that, but we
> > > heavily rely on these for validation.
> > 
> > Having that stated on the mailing list from the beginning (e.g. in the
> > commit message or as one of the first replies) would help directing the
> > whole discussion on the right track and make us understand your needs
> > better.
> > 
> > I agree with Daniel's earlier statement that we should be very
> > (over)verbose about the changes we are making and purpose they are
> > serving.
> > 
> > > There's also examples of this (and again, they're very important) outside
> > > of i915, like kasan, lockdep (and maybe we'll even get kmemleak somehow
> > > integrated into CI/igt eventually).
> > > 
> > > So still no idea what would be the best suggestion here for your team.
> > 
> > Kasia and Radek, can you elaborate a little more on the "frankenstein
> > port" and your use cases for such tests?
> > 
> > How is that comparable to backports to stable/LTS kernel branches?
> > 
> This test proposed by Kasia not only was used to find various regressions
> (including performance ones) that were later fixed on upstream (and example
> would be patch from Sagar: https://patchwork.kernel.org/patch/9527335/).

It is a terrible test for that. If you're goal is to validate performance,
do that. For example, the presumption there is that RPS continues to
respond after a sequence of events (delivering the expected
performance). You can either use the gpu clocks as reported by the
kernel, but you forgo that trust by doing a known workload and
count cycles. The result should be that you get a test that matches
typical patterns and corner cases of userspace, and that you are
delivering the expected *userspace* behaviour. All in a way that makes
no assumptions how the kernel/hw responds, just that it will.
-Chris