[PATCH] drm/i915/pmu: Match frequencies reported by PMU and sysfs
Dixit, Ashutosh
ashutosh.dixit at intel.com
Wed Oct 5 07:40:34 UTC 2022
On Tue, 04 Oct 2022 06:00:22 -0700, Tvrtko Ursulin wrote:
>
Hi Tvrtko,
>
> On 04/10/2022 10:29, Tvrtko Ursulin wrote:
> >
> > On 03/10/2022 20:24, Ashutosh Dixit wrote:
> >> PMU and sysfs use different wakeref's to "interpret" zero freq. Sysfs
> >> uses
> >> runtime PM wakeref (see intel_rps_read_punit_req and
> >> intel_rps_read_actual_frequency). PMU uses the GT parked/unparked
> >> wakeref. In general the GT wakeref is held for less time that the runtime
> >> PM wakeref which causes PMU to report a lower average freq than the
> >> average
> >> freq obtained from sampling sysfs.
> >>
> >> To resolve this, use the same freq functions (and wakeref's) in PMU as
> >> those used in sysfs.
> >>
> >> Bug: https://gitlab.freedesktop.org/drm/intel/-/issues/7025
> >> Reported-by: Ashwin Kumar Kulkarni <ashwin.kumar.kulkarni at intel.com>
> >> Cc: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
> >> Signed-off-by: Ashutosh Dixit <ashutosh.dixit at intel.com>
> >> ---
> >> drivers/gpu/drm/i915/i915_pmu.c | 27 ++-------------------------
> >> 1 file changed, 2 insertions(+), 25 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_pmu.c
> >> b/drivers/gpu/drm/i915/i915_pmu.c
> >> index 958b37123bf1..eda03f264792 100644
> >> --- a/drivers/gpu/drm/i915/i915_pmu.c
> >> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> >> @@ -371,37 +371,16 @@ static void
> >> frequency_sample(struct intel_gt *gt, unsigned int period_ns)
> >> {
> >> struct drm_i915_private *i915 = gt->i915;
> >> - struct intel_uncore *uncore = gt->uncore;
> >> struct i915_pmu *pmu = &i915->pmu;
> >> struct intel_rps *rps = >->rps;
> >> if (!frequency_sampling_enabled(pmu))
> >> return;
> >> - /* Report 0/0 (actual/requested) frequency while parked. */
> >> - if (!intel_gt_pm_get_if_awake(gt))
> >> - return;
> >> -
> >> if (pmu->enable & config_mask(I915_PMU_ACTUAL_FREQUENCY)) {
> >> - u32 val;
> >> -
> >> - /*
> >> - * We take a quick peek here without using forcewake
> >> - * so that we don't perturb the system under observation
> >> - * (forcewake => !rc6 => increased power use). We expect
> >> - * that if the read fails because it is outside of the
> >> - * mmio power well, then it will return 0 -- in which
> >> - * case we assume the system is running at the intended
> >> - * frequency. Fortunately, the read should rarely fail!
> >> - */
> >> - val = intel_uncore_read_fw(uncore, GEN6_RPSTAT1);
> >> - if (val)
> >> - val = intel_rps_get_cagf(rps, val);
> >> - else
> >> - val = rps->cur_freq;
> >> -
> >> add_sample_mult(&pmu->sample[__I915_SAMPLE_FREQ_ACT],
> >> - intel_gpu_freq(rps, val), period_ns / 1000);
> >> + intel_rps_read_actual_frequency(rps),
> >> + period_ns / 1000);
> >> }
> >> if (pmu->enable & config_mask(I915_PMU_REQUESTED_FREQUENCY)) {
> >
> > What is software tracking of requested frequency showing when GT is
> > parked or runtime suspended? With this change sampling would be outside
> > any such checks so we need to be sure reported value makes sense.
> >
> > Although more important open is around what is actually correct.
> >
> > For instance how does the patch affect RC6 and power? I don't know how
> > power management of different blocks is wired up, so personally I would
> > only be able to look at it empirically. In other words what I am asking
> > is this - if we changed from skipping obtaining forcewake even when
> > unparked, to obtaining forcewake if not runtime suspended - what hardware
> > blocks does that power up and how it affects RC6 and power? Can it affect
> > actual frequency or not? (Will "something" power up the clocks just
> > because we will be getting forcewake?)
> >
> > Or maybe question simplified - does 200Hz polling on existing sysfs
> > actual frequency field disturbs the system under some circumstances?
> > (Increases power and decreases RC6.) If it does then that would be a
> > problem. We want a solution which shows the real data, but where the act
> > of monitoring itself does not change it too much. If it doesn't then it's
> > okay.
> >
> > Could you somehow investigate on these topics? Maybe log RAPL GPU power
> > while polling on sysfs, versus getting the actual frequency from the
> > existing PMU implementation and see if that shows anything? Or actually
> > simpler - RAPL GPU power for current PMU intel_gpu_top versus this patch?
> > On idle(-ish) desktop workloads perhaps? Power and frequency graphed for
> > both.
>
> Another thought - considering that bspec says for 0xa01c "This register
> reflects real-time values and thus does not have a pre-determined default
> value out of reset" - could it be that it also does not reflect a real
> value when GPU is not executing anything (so zero), just happens to be not
> runtime suspended? That would mean sysfs reads could maybe show last known
> value? Just a thought to check.
Thanks for the suggestion, I'll try to check and report what I find.
> I've also tried on my Alderlake desktop:
>
> 1)
>
> while true; do cat gt_act_freq_mhz >/dev/null; sleep 0.005; done
>
> This costs ~120mW of GPU power and ~20% decrease in RC6.
>
>
> 2)
>
> intel_gpu_top -l -s 5 >/dev/null
>
> This costs no power or RC6.
Thanks for the experiments. As I mentioned for Gen12+ is a different
register which doesn't require taking a forcewake (it's not upstream yet
but you can see it in this patch:
https://patchwork.freedesktop.org/patch/504920/?series=109116&rev=1#comment_910146)
so this issue should not be there at least for Gen12+.
> I have also never observed sysfs to show below min freq. This was with no
> desktop so it's possible this register indeed does not reflect the real
> situation when things are idle.
>
> So I think it is possible sysfs value is the misleading one.
Thanks I will check. The other possibility is if someone is holding a
forcewake, the products where we are seeing this is have GuC controlling
the both the frequency (SLPC) as well RC6 (GUCRC).
Thanks.
--
Ashutosh
More information about the dri-devel
mailing list