[PATCH] drm/i915/pmu: Match frequencies reported by PMU and sysfs
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Tue Oct 4 13:05:55 UTC 2022
On 04/10/2022 14:00, Tvrtko Ursulin wrote:
>
> On 04/10/2022 10:29, Tvrtko Ursulin wrote:
>>
>> On 03/10/2022 20:24, Ashutosh Dixit wrote:
>>> PMU and sysfs use different wakerefs to "interpret" zero freq. Sysfs uses
>>> the runtime PM wakeref (see intel_rps_read_punit_req and
>>> intel_rps_read_actual_frequency). PMU uses the GT parked/unparked
>>> wakeref. In general the GT wakeref is held for less time than the runtime
>>> PM wakeref, which causes PMU to report a lower average freq than the
>>> average freq obtained from sampling sysfs.
>>>
>>> To resolve this, use the same freq functions (and wakerefs) in PMU as
>>> those used in sysfs.
>>>
>>> Bug: https://gitlab.freedesktop.org/drm/intel/-/issues/7025
>>> Reported-by: Ashwin Kumar Kulkarni <ashwin.kumar.kulkarni at intel.com>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
>>> Signed-off-by: Ashutosh Dixit <ashutosh.dixit at intel.com>
>>> ---
>>> drivers/gpu/drm/i915/i915_pmu.c | 27 ++-------------------------
>>> 1 file changed, 2 insertions(+), 25 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_pmu.c
>>> b/drivers/gpu/drm/i915/i915_pmu.c
>>> index 958b37123bf1..eda03f264792 100644
>>> --- a/drivers/gpu/drm/i915/i915_pmu.c
>>> +++ b/drivers/gpu/drm/i915/i915_pmu.c
>>> @@ -371,37 +371,16 @@ static void
>>> frequency_sample(struct intel_gt *gt, unsigned int period_ns)
>>> {
>>> struct drm_i915_private *i915 = gt->i915;
>>> - struct intel_uncore *uncore = gt->uncore;
>>> struct i915_pmu *pmu = &i915->pmu;
>>> struct intel_rps *rps = &gt->rps;
>>> if (!frequency_sampling_enabled(pmu))
>>> return;
>>> - /* Report 0/0 (actual/requested) frequency while parked. */
>>> - if (!intel_gt_pm_get_if_awake(gt))
>>> - return;
>>> -
>>> if (pmu->enable & config_mask(I915_PMU_ACTUAL_FREQUENCY)) {
>>> - u32 val;
>>> -
>>> - /*
>>> - * We take a quick peek here without using forcewake
>>> - * so that we don't perturb the system under observation
>>> - * (forcewake => !rc6 => increased power use). We expect
>>> - * that if the read fails because it is outside of the
>>> - * mmio power well, then it will return 0 -- in which
>>> - * case we assume the system is running at the intended
>>> - * frequency. Fortunately, the read should rarely fail!
>>> - */
>>> - val = intel_uncore_read_fw(uncore, GEN6_RPSTAT1);
>>> - if (val)
>>> - val = intel_rps_get_cagf(rps, val);
>>> - else
>>> - val = rps->cur_freq;
>>> -
>>> add_sample_mult(&pmu->sample[__I915_SAMPLE_FREQ_ACT],
>>> - intel_gpu_freq(rps, val), period_ns / 1000);
>>> + intel_rps_read_actual_frequency(rps),
>>> + period_ns / 1000);
>>> }
>>> if (pmu->enable & config_mask(I915_PMU_REQUESTED_FREQUENCY)) {
>>
>> What is software tracking of requested frequency showing when GT is
>> parked or runtime suspended? With this change sampling would be
>> outside any such checks so we need to be sure reported value makes sense.
>>
>> Although the more important open question is what is actually correct.
>>
>> For instance how does the patch affect RC6 and power? I don't know how
>> power management of different blocks is wired up, so personally I
>> would only be able to look at it empirically. In other words what I am
>> asking is this - if we changed from skipping obtaining forcewake even
>> when unparked, to obtaining forcewake if not runtime suspended - what
>> hardware blocks does that power up, and how does it affect RC6 and power?
>> Can it affect actual frequency or not? (Will "something" power up the
>> clocks just because we will be getting forcewake?)
>>
>> Or maybe the question simplified - does 200Hz polling on the existing sysfs
>> actual frequency field disturb the system under some circumstances?
>> (Increases power and decreases RC6.) If it does then that would be a
>> problem. We want a solution which shows the real data, but where the
>> act of monitoring itself does not change it too much. If it doesn't
>> then it's okay.
>>
>> Could you somehow investigate on these topics? Maybe log RAPL GPU
>> power while polling on sysfs, versus getting the actual frequency from
>> the existing PMU implementation and see if that shows anything? Or
>> actually simpler - RAPL GPU power for current PMU intel_gpu_top versus
>> this patch? On idle(-ish) desktop workloads perhaps? Power and
>> frequency graphed for both.
>
> Another thought - considering that bspec says for 0xa01c "This register
> reflects real-time values and thus does not have a pre-determined
> default value out of reset" - could it be that it also does not reflect
> a real value when GPU is not executing anything (so zero), just happens
> to be not runtime suspended? That would mean sysfs reads could maybe
> show last known value? Just a thought to check.
>
> I've also tried on my Alderlake desktop:
>
> 1)
>
> while true; do cat gt_act_freq_mhz >/dev/null; sleep 0.005; done
>
> This costs ~120mW of GPU power and ~20% decrease in RC6.
>
>
> 2)
>
> intel_gpu_top -l -s 5 >/dev/null
This "-s 5" was pointless though. :)
Regards,
Tvrtko
>
> This costs no power or RC6.
>
> I have also never observed sysfs to show below min freq. This was with
> no desktop so it's possible this register indeed does not reflect the
> real situation when things are idle.
>
> So I think it is possible sysfs value is the misleading one.
>
> Regards,
>
> Tvrtko
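
For readers following the averaging argument in this thread, here is a toy model of why the two interfaces can disagree. It is only a sketch under stated assumptions, not driver code: the function names, the sample tuples and the numbers (1200 MHz while busy, 300 MHz standing in for a minimum frequency) are all made up for illustration. It assumes the PMU-style path counts a tick as 0 MHz whenever the GT is parked, while the sysfs-style path averages whatever value a read returns, which per the observation above never dropped below min freq.

```python
from typing import Sequence, Tuple

# One sampling tick: (gt_awake, freq_mhz_a_read_would_return).
# Hypothetical model, not an i915 data structure.
Sample = Tuple[bool, int]


def pmu_style_average(samples: Sequence[Sample]) -> float:
    """Average in which ticks with the GT parked contribute 0 MHz."""
    if not samples:
        return 0.0
    total = sum(freq if awake else 0 for awake, freq in samples)
    return total / len(samples)


def sysfs_style_average(samples: Sequence[Sample]) -> float:
    """Average of the raw readings, ignoring the GT park state."""
    if not samples:
        return 0.0
    return sum(freq for _, freq in samples) / len(samples)


# Assumed workload: GT awake for 2 of 10 ticks at 1200 MHz; on the other
# 8 ticks the GT is parked and a read returns 300 MHz (assumed min freq).
samples = [(True, 1200)] * 2 + [(False, 300)] * 8

print(pmu_style_average(samples))    # 240.0
print(sysfs_style_average(samples))  # 480.0
```

In this model the gap comes entirely from how parked ticks are counted, which is why the question of which value is "correct" depends on what the register really reports while the GT is idle.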