[Intel-gfx] [PATCH] drm/i915/pmu: Use GT parked for estimating RC6 while asleep
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Thu Sep 12 09:55:00 UTC 2019
On 12/09/2019 10:39, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-09-12 10:20:39)
>>
>> On 11/09/2019 17:38, Chris Wilson wrote:
>>> As we track when we put the GT device to sleep upon idling, we can use
>>> that callback to sample the current rc6 counters and record the
>>> timestamp for estimating samples after that point while asleep.
>>>
>>> v2: Stick to using ktime_t
>>> v3: Track user_wakerefs that interfere with the new
>>> intel_gt_pm_wait_for_idle
>>>
>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105010
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>> ---
>>> drivers/gpu/drm/i915/gem/i915_gem_pm.c | 19 ++++
>>> drivers/gpu/drm/i915/gt/intel_gt_types.h | 1 +
>>> drivers/gpu/drm/i915/i915_debugfs.c | 22 ++---
>>> drivers/gpu/drm/i915/i915_pmu.c | 120 +++++++++++------------
>>> drivers/gpu/drm/i915/i915_pmu.h | 4 +-
>>> 5 files changed, 90 insertions(+), 76 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
>>> index 3bd764104d41..45a72cb698db 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
>>> @@ -141,6 +141,21 @@ bool i915_gem_load_power_context(struct drm_i915_private *i915)
>>> return switch_to_kernel_context_sync(&i915->gt);
>>> }
>>>
>>> +static void user_forcewake(struct intel_gt *gt, bool suspend)
>>> +{
>>> + int count = atomic_read(&gt->user_wakeref);
>>> +
>>> + if (likely(!count))
>>> + return;
>>> +
>>> + intel_gt_pm_get(gt);
>>> + if (suspend)
>>> + atomic_sub(count, &gt->wakeref.count);
GEM_BUG_ON for underflow?
Presumably count is effectively atomic here since userspace is not
running yet/any more. Might warrant a comment?
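To make the underflow question concrete, here is a minimal userspace sketch of the kind of check meant above. It is only a model, not the driver code: struct model_gt, model_user_forcewake() and MODEL_BUG_ON() are hypothetical names, C11 atomics stand in for atomic_t, and the temporary intel_gt_pm_get/put pair is modelled as a plain increment and decrement.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for GEM_BUG_ON(); not the kernel macro. */
#define MODEL_BUG_ON(cond) assert(!(cond))

/* Hypothetical model of the two counters touched by user_forcewake(). */
struct model_gt {
	atomic_int user_wakeref;  /* references taken via debugfs */
	atomic_int wakeref_count; /* stands in for gt->wakeref.count */
};

static void model_user_forcewake(struct model_gt *gt, bool suspend)
{
	/*
	 * Read once: assumed stable because userspace is not running
	 * yet/any more at suspend and resume time.
	 */
	int count = atomic_load(&gt->user_wakeref);

	if (!count)
		return;

	atomic_fetch_add(&gt->wakeref_count, 1); /* models intel_gt_pm_get() */
	if (suspend) {
		/* On top of our temporary reference, every user reference must still be held. */
		MODEL_BUG_ON(atomic_load(&gt->wakeref_count) <= count);
		atomic_fetch_sub(&gt->wakeref_count, count);
	} else {
		atomic_fetch_add(&gt->wakeref_count, count);
	}
	atomic_fetch_sub(&gt->wakeref_count, 1); /* models intel_gt_pm_put() */
}
```

With three user references held, the suspend path drops the modelled count from 3 to 0 and the resume path restores it to 3. The check fires if fewer references are present than user_wakeref claims.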
>>> + else
>>> + atomic_add(count, &gt->wakeref.count);
>>> + intel_gt_pm_put(gt);
>>> +}
>>> +
>>> void i915_gem_suspend(struct drm_i915_private *i915)
>>> {
>>> GEM_TRACE("\n");
>>> @@ -148,6 +163,8 @@ void i915_gem_suspend(struct drm_i915_private *i915)
>>> intel_wakeref_auto(&i915->ggtt.userfault_wakeref, 0);
>>> flush_workqueue(i915->wq);
>>>
>>> + user_forcewake(&i915->gt, true);
>>
>> This complication is needed only because you changed user forcewake
>> handling to use intel_gt_pm_get/put instead of intel_runtime_pm_get?
>> Which in turn is because of the CONFIG_PM ifdef removal below? Wouldn't
>> it be simpler to keep both as were? Maybe I am missing something...
>
> Not quite. The change is because we stop tracking rc6 after parking,
> because we stop using the pm-timestamps in favour of our own gt
> tracking. However, that required tying the debugfs into the gt pm in
> order for us to notice the forced wakeup outside of the request flow.
>
> Either we keep using the unreliable runtime-pm interactions, or not. The
> patch hinges upon that decision. Or alternatively, we say we just don't
> care about miscounting with debugfs/i915_user_forcewake.
True, I think we have to account for it.
>>> static u64 get_rc6(struct intel_gt *gt)
>>> {
>>> -#if IS_ENABLED(CONFIG_PM)
>>> struct drm_i915_private *i915 = gt->i915;
>>> - struct intel_runtime_pm *rpm = &i915->runtime_pm;
>>> struct i915_pmu *pmu = &i915->pmu;
>>> - intel_wakeref_t wakeref;
>>> unsigned long flags;
>>> u64 val;
>>>
>>> - wakeref = intel_runtime_pm_get_if_in_use(rpm);
>>> - if (wakeref) {
>>> + spin_lock_irqsave(&pmu->lock, flags);
>>> +
>>> + if (intel_gt_pm_get_if_awake(gt)) {
>>> val = __get_rc6(gt);
>>> - intel_runtime_pm_put(rpm, wakeref);
>>> + intel_gt_pm_put(gt);
>>>
>>> /*
>>> * If we are coming back from being runtime suspended we must
>>> @@ -466,19 +494,13 @@ static u64 get_rc6(struct intel_gt *gt)
>>> * previously.
>>> */
>>>
>>> - spin_lock_irqsave(&pmu->lock, flags);
>>> -
>>> if (val >= pmu->sample[__I915_SAMPLE_RC6_ESTIMATED].cur) {
>>> pmu->sample[__I915_SAMPLE_RC6_ESTIMATED].cur = 0;
>>> pmu->sample[__I915_SAMPLE_RC6].cur = val;
>>> } else {
>>> val = pmu->sample[__I915_SAMPLE_RC6_ESTIMATED].cur;
>>> }
>>> -
>>> - spin_unlock_irqrestore(&pmu->lock, flags);
>>
>> For this branch pmu->lock is only needed over the estimation block, not
>> over __get_rc6() and intel_gt_pm_get_if_awake(). But I agree it's more
>> efficient to do it like this to avoid multiple irq-off-on transitions
>> via intel_rc6_residency_ns. I wanted to suggest local_irq_disable and
>> separate spin_(un)lock's on both if branches for more self-documenting,
>> less confusing code, but then a single call also has its benefits.
>
> If I am not mistaken, we need to serialise over the get_if_awake. Or at
> least it makes it easier to argue about the GT state and whether we
> need to choose between updating ESTIMATED or ACTUAL.
I am not sure that we do, but anyway it doesn't do much harm and has the
benefits described above as well, so okay.
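For reference, the single-lock shape being discussed can be sketched in userspace roughly as follows. This is an illustration under assumptions, not the patch: struct model_pmu and the model_* helpers are hypothetical, an atomic_flag spinlock stands in for pmu->lock, and the asleep branch (last real sample plus time since parking) is my reading of the commit message rather than code quoted in this thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the PMU state guarded by pmu->lock. */
struct model_pmu {
	atomic_flag lock;   /* stands in for pmu->lock */
	bool awake;         /* stands in for the GT awake state */
	uint64_t hw_rc6;    /* value a real counter read would return */
	uint64_t rc6_cur;   /* last real sample */
	uint64_t rc6_est;   /* last estimate handed out, 0 if none */
	uint64_t parked_at; /* timestamp recorded when the GT parked */
};

static void model_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		; /* spin */
}

static void model_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

static void model_gt_parked(struct model_pmu *pmu, uint64_t now)
{
	model_lock(&pmu->lock);
	pmu->rc6_cur = pmu->hw_rc6; /* sample the counter before sleeping */
	pmu->parked_at = now;
	pmu->awake = false;
	model_unlock(&pmu->lock);
}

static void model_gt_unparked(struct model_pmu *pmu)
{
	model_lock(&pmu->lock);
	pmu->awake = true;
	model_unlock(&pmu->lock);
}

static uint64_t model_get_rc6(struct model_pmu *pmu, uint64_t now)
{
	uint64_t val;

	/* One lock over the awake check and both branches. */
	model_lock(&pmu->lock);
	if (pmu->awake) {
		val = pmu->hw_rc6;
		if (val >= pmu->rc6_est) {
			/* Real counter caught up with the estimate. */
			pmu->rc6_est = 0;
			pmu->rc6_cur = val;
		} else {
			/* Never report less than was already handed out. */
			val = pmu->rc6_est;
		}
	} else {
		/* Assumed estimate: last sample plus time spent parked. */
		assert(now >= pmu->parked_at);
		val = pmu->rc6_cur + (now - pmu->parked_at);
		pmu->rc6_est = val;
	}
	model_unlock(&pmu->lock);

	return val;
}
```

Because the awake check sits under the same lock as the park/unpark hooks, a reader sees either the parked state with its timestamp or the awake state, never a mix of the two.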
>
> [snip]
>
>> Don't we end up doing the irqsave spinlock needlessly when !CONFIG_PM?
>
> No, the intent is to serialise with i915_pmu_gt_parked and
> i915_pmu_gt_unparked (and the GT awake state), which are independent of
> CONFIG_PM.
Yes, but with !CONFIG_PM we can always read the real counters and don't
need to do any additional magic. In fact, the code in i915_pmu_gt_(un)parked
could be ifdef-ed out for that case as well.
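Roughly what I have in mind, as a userspace sketch only: MODEL_CONFIG_PM is a hypothetical stand-in for CONFIG_PM, and struct model_rc6 and the model_rc6_* helpers are made-up names, not the i915 ones. When the option is off, the hooks compile to empty stubs and the read path always returns the real counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state; field names are illustrative only. */
struct model_rc6 {
	uint64_t hw;        /* value a real counter read would return */
	uint64_t sampled;   /* counter sampled at park time */
	uint64_t parked_at; /* timestamp recorded at park time */
	bool parked;
};

#ifdef MODEL_CONFIG_PM /* hypothetical stand-in for CONFIG_PM */
static void model_rc6_parked(struct model_rc6 *rc6, uint64_t now)
{
	rc6->sampled = rc6->hw;
	rc6->parked_at = now;
	rc6->parked = true;
}

static void model_rc6_unparked(struct model_rc6 *rc6)
{
	rc6->parked = false;
}
#else
/* Without runtime PM the counters stay readable, so the hooks do nothing. */
static inline void model_rc6_parked(struct model_rc6 *rc6, uint64_t now)
{
	(void)rc6;
	(void)now;
}

static inline void model_rc6_unparked(struct model_rc6 *rc6)
{
	(void)rc6;
}
#endif

static uint64_t model_rc6_read(const struct model_rc6 *rc6, uint64_t now)
{
	if (!rc6->parked)
		return rc6->hw; /* always taken when the hooks are stubs */

	assert(now >= rc6->parked_at);
	return rc6->sampled + (now - rc6->parked_at);
}
```

Built without MODEL_CONFIG_PM, the parked flag is never set, so the estimation branch is dead code and callers of the hooks need no ifdefs of their own.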
Regards,
Tvrtko