[PATCH v4 0/1] INTEL_DII/FIXME: drm/i915: RC6 selftest changes for leakage error
Chris Wilson
chris.p.wilson at intel.com
Thu Jun 10 10:24:51 UTC 2021
Quoting tilak.tangudu at intel.com (2021-06-10 11:08:28)
> From: Tilak Tangudu <tilak.tangudu at intel.com>
>
> 1. Observed high energy readings in RC6 in dGPU, due to this
> other gt_pm self tests gets skipped as it is treated as error.
> 2. "GPU leaked energy in RC6" is changed to warning.
> 3. RC0 measured time is changed to 100 msec.
> 4. Made sure GT is in RC6 prior to measuring power.
>
> v2 1. Removed msleep hack and added forcewake_flush
> prior ro measuring rc0 power.(Chris Wilson)
> 2. Improvised forcewake_flush, added wait_ack_clear,
> to make sure pending acks gets cleared.
>
> v3 1. Added forcewake_flush in rc6_residency(Chris Wilson)
> 2. Cleanup and added fw_domain_flush
Sorry, for trybot it just wants the patch against drm-tip, not the pile.
While I'm here...
>
> JIRA : VLK-20511
>
> Signed-off-by: Tilak Tangudu <tilak.tangudu at intel.com>
>
> ---
> baseline: b7227afd06bac1fe6719136e2ddd2bfed1d85feb
> pile-commit: 62181f1ea1ef51f213d5b3cc08c90cb43bbe0065
> range-diff:
> -: ------------ > 1704: 722b36dc1a61 INTEL_DII/FIXME: drm/i915: RC6 selftest changes for leakage error
>
> series | 1 +
> ...IXME-drm-i915-RC6-selftest-changes-for-le.patch | 117 +++++++++++++++++++++
> 2 files changed, 118 insertions(+)
>
> diff --git a/series b/series
> index f0f65093d102..a317fe74ac14 100644
> --- a/series
> +++ b/series
> @@ -1703,3 +1703,4 @@
> 0001-INTEL_DII-NOT_UPSTREAM-drm-i915-hwconfig-HWConfig-ta.patch
> 0001-INTEL_DII-HACK-NOT_UPSTREAM-Calculate-flatccs-base-o.patch
> 0001-INTEL_DII-END-hacks-Presi-PO-hacks.patch
> +0001-INTEL_DII-FIXME-drm-i915-RC6-selftest-changes-for-le.patch
> diff --git a/0001-INTEL_DII-FIXME-drm-i915-RC6-selftest-changes-for-le.patch b/0001-INTEL_DII-FIXME-drm-i915-RC6-selftest-changes-for-le.patch
> new file mode 100644
> index 000000000000..e5f00043a196
> --- /dev/null
> +++ b/0001-INTEL_DII-FIXME-drm-i915-RC6-selftest-changes-for-le.patch
> @@ -0,0 +1,117 @@
> +From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
> +From: Tilak Tangudu <tilak.tangudu at intel.com>
> +Date: Fri, 4 Jun 2021 11:20:07 +0530
> +Subject: [PATCH] INTEL_DII/FIXME: drm/i915: RC6 selftest changes for leakage
> + error
> +
> +1. Observed high energy readings in RC6 in dGPU, due to this
> +other gt_pm self tests gets skipped as it is treated as error.
> +2. "GPU leaked energy in RC6" is changed to warning.
> +3. RC0 measured time is changed to 100 msec.
> +4. Made sure GT is in RC6 prior to measuring power.
> +
> +v2 1. Removed msleep hack and added forcewake_flush
> + prior ro measuring rc0 power.(Chris Wilson)
> + 2. Improvised forcewake_flush, added wait_ack_clear,
> + to make sure pending acks gets cleared.
> +
> +v3 1. Added forcewake_flush in rc6_residency(Chris Wilson)
> + 2. Cleanup and added fw_domain_flush
> +
> +Signed-off-by: Tilak Tangudu <tilak.tangudu at intel.com>
> +---
> + drivers/gpu/drm/i915/gt/selftest_rc6.c | 24 +++++++++++++++++-------
> + drivers/gpu/drm/i915/intel_uncore.c | 10 ++++++++++
> + 2 files changed, 27 insertions(+), 7 deletions(-)
> +
> +diff --git a/drivers/gpu/drm/i915/gt/selftest_rc6.c b/drivers/gpu/drm/i915/gt/selftest_rc6.c
> +--- a/drivers/gpu/drm/i915/gt/selftest_rc6.c
> ++++ b/drivers/gpu/drm/i915/gt/selftest_rc6.c
> +@@ -19,6 +19,7 @@ static u64 rc6_residency(struct intel_rc6 *rc6)
> +
> + /* XXX VLV_GT_MEDIA_RC6? */
> +
> ++ intel_uncore_forcewake_flush(rc6_to_uncore(rc6), FORCEWAKE_ALL);
This flush should be after the calls to intel_rc6_residency_ns() as they
may cause us to enter forcewake, and so we want to wait for the
forcewake clear ack afterwards.
> + result = intel_rc6_residency_ns(rc6, GEN6_GT_GFX_RC6);
> + if (HAS_RC6p(rc6_to_i915(rc6)))
> + result += intel_rc6_residency_ns(rc6, GEN6_GT_GFX_RC6p);
> +@@ -56,18 +57,17 @@ int live_rc6_manual(void *arg)
> +
> + /* Force RC6 off for starters */
> + __intel_rc6_disable(rc6);
> +- msleep(1); /* wakeup is not immediate, takes about 100us on icl */
In which case, do we want a intel_uncore_forcewake_flush() here? I think
we can survive with one without, but I think it does serve as pedagogy:
it states that we begin with a clear slate, no rc6, no outstanding
forcewake (that may be preventing rc6).
> +
> + res[0] = rc6_residency(rc6);
> +
> + dt = ktime_get();
> + rc0_power = librapl_energy_uJ(gt->i915);
> +- msleep(250);
> ++ msleep(100);
> + rc0_power = librapl_energy_uJ(gt->i915) - rc0_power;
> + dt = ktime_sub(ktime_get(), dt);
> + res[1] = rc6_residency(rc6);
> + if ((res[1] - res[0]) >> 10) {
> +- pr_err("RC6 residency increased by %lldus while disabled for 250ms!\n",
> ++ pr_err("RC6 residency increased by %lldus while disabled for 100ms!\n",
> + (res[1] - res[0]) >> 10);
> + err = -EINVAL;
> + goto out_unlock;
> +@@ -87,7 +87,18 @@ int live_rc6_manual(void *arg)
> + intel_rc6_park(rc6);
> +
> + res[0] = rc6_residency(rc6);
> +- intel_uncore_forcewake_flush(rc6_to_uncore(rc6), FORCEWAKE_ALL);
> ++ wait_for((rc6_residency(rc6) > res[0]), 10);
> ++ if (rc6_residency(rc6) == res[0]) {
> ++ pr_err("Did not enter RC6! RC6_STATE=%08x, RC6_CONTROL=%08x, residency=%lld\n",
> ++ intel_uncore_read_fw(gt->uncore, GEN6_RC_STATE),
> ++ intel_uncore_read_fw(gt->uncore, GEN6_RC_CONTROL),
> ++ res[0]);
> ++ err = -EINVAL;
> ++ intel_rc6_unpark(rc6);
> ++ goto out_unlock;
> ++ }
> ++
> ++ res[0] = rc6_residency(rc6);
> + dt = ktime_get();
> + rc6_power = librapl_energy_uJ(gt->i915);
> + msleep(100);
> +@@ -108,9 +119,8 @@ int live_rc6_manual(void *arg)
> + pr_info("GPU consumed %llduW in RC0 and %llduW in RC6\n",
> + rc0_power, rc6_power);
> + if (2 * rc6_power > rc0_power) {
> +- pr_err("GPU leaked energy while in RC6!\n");
> +- err = -EINVAL;
> +- goto out_unlock;
> ++ pr_info("GPU leaked energy while in RC6!\n");
> ++ err = -EINTR;
> + }
> + }
> +
> +diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> +--- a/drivers/gpu/drm/i915/intel_uncore.c
> ++++ b/drivers/gpu/drm/i915/intel_uncore.c
> +@@ -157,6 +157,15 @@ fw_domain_wait_ack_clear(const struct intel_uncore_forcewake_domain *d)
> + }
> + }
> +
> ++static inline void
> ++fw_domain_flush(const struct intel_uncore_forcewake_domain *d)
> ++{
> ++ if (fw_ack(d)) {
> ++ preempt_disable();
> ++ fw_domain_wait_ack_clear(d);
> ++ preempt_enable();
> ++ }
> ++}
And a newline after the function :)
So just those nitpicks and fingers crossed we can remove the arbitrary
sleep.
-Chris
More information about the Intel-gfx-trybot
mailing list