[PATCH v4 0/1] INTEL_DII/FIXME: drm/i915: RC6 selftest changes for leakage error

Chris Wilson chris.p.wilson at intel.com
Thu Jun 10 10:24:51 UTC 2021


Quoting tilak.tangudu at intel.com (2021-06-10 11:08:28)
> From: Tilak Tangudu <tilak.tangudu at intel.com>
> 
> 1. Observed high energy readings in RC6 in dGPU, due to this
>    other gt_pm self tests gets skipped as it is treated as error.
> 2. "GPU leaked energy in RC6" is changed to warning.
> 3. RC0 measured time is changed to 100 msec.
> 4. Made sure GT is in RC6 prior to measuring power.
> 
> v2 1. Removed msleep hack and added forcewake_flush
>       prior ro measuring rc0 power.(Chris Wilson)
>    2. Improvised forcewake_flush, added wait_ack_clear,
>       to make sure pending acks gets cleared.
> 
> v3 1. Added forcewake_flush in rc6_residency(Chris Wilson)
>    2. Cleanup and added fw_domain_flush

Sorry, for trybot it just wants the patch against drm-tip, not the pile.

While I'm here...

> 
> JIRA : VLK-20511
> 
> Signed-off-by: Tilak Tangudu <tilak.tangudu at intel.com>
> 
> ---
> baseline: b7227afd06bac1fe6719136e2ddd2bfed1d85feb
> pile-commit: 62181f1ea1ef51f213d5b3cc08c90cb43bbe0065
> range-diff:
>    -:  ------------ > 1704:  722b36dc1a61 INTEL_DII/FIXME: drm/i915: RC6 selftest changes for leakage error
> 
>  series                                             |   1 +
>  ...IXME-drm-i915-RC6-selftest-changes-for-le.patch | 117 +++++++++++++++++++++
>  2 files changed, 118 insertions(+)
> 
> diff --git a/series b/series
> index f0f65093d102..a317fe74ac14 100644
> --- a/series
> +++ b/series
> @@ -1703,3 +1703,4 @@
>  0001-INTEL_DII-NOT_UPSTREAM-drm-i915-hwconfig-HWConfig-ta.patch
>  0001-INTEL_DII-HACK-NOT_UPSTREAM-Calculate-flatccs-base-o.patch
>  0001-INTEL_DII-END-hacks-Presi-PO-hacks.patch
> +0001-INTEL_DII-FIXME-drm-i915-RC6-selftest-changes-for-le.patch
> diff --git a/0001-INTEL_DII-FIXME-drm-i915-RC6-selftest-changes-for-le.patch b/0001-INTEL_DII-FIXME-drm-i915-RC6-selftest-changes-for-le.patch
> new file mode 100644
> index 000000000000..e5f00043a196
> --- /dev/null
> +++ b/0001-INTEL_DII-FIXME-drm-i915-RC6-selftest-changes-for-le.patch
> @@ -0,0 +1,117 @@
> +From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
> +From: Tilak Tangudu <tilak.tangudu at intel.com>
> +Date: Fri, 4 Jun 2021 11:20:07 +0530
> +Subject: [PATCH] INTEL_DII/FIXME: drm/i915: RC6 selftest changes for leakage
> + error
> +
> +1. Observed high energy readings in RC6 in dGPU, due to this
> +other gt_pm self tests gets skipped as it is treated as error.
> +2. "GPU leaked energy in RC6" is changed to warning.
> +3. RC0 measured time is changed to 100 msec.
> +4. Made sure GT is in RC6 prior to measuring power.
> +
> +v2 1. Removed msleep hack and added forcewake_flush
> +      prior ro measuring rc0 power.(Chris Wilson)
> +   2. Improvised forcewake_flush, added wait_ack_clear,
> +      to make sure pending acks gets cleared.
> +
> +v3 1. Added forcewake_flush in rc6_residency(Chris Wilson)
> +   2. Cleanup and added fw_domain_flush
> +
> +Signed-off-by: Tilak Tangudu <tilak.tangudu at intel.com>
> +---
> + drivers/gpu/drm/i915/gt/selftest_rc6.c | 24 +++++++++++++++++-------
> + drivers/gpu/drm/i915/intel_uncore.c    | 10 ++++++++++
> + 2 files changed, 27 insertions(+), 7 deletions(-)
> +
> +diff --git a/drivers/gpu/drm/i915/gt/selftest_rc6.c b/drivers/gpu/drm/i915/gt/selftest_rc6.c
> +--- a/drivers/gpu/drm/i915/gt/selftest_rc6.c
> ++++ b/drivers/gpu/drm/i915/gt/selftest_rc6.c
> +@@ -19,6 +19,7 @@ static u64 rc6_residency(struct intel_rc6 *rc6)
> + 
> +       /* XXX VLV_GT_MEDIA_RC6? */
> + 
> ++      intel_uncore_forcewake_flush(rc6_to_uncore(rc6), FORCEWAKE_ALL);

This flush should be after the calls to intel_rc6_residency_ns() as they
may cause us to enter forcewake, and so we want to wait for the
forcewake clear ack afterwards.

> +       result = intel_rc6_residency_ns(rc6, GEN6_GT_GFX_RC6);
> +       if (HAS_RC6p(rc6_to_i915(rc6)))
> +               result += intel_rc6_residency_ns(rc6, GEN6_GT_GFX_RC6p);
> +@@ -56,18 +57,17 @@ int live_rc6_manual(void *arg)
> + 
> +       /* Force RC6 off for starters */
> +       __intel_rc6_disable(rc6);
> +-      msleep(1); /* wakeup is not immediate, takes about 100us on icl */

In which case, do we want a intel_uncore_forcewake_flush() here? I think
we can survive with one without, but I think it does serve as pedagogy:
it states that we begin with a clear slate, no rc6, no outstanding
forcewake (that may be preventing rc6).

> + 
> +       res[0] = rc6_residency(rc6);
> + 
> +       dt = ktime_get();
> +       rc0_power = librapl_energy_uJ(gt->i915);
> +-      msleep(250);
> ++      msleep(100);
> +       rc0_power = librapl_energy_uJ(gt->i915) - rc0_power;
> +       dt = ktime_sub(ktime_get(), dt);
> +       res[1] = rc6_residency(rc6);
> +       if ((res[1] - res[0]) >> 10) {
> +-              pr_err("RC6 residency increased by %lldus while disabled for 250ms!\n",
> ++              pr_err("RC6 residency increased by %lldus while disabled for 100ms!\n",
> +                      (res[1] - res[0]) >> 10);
> +               err = -EINVAL;
> +               goto out_unlock;
> +@@ -87,7 +87,18 @@ int live_rc6_manual(void *arg)
> +       intel_rc6_park(rc6);
> + 
> +       res[0] = rc6_residency(rc6);
> +-      intel_uncore_forcewake_flush(rc6_to_uncore(rc6), FORCEWAKE_ALL);
> ++      wait_for((rc6_residency(rc6) > res[0]), 10);
> ++      if (rc6_residency(rc6) == res[0]) {
> ++              pr_err("Did not enter RC6! RC6_STATE=%08x, RC6_CONTROL=%08x, residency=%lld\n",
> ++                     intel_uncore_read_fw(gt->uncore, GEN6_RC_STATE),
> ++                     intel_uncore_read_fw(gt->uncore, GEN6_RC_CONTROL),
> ++                     res[0]);
> ++              err = -EINVAL;
> ++              intel_rc6_unpark(rc6);
> ++              goto out_unlock;
> ++      }
> ++
> ++      res[0] = rc6_residency(rc6);
> +       dt = ktime_get();
> +       rc6_power = librapl_energy_uJ(gt->i915);
> +       msleep(100);
> +@@ -108,9 +119,8 @@ int live_rc6_manual(void *arg)
> +               pr_info("GPU consumed %llduW in RC0 and %llduW in RC6\n",
> +                       rc0_power, rc6_power);
> +               if (2 * rc6_power > rc0_power) {
> +-                      pr_err("GPU leaked energy while in RC6!\n");
> +-                      err = -EINVAL;
> +-                      goto out_unlock;
> ++                      pr_info("GPU leaked energy while in RC6!\n");
> ++                      err = -EINTR;
> +               }
> +       }
> + 
> +diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> +--- a/drivers/gpu/drm/i915/intel_uncore.c
> ++++ b/drivers/gpu/drm/i915/intel_uncore.c
> +@@ -157,6 +157,15 @@ fw_domain_wait_ack_clear(const struct intel_uncore_forcewake_domain *d)
> +       }
> + }
> + 
> ++static inline void
> ++fw_domain_flush(const struct intel_uncore_forcewake_domain *d)
> ++{
> ++      if (fw_ack(d)) {
> ++              preempt_disable();
> ++              fw_domain_wait_ack_clear(d);
> ++              preempt_enable();
> ++      }
> ++}

And a newline after the function :)

So just those nitpicks and fingers crossed we can remove the arbitrary
sleep.
-Chris


More information about the Intel-gfx-trybot mailing list