[PATCH v6] drm/i915/selftests: Implement frequency logging for energy reading validation

Gupta, Anshuman anshuman.gupta at intel.com
Wed Nov 20 10:43:07 UTC 2024



> -----Original Message-----
> From: Gupta, Anshuman
> Sent: Wednesday, November 20, 2024 4:01 PM
> To: Nilawar, Badal <badal.nilawar at intel.com>; Anirban, Sk
> <sk.anirban at intel.com>; intel-gfx at lists.freedesktop.org
> Cc: Poosa, Karthik <karthik.poosa at intel.com>; Pottumuttu, Sai Teja
> <sai.teja.pottumuttu at intel.com>
> Subject: RE: [PATCH v6] drm/i915/selftests: Implement frequency logging for
> energy reading validation
> 
> 
> 
> > -----Original Message-----
> > From: Nilawar, Badal <badal.nilawar at intel.com>
> > Sent: Wednesday, November 20, 2024 1:44 PM
> > To: Anirban, Sk <sk.anirban at intel.com>;
> > intel-gfx at lists.freedesktop.org
> > Cc: Gupta, Anshuman <anshuman.gupta at intel.com>; Poosa, Karthik
> > <karthik.poosa at intel.com>; Pottumuttu, Sai Teja
> > <sai.teja.pottumuttu at intel.com>
> > Subject: Re: [PATCH v6] drm/i915/selftests: Implement frequency
> > logging for energy reading validation
> >
> >
> >
> > On 13-11-2024 15:20, Sk Anirban wrote:
> > > Introduce RC6 & RC0 frequency logging mechanism to ensure accurate
> > > energy readings aimed at addressing GPU energy leaks and power
> > > measurement failures.
> > > This enhancement will help ensure the accuracy of energy readings.
> > >
> > > v2:
> > >    - Improved commit message.
> > > v3:
> > >    - Used pr_err log to display frequency (Anshuman)
> > >    - Sorted headers alphabetically (Sai Teja)
> > > v4:
> > >    - Improved commit message.
> > >    - Fix pr_err log (Sai Teja)
> > > v5:
> > >    - Add error & debug logging for RC0 power and frequency checks
> > > (Anshuman)
> > > v6:
> > >    - Modify debug logging for RC0 power and frequency checks (Sai
> > > Teja)
> > >
> > > Signed-off-by: Sk Anirban <sk.anirban at intel.com>
> > > Reviewed-by: Sai Teja Pottumuttu <sai.teja.pottumuttu at intel.com>
> > > ---
> > >   drivers/gpu/drm/i915/gt/selftest_rc6.c | 15 +++++++++++++--
> > >   1 file changed, 13 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/gt/selftest_rc6.c
> > > b/drivers/gpu/drm/i915/gt/selftest_rc6.c
> > > index 1aa1446c8fb0..a8776f88d6a1 100644
> > > --- a/drivers/gpu/drm/i915/gt/selftest_rc6.c
> > > +++ b/drivers/gpu/drm/i915/gt/selftest_rc6.c
> > > @@ -8,6 +8,7 @@
> > >   #include "intel_gpu_commands.h"
> > >   #include "intel_gt_requests.h"
> > >   #include "intel_ring.h"
> > > +#include "intel_rps.h"
> > >   #include "selftest_rc6.h"
> > >
> > >   #include "selftests/i915_random.h"
> > > @@ -38,6 +39,9 @@ int live_rc6_manual(void *arg)
> > >   	ktime_t dt;
> > >   	u64 res[2];
> > >   	int err = 0;
> > > +	u32 rc0_freq = 0;
> > > +	u32 rc6_freq = 0;
> > > +	struct intel_rps *rps = &gt->rps;
> > >
> > >   	/*
> > >   	 * Our claim is that we can "encourage" the GPU to enter rc6 at will.
> > > @@ -66,6 +70,7 @@ int live_rc6_manual(void *arg)
> > >   	rc0_power = librapl_energy_uJ() - rc0_power;
> > >   	dt = ktime_sub(ktime_get(), dt);
> > >   	res[1] = rc6_residency(rc6);
> > > +	rc0_freq = intel_rps_read_actual_frequency(rps);
> > >   	if ((res[1] - res[0]) >> 10) {
> > >   		pr_err("RC6 residency increased by %lldus while disabled for
> > 1000ms!\n",
> > >   		       (res[1] - res[0]) >> 10); @@ -77,7 +82,11 @@ int
> > > live_rc6_manual(void *arg)
> > >   		rc0_power = div64_u64(NSEC_PER_SEC * rc0_power,
> > >   				      ktime_to_ns(dt));
> > >   		if (!rc0_power) {
> > > -			pr_err("No power measured while in RC0\n");
> > > +			if (rc0_freq)
> > > +				pr_err("No power measured while in RC0!
> > GPU Freq: %u in RC0\n",
> > > +				       rc0_freq);
> If the RC0 frequency is non-zero, then this has to be pr_debug; otherwise,
> what is the purpose of this patch?
If there is no RC0 power but we do have an RC0 frequency, it should return from here without any error.
Thanks,
Anshuman.
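
A minimal sketch of one reading of the control flow suggested above, written
as standalone user-space C rather than kernel code. check_rc0() is a
hypothetical helper that does not exist in selftest_rc6.c, and fprintf()
stands in for pr_debug()/pr_err(); treating "return without any error" as
returning 0 is an assumption.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper for illustration only; not part of selftest_rc6.c. */
static int check_rc0(unsigned long long rc0_power, unsigned int rc0_freq)
{
	/* Power was measured, nothing to report. */
	if (rc0_power)
		return 0;

	if (rc0_freq) {
		/* Stand-in for pr_debug(): the GPU was clocked, so no failure. */
		fprintf(stderr,
			"No power measured while in RC0! GPU Freq: %u in RC0\n",
			rc0_freq);
		return 0;
	}

	/* Stand-in for pr_err(): neither power nor frequency was seen. */
	fprintf(stderr, "No power and freq measured while in RC0\n");
	return -EINVAL;
}
```

With this shape, only the case where both readings are zero fails the test.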
> > > +			else
> > > +				pr_err("No power and freq measured while in
> > RC0\n");
> > >   			err = -EINVAL;
> > >   			goto out_unlock;
> > >   		}
> > > @@ -91,6 +100,7 @@ int live_rc6_manual(void *arg)
> > >   	dt = ktime_get();
> > >   	rc6_power = librapl_energy_uJ();
> > >   	msleep(100);
> > > +	rc6_freq = intel_rps_read_actual_frequency(rps);
> >
> > I think the intention of reading the frequency here is to know whether
> > the device was not in RC6 when there is a failure. But on platforms below
> > gen12, reading the actual frequency will wake the GT, as the GEN6_RPSTAT
> > register requires forcewake. To avoid a wake while the device is in RC6,
> > read the actual frequency without applying forcewake.
> If reading act_freq will wake the device, how do we read the frequency
> without forcewake then?
> 
> Thanks,
> Anshuman
> >
> > Additionally, add a delay, maybe 1 second, after re-enabling RC6
> > manually and flushing forcewake.
> >
> > Regards,
> > Badal
> >
> > >   	rc6_power = librapl_energy_uJ() - rc6_power;
> > >   	dt = ktime_sub(ktime_get(), dt);
> > >   	res[1] = rc6_residency(rc6);
> > > @@ -108,7 +118,8 @@ int live_rc6_manual(void *arg)
> > >   		pr_info("GPU consumed %llduW in RC0 and %llduW in
> > RC6\n",
> > >   			rc0_power, rc6_power);
> > >   		if (2 * rc6_power > rc0_power) {
> > > -			pr_err("GPU leaked energy while in RC6!\n");
> > > +			pr_err("GPU leaked energy while in RC6! GPU Freq:
> > %u in RC6 and %u in RC0\n",
> > > +			       rc6_freq, rc0_freq);
> > >   			err = -EINVAL;
> > >   			goto out_unlock;
> > >   		}


