[PATCH v6] drm/i915/selftests: Implement frequency logging for energy reading validation

Nilawar, Badal badal.nilawar at intel.com
Wed Nov 20 08:13:40 UTC 2024



On 13-11-2024 15:20, Sk Anirban wrote:
> Introduce RC6 & RC0 frequency logging mechanism to ensure accurate
> energy readings aimed at addressing GPU energy leaks and power
> measurement failures.
> This enhancement will help ensure the accuracy of energy readings.
> 
> v2:
>    - Improved commit message.
> v3:
>    - Used pr_err log to display frequency (Anshuman)
>    - Sorted headers alphabetically (Sai Teja)
> v4:
>    - Improved commit message.
>    - Fix pr_err log (Sai Teja)
> v5:
>    - Add error & debug logging for RC0 power and frequency checks (Anshuman)
> v6:
>    - Modify debug logging for RC0 power and frequency checks (Sai Teja)
> 
> Signed-off-by: Sk Anirban <sk.anirban at intel.com>
> Reviewed-by: Sai Teja Pottumuttu <sai.teja.pottumuttu at intel.com>
> ---
>   drivers/gpu/drm/i915/gt/selftest_rc6.c | 15 +++++++++++++--
>   1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/selftest_rc6.c b/drivers/gpu/drm/i915/gt/selftest_rc6.c
> index 1aa1446c8fb0..a8776f88d6a1 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_rc6.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_rc6.c
> @@ -8,6 +8,7 @@
>   #include "intel_gpu_commands.h"
>   #include "intel_gt_requests.h"
>   #include "intel_ring.h"
> +#include "intel_rps.h"
>   #include "selftest_rc6.h"
>   
>   #include "selftests/i915_random.h"
> @@ -38,6 +39,9 @@ int live_rc6_manual(void *arg)
>   	ktime_t dt;
>   	u64 res[2];
>   	int err = 0;
> +	u32 rc0_freq = 0;
> +	u32 rc6_freq = 0;
> +	struct intel_rps *rps = &gt->rps;
>   
>   	/*
>   	 * Our claim is that we can "encourage" the GPU to enter rc6 at will.
> @@ -66,6 +70,7 @@ int live_rc6_manual(void *arg)
>   	rc0_power = librapl_energy_uJ() - rc0_power;
>   	dt = ktime_sub(ktime_get(), dt);
>   	res[1] = rc6_residency(rc6);
> +	rc0_freq = intel_rps_read_actual_frequency(rps);
>   	if ((res[1] - res[0]) >> 10) {
>   		pr_err("RC6 residency increased by %lldus while disabled for 1000ms!\n",
>   		       (res[1] - res[0]) >> 10);
> @@ -77,7 +82,11 @@ int live_rc6_manual(void *arg)
>   		rc0_power = div64_u64(NSEC_PER_SEC * rc0_power,
>   				      ktime_to_ns(dt));
>   		if (!rc0_power) {
> -			pr_err("No power measured while in RC0\n");
> +			if (rc0_freq)
> +				pr_err("No power measured while in RC0! GPU Freq: %u in RC0\n",
> +				       rc0_freq);
> +			else
> +				pr_err("No power and freq measured while in RC0\n");
>   			err = -EINVAL;
>   			goto out_unlock;
>   		}
> @@ -91,6 +100,7 @@ int live_rc6_manual(void *arg)
>   	dt = ktime_get();
>   	rc6_power = librapl_energy_uJ();
>   	msleep(100);
> +	rc6_freq = intel_rps_read_actual_frequency(rps);

I think the intention of reading the frequency here is to tell whether
the device was actually out of RC6 when the test fails. However, on
platforms below gen12, reading the actual frequency will wake the GT,
because the GEN6_RPSTAT register requires forcewake. To avoid waking
the device while it is in RC6, read the actual frequency without
applying forcewake.

Additionally, add a delay, maybe 1 second, after manually re-enabling
RC6 and flushing forcewake.

Regards,
Badal

>   	rc6_power = librapl_energy_uJ() - rc6_power;
>   	dt = ktime_sub(ktime_get(), dt);
>   	res[1] = rc6_residency(rc6);
> @@ -108,7 +118,8 @@ int live_rc6_manual(void *arg)
>   		pr_info("GPU consumed %llduW in RC0 and %llduW in RC6\n",
>   			rc0_power, rc6_power);
>   		if (2 * rc6_power > rc0_power) {
> -			pr_err("GPU leaked energy while in RC6!\n");
> +			pr_err("GPU leaked energy while in RC6! GPU Freq: %u in RC6 and %u in RC0\n",
> +			       rc6_freq, rc0_freq);
>   			err = -EINVAL;
>   			goto out_unlock;
>   		}
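
For anyone following the power check in the quoted hunks, here is a
minimal user-space sketch of the arithmetic, not the selftest itself.
It assumes the RAPL energy delta is in microjoules and the elapsed time
in nanoseconds, as the variable names in the patch suggest. The helper
names avg_power_uW() and rc6_leaks() are hypothetical and exist only
for this illustration; the sample figures in the comments are made up.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper: average power in uW from an energy delta in uJ
 * measured over dt_ns nanoseconds. Same scaling as the quoted
 * div64_u64(NSEC_PER_SEC * power, ktime_to_ns(dt)) expression.
 * The caller must pass a non-zero interval.
 */
static uint64_t avg_power_uW(uint64_t energy_uJ, uint64_t dt_ns)
{
	assert(dt_ns != 0);
	return (NSEC_PER_SEC * energy_uJ) / dt_ns;
}

/*
 * Hypothetical helper: same threshold as the quoted check,
 * "2 * rc6_power > rc0_power". Returns 1 when RC6 draws more than
 * half of the RC0 power, 0 otherwise.
 */
static int rc6_leaks(uint64_t rc0_uW, uint64_t rc6_uW)
{
	return 2 * rc6_uW > rc0_uW;
}
```

With made-up inputs of 1500000 uJ over 1000 ms in RC0 and 20000 uJ over
100 ms in RC6, the helpers give 1500000 uW and 200000 uW, which stays
under the threshold. Logging the frequency next to these numbers, as
the patch does, helps tell a real leak from a GT that never left RC0.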


