[PATCH v3 1/3] tests/kms_vrr: Bucketize refresh rate tolerance

Fri Apr 11 04:28:46 UTC 2025

On 4/4/2025 10:07 AM, Mitul Golani wrote:
> Reduce false failures while preserving timing accuracy. Introduce
> a small tolerance buffer based on refresh rate which accounts for
> HW/SW latency without compromising validation on HRR panel.
> Although an imperical number but already IGT is living with that.
s/imperical/emperical
> This also ensures that asked refresh rate is not too off and always
> catch the real HW/software issues.
>
> --v2:
> - Refresh rate criteria changes.
>
> --v3:
> - Comment changes (Uma).
> - Add FIXME
>
> Signed-off-by: Mitul Golani <mitulkumar.ajitkumar.golani at intel.com>
> ---
>   tests/kms_vrr.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++---
>   1 file changed, 56 insertions(+), 3 deletions(-)
>
> diff --git a/tests/kms_vrr.c b/tests/kms_vrr.c
> index c4bb30f6a..ccf32f453 100644
> --- a/tests/kms_vrr.c
> +++ b/tests/kms_vrr.c
> @@ -410,6 +410,52 @@ do_flip(data_t *data, igt_fb_t *fb)
>   	igt_reset_timeout();
>   }
>   
> +static void
> +calculate_tolerance(uint64_t *threshold_hi, uint64_t *threshold_lo, uint64_t rates_ns)
> +{
> +	uint32_t refresh_rate = NSECS_PER_SEC / rates_ns;
> +
> +	if (refresh_rate < 0)
> +		return;
> +
> +	/*
> +	 * Current IGT implementation follows this sequence:
> +	 * 1. Perform a page flip (`do_flip`).
> +	 * 2. Wait for the flip completion event.
> +	 * 3. Compare the timestamp of the flip completion event with the previous frame’s
> +	 *    completion timestamp.
> +	 * 4. Adjust CPU cycle burning based on the relative frame time.

Perhaps need to add a step for mention the tolerance step and about what 
it does.

5. Tolerance Check: Determine if the flip completion time falls within 
the acceptable range.

Perhaps we can add something like below:
"We set a tolerance value as the acceptable range of time within which a 
flip completion event should occur."

> +	 *
> +	 * If a flip completes too early or too late, it is marked as out of tolerance.
> +	 * As a result, additional CPU cycles are burned to match the `target_ns`.
> +	 * Even if the next frame is on time, the total frame time now includes:
> +	 * Burned CPU cycle time (from the previous frame) + Flip completion event time.
> +	 * This leads to miscalculation, causing **false out-of-range detections**.
> +	 * The impact is more significant on High Refresh Rate (HRR) panels, where:
> +	 * The allowed tolerance window is smaller and more correction time is required.
> +	 * i.e. for 210hz (4.762ms), allowed range is 209hz(4.784ms) to 211hz(4.739ms).
> +	 * This comes just 23 microsecond tolerance, which is much lesser
> +	 * for accounting HW/SW latency, CPU burn cycle latency and correction logic
> +	 * applied in igt for validation.
> +	 *
> +	 * To address this implement a Bucketing Strategy:
> +	 * Provide a small tolerance buffer to allow IGT tests to account for correction.
> +	 * Based on range of asked refresh rate. This prevents excessive failures due to minor
> +	 * timing adjustments.
> +	 */
> +
> +	if (refresh_rate <= 120) {
> +		*threshold_hi = NSECS_PER_SEC / (((float)NSECS_PER_SEC / rates_ns) + 1);
> +		*threshold_lo = NSECS_PER_SEC / (((float)NSECS_PER_SEC / rates_ns) - 1);
> +	} else if (refresh_rate >= 120 && refresh_rate <= 240) {
> +		*threshold_hi = NSECS_PER_SEC / (((float)NSECS_PER_SEC / rates_ns) + 5);
> +		*threshold_lo = NSECS_PER_SEC / (((float)NSECS_PER_SEC / rates_ns) - 5);
> +	} else {
> +		*threshold_hi = NSECS_PER_SEC / (((float)NSECS_PER_SEC / rates_ns) + 10);
> +		*threshold_lo = NSECS_PER_SEC / (((float)NSECS_PER_SEC / rates_ns) - 10);
> +	}

These ranges and values might need to adjust. More ranges might be 
required later.

I am wondering if we can use a table for this:

struct{
intmin_rate; /*Min refresh rate in Hertz*/
intmax_rate; / *Max refresh rate in Hertz */
inttolerance; /* Tolerance in Hertz */
}tolerance_config;

/*
* Tolerance values are determined based on predefined ranges:
* - ≤ 120 Hz: ±1
* - 121-240 Hz: ±5
* - > 240 Hz: ±10
*/
struct tolerance_configtolerance_table[]={
{0,120,1},
{121,240,5},
{241,INT_MAX,10}
};

Then any further changes in the values will not have impact on the code.

> +}
> +
>   /*
>    * Flips at the given rate and measures against the expected value.
>    * Returns the pass rate as a percentage from 0 - 100.
> @@ -439,9 +485,7 @@ flip_and_measure(data_t *data, igt_output_t *output, enum pipe pipe,
>   		else
>   			exp_rate_ns = vtest_ns.max;
>   
> -		/* Allow ~1 Hz deviation for different reasons causing delay. */
> -		threshold_hi[i] = NSECS_PER_SEC / (((float)NSECS_PER_SEC / exp_rate_ns) + 1);
> -		threshold_lo[i] = NSECS_PER_SEC / (((float)NSECS_PER_SEC / exp_rate_ns) - 1);
> +		calculate_tolerance(&threshold_hi[i], &threshold_lo[i], exp_rate_ns);
>   
>   		igt_info("Requested rate[%d]: %"PRIu64" ns, Expected rate between: %"PRIu64" ns to %"PRIu64" ns\n",
>   				i, rates_ns[i], threshold_hi[i], threshold_lo[i]);
> @@ -497,6 +541,15 @@ flip_and_measure(data_t *data, igt_output_t *output, enum pipe pipe,
>   		wait_ns -= diff_ns;
>   		target_ns = event_ns + wait_ns;
>   
> +		/*
> +		 * FIXME: This logic makes next immediate frame time calculation
> +		 * in inconsistent state, even if next flip comes on correct time,
> +		 * it will be marked as fail due to time difference from previous
> +		 * flip. Needs to reset at every cycle for correct measurement.
> +		 * Once this is corrected, igt can ask for more stricter pass

s/igt/the test

I agree with the idea in general. Thanks for spending some time on this 
and all the experimentation.

Regards,

Ankit

> +		 * criteria.
> +		 */
> +
>   		while (get_time_ns() < target_ns - 10);
>   	}
>