[PATCH] tests/intel/perf_pmu: Fix Test Assertion Failure for semaphore-busy test

Matt Roper matthew.d.roper at intel.com
Tue Nov 26 21:39:32 UTC 2024


On Tue, Nov 26, 2024 at 07:39:13PM +0530, Ravi Kumar Vodapalli wrote:
> The measured values sometimes exceed the tolerance limits in test
> igt@perf_pmu@semaphore-busy, so bump up the limits slightly.
> Also print the failure message in a more readable format, as a
> percentage instead of in nanoseconds.

But why is it exceeding the limits?  We're already giving 5% wiggle room
(which seems like a lot); bumping that up to 10% will hide CI failures,
but it doesn't really explain why the numbers are so inaccurate to begin
with.  If there's a real bug causing the mismatch, then we should figure
out how to fix that bug rather than just making IGT more willing to
ignore the problem.
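To put numbers on that concern: the macro passes any x with (1 - tol_down) * ref <= x <= (1 + tol_up) * ref, so going from 5% to 10% doubles the width of the accepted window, from 10% of ref to 20% of ref. The sketch below restates that condition outside IGT. within_epsilon() and accepted_window() are hypothetical helper names, not IGT API, and a reference of 100.0 is only a stand-in for "100% busy".

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Standalone restatement (hypothetical helper) of the condition checked by
 * __assert_within_epsilon() in tests/intel/perf_pmu.c:
 * x passes if (1 - tol_down) * ref <= x <= (1 + tol_up) * ref.
 */
static bool within_epsilon(double x, double ref, double tol_up, double tol_down)
{
	return x <= (1.0 + tol_up) * ref && x >= (1.0 - tol_down) * ref;
}

/* Lower and upper bounds of the accepted window for a symmetric tolerance. */
static void accepted_window(double ref, double tol, double *lo, double *hi)
{
	assert(lo && hi);
	*lo = (1.0 - tol) * ref;
	*hi = (1.0 + tol) * ref;
}
```

With a reference of 100.0, a reading of 92.0 fails the current 5% check but passes the proposed 10% one, which is the kind of deviation the relaxed limit would stop reporting.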

BTW, looking through some of the results for this test in cibuglog, it
seems there are cases where even the larger 10% tolerance still
doesn't cover the gap.  E.g.,

https://intel-gfx-ci.01.org/tree/drm-tip/IGT_8126/shard-dg2-10/igt@perf_pmu@semaphore-busy.html
(87.2% vs 100%)

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_15748/shard-dg2-4/igt@perf_pmu@semaphore-busy.html
(88.1% vs 100%)
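The two results above can be checked with the same expression the patch uses for its new failure message, ((x - ref) * 100.0) / ref. A minimal sketch, assuming the quoted percentages are the measured value and the reference; percent_deviation() and exceeds() are hypothetical helpers, not part of IGT.

```c
#include <assert.h>
#include <stdbool.h>

/* Signed deviation of x from ref in percent; same expression as the
 * patch's new failure message. */
static double percent_deviation(double x, double ref)
{
	assert(ref != 0.0); /* the percentage divides by ref */
	return ((x - ref) * 100.0) / ref;
}

/* For ref > 0: true if the deviation falls outside a symmetric tolerance
 * given as a fraction (0.05 == 5%). */
static bool exceeds(double x, double ref, double tol)
{
	double dev = percent_deviation(x, ref);

	return dev > tol * 100.0 || dev < -tol * 100.0;
}
```

Both runs come out at roughly -12.8% and -11.9%, so they fail at 5% and would still fail at the proposed 10%.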

So I think we need to understand what's actually going on here and
causing these results.  If there's a known, legitimate reason why the
numbers are way off on specific platform(s) (e.g., some kind of
workaround in the kernel or GuC) then it would be better to blacklist
the test on platforms where we can't expect reliable results than to
relax the test itself (and possibly let bugs go unnoticed on other
platforms).


Matt

> 
> Signed-off-by: Ravi Kumar Vodapalli <ravi.kumar.vodapalli at intel.com>
> ---
>  tests/intel/perf_pmu.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/tests/intel/perf_pmu.c b/tests/intel/perf_pmu.c
> index bfa2d501a..7f43354fd 100644
> --- a/tests/intel/perf_pmu.c
> +++ b/tests/intel/perf_pmu.c
> @@ -189,7 +189,7 @@
>  
>  IGT_TEST_DESCRIPTION("Test the i915 pmu perf interface");
>  
> -const double tolerance = 0.05f;
> +const double tolerance = 0.1f;
>  const unsigned long batch_duration_ns = 500e6;
>  
>  char *drpc;
> @@ -287,10 +287,9 @@ static uint64_t pmu_read_multi(int fd, unsigned int num, uint64_t *val)
>  #define __assert_within_epsilon(x, ref, tol_up, tol_down, debug_data) \
>  	igt_assert_f((double)(x) <= (1.0 + (tol_up)) * (double)(ref) && \
>  		     (double)(x) >= (1.0 - (tol_down)) * (double)(ref), \
> -		     "'%s' != '%s' (%f not within +%.1f%%/-%.1f%% tolerance of %f)\n%s\n",\
> -		     #x, #ref, (double)(x), \
> -		     (tol_up) * 100.0, (tol_down) * 100.0, \
> -		     (double)(ref), debug_data)
> +		     "%.3f%% is not within tolerance limits of +%.1f%%/-%.1f%%\n%s\n", \
> +		     (((double)((double)(x) - (double)(ref)) * 100.0) / (double)(ref)), \
> +		     (tol_up) * 100.0, (tol_down) * 100.0, debug_data)
>  
>  #define assert_within_epsilon(x, ref, tolerance) \
>  	__assert_within_epsilon(x, ref, tolerance, tolerance, no_debug_data)
> -- 
> 2.25.1
> 
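For comparison, here is a sketch of what the old and new igt_assert_f() messages would print for the same failure, reproducing the two format strings from the diff with snprintf(). The helper names format_old() and format_new(), the expression names "busy" and "ref", and the 87.2 vs 100.0 values are illustrative assumptions. Note that the new format reports only the signed deviation; it no longer includes the stringified #x and #ref expressions or the raw measured and reference values.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: old format, names both expressions and prints
 * the raw values. */
static int format_old(char *buf, size_t len, const char *x_expr,
		      const char *ref_expr, double x, double ref,
		      double tol_up, double tol_down, const char *debug_data)
{
	return snprintf(buf, len,
			"'%s' != '%s' (%f not within +%.1f%%/-%.1f%% tolerance of %f)\n%s\n",
			x_expr, ref_expr, x, tol_up * 100.0, tol_down * 100.0,
			ref, debug_data);
}

/* Hypothetical helper: new format, prints only the deviation in percent. */
static int format_new(char *buf, size_t len, double x, double ref,
		      double tol_up, double tol_down, const char *debug_data)
{
	assert(ref != 0.0); /* the percentage divides by ref */
	return snprintf(buf, len,
			"%.3f%% is not within tolerance limits of +%.1f%%/-%.1f%%\n%s\n",
			((x - ref) * 100.0) / ref,
			tol_up * 100.0, tol_down * 100.0, debug_data);
}
```

For 87.2 vs 100.0 with a 10% tolerance, format_new() produces "-12.800% is not within tolerance limits of +10.0%/-10.0%", while format_old() produces "'busy' != 'ref' (87.200000 not within +10.0%/-10.0% tolerance of 100.000000)".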

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


More information about the igt-dev mailing list