<!DOCTYPE html><html><head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> </head> <body> Hi Matt, This issue is not reproducible; I have tested it multiple times. I discussed it with Nerlige Ramappa, Umesh from the kernel-telemetry team, but we did not find a permanent fix. Please suggest what can be done. Thanks, Ravi Kumar V <div class="moz-cite-prefix">On 11/27/2024 3:09 AM, Matt Roper wrote: </div> <blockquote type="cite" cite="mid:20241126213932.GQ5725@mdroper-desk1.amr.corp.intel.com"> <pre wrap="" class="moz-quote-pre">On Tue, Nov 26, 2024 at 07:39:13PM +0530, Ravi Kumar Vodapalli wrote: </pre> <blockquote type="cite"> <pre wrap="" class="moz-quote-pre">The tolerance limit exceeds the threshold values sometimes for test igt@perf_pmu@semaphore-busy, bump up the limits slightly. Also print the log in readable format in percentage instead of nanosec. </pre> </blockquote> <pre wrap="" class="moz-quote-pre"> But why is it exceeding the limits? We're already giving 5% wiggle room (which seems like a lot); bumping that up to 10% will hide CI failures, but it doesn't really explain why the numbers are so inaccurate to begin with. If there's a real bug causing the mismatch, then we should figure out how to fix that bug rather than just making IGT more willing to ignore the problem. BTW, looking through some of the results for this test in cibuglog, it seems like there are still cases where even the larger 10% tolerance still doesn't cover the gap. 
E.g., <a class="moz-txt-link-freetext" href="https://intel-gfx-ci.01.org/tree/drm-tip/IGT_8126/shard-dg2-10/igt@perf_pmu@semaphore-busy.html">https://intel-gfx-ci.01.org/tree/drm-tip/IGT_8126/shard-dg2-10/igt@perf_pmu@semaphore-busy.html</a> (87.2% vs 100%) <a class="moz-txt-link-freetext" href="https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_15748/shard-dg2-4/igt@perf_pmu@semaphore-busy.html">https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_15748/shard-dg2-4/igt@perf_pmu@semaphore-busy.html</a> (88.1% vs 100%) So I think we need to understand what's actually going on here and causing these results. If there's a known, legitimate reason why the numbers are way off on specific platform(s) (e.g., some kind of workaround in the kernel or GuC) then it would be better to blacklist the test on platforms where we can't expect reliable results than to relax the test itself (and possibly let bugs go unnoticed on other platforms). Matt </pre> <blockquote type="cite"> <pre wrap="" class="moz-quote-pre"> Signed-off-by: Ravi Kumar Vodapalli <a class="moz-txt-link-rfc2396E" href="mailto:ravi.kumar.vodapalli@intel.com"><ravi.kumar.vodapalli@intel.com></a> --- tests/intel/perf_pmu.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/tests/intel/perf_pmu.c b/tests/intel/perf_pmu.c index bfa2d501a..7f43354fd 100644 --- a/tests/intel/perf_pmu.c +++ b/tests/intel/perf_pmu.c @@ -189,7 +189,7 @@ IGT_TEST_DESCRIPTION("Test the i915 pmu perf interface"); -const double tolerance = 0.05f; +const double tolerance = 0.1f; const unsigned long batch_duration_ns = 500e6; char *drpc; @@ -287,10 +287,9 @@ static uint64_t pmu_read_multi(int fd, unsigned int num, uint64_t *val) #define __assert_within_epsilon(x, ref, tol_up, tol_down, debug_data) \ igt_assert_f((double)(x) <= (1.0 + (tol_up)) * (double)(ref) && \ (double)(x) >= (1.0 - (tol_down)) * (double)(ref), \ - "'%s' != '%s' (%f not within +%.1f%%/-%.1f%% tolerance of %f)\n%s\n",\ - #x, #ref, (double)(x), \ - (tol_up) * 
100.0, (tol_down) * 100.0, \ - (double)(ref), debug_data) + "%.3f%% is not within tolerance limits of +%.1f%%/-%.1f%%\n%s\n", \ + (((double)((double)(x) - (double)(ref)) * 100.0) / (double)(ref)), \ + (tol_up) * 100.0, (tol_down) * 100.0, debug_data) #define assert_within_epsilon(x, ref, tolerance) \ __assert_within_epsilon(x, ref, tolerance, tolerance, no_debug_data) -- 2.25.1 </pre> </blockquote> <pre wrap="" class="moz-quote-pre"> </pre> </blockquote> </body> </html>