[PATCH] tests/intel/perf_pmu: Fix Test Assertion Failure for semaphore-busy test
Vodapalli, Ravi Kumar
ravi.kumar.vodapalli at intel.com
Wed Nov 27 06:56:03 UTC 2024
Hi Matt,
This issue is not reproducible; I have tested it multiple times.
Umesh Nerlige Ramappa from the kernel-telemetry team and I have
discussed the issue but did not find a permanent fix.
Please suggest how we should proceed.
Thanks,
Ravi Kumar V
On 11/27/2024 3:09 AM, Matt Roper wrote:
> On Tue, Nov 26, 2024 at 07:39:13PM +0530, Ravi Kumar Vodapalli wrote:
>> The measured value sometimes exceeds the tolerance limits in test
>> igt@perf_pmu@semaphore-busy, so bump up the limits slightly.
>> Also print the failure log in a readable format, as a percentage instead of nanoseconds.
> But why is it exceeding the limits? We're already giving 5% wiggle room
> (which seems like a lot); bumping that up to 10% will hide CI failures,
> but it doesn't really explain why the numbers are so inaccurate to begin
> with. If there's a real bug causing the mismatch, then we should figure
> out how to fix that bug rather than just making IGT more willing to
> ignore the problem.
>
> BTW, looking through some of the results for this test in cibuglog, it
> seems like there are cases where even the larger 10% tolerance
> still doesn't cover the gap. E.g.,
>
> https://intel-gfx-ci.01.org/tree/drm-tip/IGT_8126/shard-dg2-10/igt@perf_pmu@semaphore-busy.html
> (87.2% vs 100%)
>
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_15748/shard-dg2-4/igt@perf_pmu@semaphore-busy.html
> (88.1% vs 100%)
>
> So I think we need to understand what's actually going on here and
> causing these results. If there's a known, legitimate reason why the
> numbers are way off on specific platform(s) (e.g., some kind of
> workaround in the kernel or GuC) then it would be better to blacklist
> the test on platforms where we can't expect reliable results than to
> relax the test itself (and possibly let bugs go unnoticed on other
> platforms).
>
>
> Matt
>
>> Signed-off-by: Ravi Kumar Vodapalli <ravi.kumar.vodapalli at intel.com>
>> ---
>> tests/intel/perf_pmu.c | 9 ++++-----
>> 1 file changed, 4 insertions(+), 5 deletions(-)
>>
>> diff --git a/tests/intel/perf_pmu.c b/tests/intel/perf_pmu.c
>> index bfa2d501a..7f43354fd 100644
>> --- a/tests/intel/perf_pmu.c
>> +++ b/tests/intel/perf_pmu.c
>> @@ -189,7 +189,7 @@
>>
>> IGT_TEST_DESCRIPTION("Test the i915 pmu perf interface");
>>
>> -const double tolerance = 0.05f;
>> +const double tolerance = 0.1f;
>> const unsigned long batch_duration_ns = 500e6;
>>
>> char *drpc;
>> @@ -287,10 +287,9 @@ static uint64_t pmu_read_multi(int fd, unsigned int num, uint64_t *val)
>> #define __assert_within_epsilon(x, ref, tol_up, tol_down, debug_data) \
>> igt_assert_f((double)(x) <= (1.0 + (tol_up)) * (double)(ref) && \
>> (double)(x) >= (1.0 - (tol_down)) * (double)(ref), \
>> - "'%s' != '%s' (%f not within +%.1f%%/-%.1f%% tolerance of %f)\n%s\n",\
>> - #x, #ref, (double)(x), \
>> - (tol_up) * 100.0, (tol_down) * 100.0, \
>> - (double)(ref), debug_data)
>> + "%.3f%% is not within tolerance limits of +%.1f%%/-%.1f%%\n%s\n", \
>> + (((double)((double)(x) - (double)(ref)) * 100.0) / (double)(ref)), \
>> + (tol_up) * 100.0, (tol_down) * 100.0, debug_data)
>>
>> #define assert_within_epsilon(x, ref, tolerance) \
>> __assert_within_epsilon(x, ref, tolerance, tolerance, no_debug_data)
>> --
>> 2.25.1
>>
More information about the igt-dev mailing list