[PATCH i-g-t 3/6] tests/intel/xe_oa: Sanity check PEC report data for TestOa metric set

Tue Apr 15 16:57:42 UTC 2025

On Mon, 14 Apr 2025 16:16:14 -0700, Umesh Nerlige Ramappa wrote:
>
> On Tue, Apr 08, 2025 at 11:12:07AM -0700, Ashutosh Dixit wrote:
> > Implement sanity checking for Xe2 PEC OA reports. Previously there was
> > sanity checking only for Xe1 OA reports, but no sanity checking for Xe2 PEC
> > OA reports.
> >
> > Signed-off-by: Ashutosh Dixit <ashutosh.dixit at intel.com>
> > ---
> > tests/intel/xe_oa.c | 121 +++++++++++++++++++++++++++++++++++++++++++-
> > 1 file changed, 119 insertions(+), 2 deletions(-)
> >
> > diff --git a/tests/intel/xe_oa.c b/tests/intel/xe_oa.c
> > index 2440101900..eaf97ae0df 100644
> > --- a/tests/intel/xe_oa.c
> > +++ b/tests/intel/xe_oa.c
> > @@ -966,6 +966,115 @@ accumulator_print(struct accumulator *accumulator, const char *title)
> >		igt_debug("\tC%u = %"PRIu64"\n", i, deltas[idx++]);
> > }
> >
> > +
> > +/*
> > + * pec_sanity_check_reports() uses the following properties of the TestOa
> > + * metric set with the "576B_PEC64LL" or XE_OA_FORMAT_PEC64u64 format. See
> > + * e.g. lib/xe/oa-configs/oa-lnl.xml.
> > + *
> > + * If pec[] is the array of pec qwords following the report header (Bspec
> > + * 60942) then we have:
> > + *
> > + *	pec[2]  : test_event1_cycles
> > + *	pec[3]  : test_event1_cycles_xecore0
> > + *	pec[4]  : test_event1_cycles_xecore1
> > + *	pec[5]  : test_event1_cycles_xecore2
> > + *	pec[6]  : test_event1_cycles_xecore3
> > + *	pec[21] : test_event1_cycles_xecore4
> > + *	pec[22] : test_event1_cycles_xecore5
> > + *	pec[23] : test_event1_cycles_xecore6
> > + *	pec[24] : test_event1_cycles_xecore7
> > + *
> > + * test_event1_cycles_xecore* increment with every clock, so they increment
> > + * the same as gpu_ticks in report headers in successive reports. And
> > + * test_event1_cycles increments by 'gpu_ticks * num_xecores'.
> > + *
> > + * These equations are not exact due to fluctuations, but are precise when
> > + * averaged over long periods.
> > + */
> > +static void pec_sanity_check_one(const u32 *report)
> > +{
> > +	int xecore_idx[] = {3, 4, 5, 6, 21, 22, 23, 24};
> > +	u64 first, *pec = (u64 *)(report + 8);
> > +
> > +	igt_debug("\ttest_event1_cycles: %#lx\n", pec[2]);
> > +	for (int i = 0; i < ARRAY_SIZE(xecore_idx); i++)
> > +		igt_debug("\ttest_event1_cycles_xecore %d: %#lx\n", i, pec[xecore_idx[i]]);
> > +
> > +	/* Compare against the first non-zero test_event1_cycles_xecore* */
> > +	for (int i = 0; i < ARRAY_SIZE(xecore_idx); i++) {
> > +		first = pec[xecore_idx[i]];
> > +		if (first)
> > +			break;
> > +	}
> > +
> > +	/* test_event1_cycles_xecore* should be within an epsilon of each other */
> > +	for (int i = 0; i < ARRAY_SIZE(xecore_idx); i++) {
> > +		igt_debug("n %d: pec[n] %#lx, first %#lx\n",
> > +			  xecore_idx[i], pec[xecore_idx[i]], first);
> > +		/* 0 value for pec[xecore_idx[i]] indicates missing xecore */
> > +		if (pec[xecore_idx[i]])
> > +			assert_within_epsilon(pec[xecore_idx[i]], first, 0.1);
> > +	}
> > +
> > +	igt_debug("first * num_xecores: %#lx, pec[2] %#lx\n",
> > +		  first * intel_xe_perf->devinfo.n_eu_sub_slices, pec[2]);
> > +	/* test_event1_cycles should be close to (test_event1_cycles_xecore* * num_xecores) */
> > +	assert_within_epsilon(first * intel_xe_perf->devinfo.n_eu_sub_slices, pec[2], 0.1);
> > +}
> > +
> > +static void pec_sanity_check_two(const u32 *report0, const u32 *report1,
> > +				 struct intel_xe_perf_metric_set *set)
>
> I would just s/pec_sanity_check_two/pec_sanity_check/ to validate 2 reports
> and drop the "pec_sanity_check_one" altogether. We only care about delta
> between 2 counters.

I'm not sure I agree with this. The checks in check_one and check_two are
really independent, and the checks in both functions seem to work (verified
by checking all reports for non_zero_reason). check_one also provides a way
to check each report independently. Can you explain what problem you see
with check_one?

Even if we stop calling check_one, I would still like to keep the function,
so I would add '__attribute__ ((unused))' to it and retain it, just in case
someone wants to use it later. I at least want to get it into git. It can
be removed later, if needed.
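
For illustration, retaining an uncalled static helper could look roughly
like the sketch below. This is not the patch code: check_one_sketch() and
its parameters are hypothetical stand-ins, and it returns a bool instead of
asserting so that it stays self-contained.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for pec_sanity_check_one(): returns true when every
 * non-zero per-xecore counter is within 10% of the first non-zero one.
 * The attribute silences -Wunused-function while nothing calls it.
 */
__attribute__((unused))
static bool check_one_sketch(const uint64_t *counters, size_t count)
{
	uint64_t first = 0;

	/* Find the first non-zero counter to compare against. */
	for (size_t i = 0; i < count && !first; i++)
		first = counters[i];

	for (size_t i = 0; i < count; i++) {
		uint64_t v = counters[i];
		uint64_t diff = v > first ? v - first : first - v;

		/* Zero means a missing xecore, so skip it. */
		if (v && (double)diff > 0.1 * (double)first)
			return false;
	}
	return true;
}
```

With the attribute in place the function can sit in the file without any
caller and without a compiler warning under GCC or Clang.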

> > +{
> > +	u64 tick_delta = oa_tick_delta(report1, report0, set->perf_oa_format);
> > +	int xecore_idx[] = {3, 4, 5, 6, 21, 22, 23, 24};
> > +	u64 *pec0 = (u64 *)(report0 + 8);
> > +	u64 *pec1 = (u64 *)(report1 + 8);
> > +
> > +	igt_debug("tick delta = %#lx\n", tick_delta);
> > +
> > +	/* Difference in test_event1_cycles_xecore* values should be close to tick_delta */
> > +	for (int i = 0; i < ARRAY_SIZE(xecore_idx); i++) {
>
> Maybe, within the loop you can have,
>
> n = xecore_idx[i];
>
> and that can be used in the below code, for ex:
>
>		igt_debug("pec1[%d] - pec0[%d] %#lx, tick delta %#lx\n", n, n, pec1[n] - pec0[n], tick_delta);

OK, this is probably a little bit cleaner.
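
A rough sketch of how the loop might read with a local index, under the
assumption that the checks themselves stay the same. within_epsilon() and
xecore_deltas_ok() are hypothetical names used only here; they return a
bool rather than asserting the way assert_within_epsilon() and igt_assert()
do.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if |value - ref| <= ref * eps. */
static bool within_epsilon(uint64_t value, uint64_t ref, double eps)
{
	uint64_t diff = value > ref ? value - ref : ref - value;

	return (double)diff <= (double)ref * eps;
}

/* Sketch of the per-xecore delta check, using a local index n. */
static bool xecore_deltas_ok(const uint64_t *pec0, const uint64_t *pec1,
			     uint64_t tick_delta)
{
	static const int xecore_idx[] = {3, 4, 5, 6, 21, 22, 23, 24};
	const size_t count = sizeof(xecore_idx) / sizeof(xecore_idx[0]);

	for (size_t i = 0; i < count; i++) {
		int n = xecore_idx[i];

		/* 0 value indicates a missing xecore, so skip the delta check */
		if (pec1[n] && pec0[n] &&
		    !within_epsilon(pec1[n] - pec0[n], tick_delta, 0.1))
			return false;

		/* The same xecore counters should be present in both reports */
		if (pec1[n] && !pec0[n])
			return false;
	}
	return true;
}
```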

>
> > +		igt_debug("n %d: pec1[n] - pec0[n] %#lx, tick delta %#lx\n",
> > +			  xecore_idx[i], pec1[xecore_idx[i]] - pec0[xecore_idx[i]], tick_delta);
>
>
> > +		/* 0 value for pec[xecore_idx[i]] indicates missing xecore */
> > +		if (pec1[xecore_idx[i]] && pec0[xecore_idx[i]])
> > +			assert_within_epsilon(pec1[xecore_idx[i]] - pec0[xecore_idx[i]],
> > +					      tick_delta, 0.1);
> > +		/* Same test_event1_cycles_xecore* should be present in all reports */
> > +		if (pec1[xecore_idx[i]])
> > +			igt_assert(pec0[xecore_idx[i]]);
> > +	}
> > +
> > +	igt_debug("pec1[2] - pec0[2] %#lx, tick_delta * num_xecores: %#lx\n",
> > +		  pec1[2] - pec0[2], tick_delta * intel_xe_perf->devinfo.n_eu_sub_slices);
> > +	/* Difference in test_event1_cycles should be close to (tick_delta * num_xecores) */
> > +	assert_within_epsilon(pec1[2] - pec0[2],
> > +			      tick_delta * intel_xe_perf->devinfo.n_eu_sub_slices, 0.1);
> > +}