[PATCH i-g-t 3/6] tests/intel/xe_oa: Sanity check PEC report data for TestOa metric set

Tue Apr 22 19:42:17 UTC 2025

On Wed, 16 Apr 2025 09:20:07 -0700, Umesh Nerlige Ramappa wrote:
>
> On Tue, Apr 15, 2025 at 09:57:42AM -0700, Dixit, Ashutosh wrote:
> > On Mon, 14 Apr 2025 16:16:14 -0700, Umesh Nerlige Ramappa wrote:
> >>
> >> On Tue, Apr 08, 2025 at 11:12:07AM -0700, Ashutosh Dixit wrote:
> >> > Implement sanity checking for Xe2 PEC OA reports. Previously there was
> >> > sanity checking only for Xe1 OA reports, but no sanity checking for Xe2 PEC
> >> > OA reports.
> >> >
> >> > Signed-off-by: Ashutosh Dixit <ashutosh.dixit at intel.com>
> >> > ---
> >> > tests/intel/xe_oa.c | 121 +++++++++++++++++++++++++++++++++++++++++++-
> >> > 1 file changed, 119 insertions(+), 2 deletions(-)
> >> >
> >> > diff --git a/tests/intel/xe_oa.c b/tests/intel/xe_oa.c
> >> > index 2440101900..eaf97ae0df 100644
> >> > --- a/tests/intel/xe_oa.c
> >> > +++ b/tests/intel/xe_oa.c
> >> > @@ -966,6 +966,115 @@ accumulator_print(struct accumulator *accumulator, const char *title)
> >> >		igt_debug("\tC%u = %"PRIu64"\n", i, deltas[idx++]);
> >> > }
> >> >
> >> > +
> >> > +/*
> >> > + * pec_sanity_check_reports() uses the following properties of the TestOa
> >> > + * metric set with the "576B_PEC64LL" or XE_OA_FORMAT_PEC64u64 format. See
> >> > + * e.g. lib/xe/oa-configs/oa-lnl.xml.
> >> > + *
> >> > + * If pec[] is the array of pec qwords following the report header (Bspec
> >> > + * 60942) then we have:
> >> > + *
> >> > + *	pec[2]  : test_event1_cycles
> >> > + *	pec[3]  : test_event1_cycles_xecore0
> >> > + *	pec[4]  : test_event1_cycles_xecore1
> >> > + *	pec[5]  : test_event1_cycles_xecore2
> >> > + *	pec[6]  : test_event1_cycles_xecore3
> >> > + *	pec[21] : test_event1_cycles_xecore4
> >> > + *	pec[22] : test_event1_cycles_xecore5
> >> > + *	pec[23] : test_event1_cycles_xecore6
> >> > + *	pec[24] : test_event1_cycles_xecore7
> >> > + *
> >> > + * test_event1_cycles_xecore* increment with every clock, so they increment
> >> > + * the same as gpu_ticks in report headers in successive reports. And
> >> > + * test_event1_cycles increment by 'gpu_ticks * num_xecores'.
> >> > + *
> >> > + * These equations are not exact due to fluctuations, but are precise when
> >> > + * averaged over long periods.
> >> > + */
> >> > +static void pec_sanity_check_one(const u32 *report)
> >> > +{
> >> > +	int xecore_idx[] = {3, 4, 5, 6, 21, 22, 23, 24};
> >> > +	u64 first, *pec = (u64 *)(report + 8);
> >> > +
> >> > +	igt_debug("\ttest_event1_cycles: %#lx\n", pec[2]);
> >> > +	for (int i = 0; i < ARRAY_SIZE(xecore_idx); i++)
> >> > +		igt_debug("\ttest_event1_cycles_xecore %d: %#lx\n", i, pec[xecore_idx[i]]);
> >> > +
> >> > +	/* Compare against the first non-zero test_event1_cycles_xecore* */
> >> > +	for (int i = 0; i < ARRAY_SIZE(xecore_idx); i++) {
> >> > +		first = pec[xecore_idx[i]];
> >> > +		if (first)
> >> > +			break;
> >> > +	}
> >> > +
> >> > +	/* test_event1_cycles_xecore* should be within an epsilon of each other */
> >> > +	for (int i = 0; i < ARRAY_SIZE(xecore_idx); i++) {
> >> > +		igt_debug("n %d: pec[n] %#lx, first %#lx\n",
> >> > +			  xecore_idx[i], pec[xecore_idx[i]], first);
> >> > +		/* 0 value for pec[xecore_idx[i]] indicates missing xecore */
> >> > +		if (pec[xecore_idx[i]])
> >> > +			assert_within_epsilon(pec[xecore_idx[i]], first,
> >> > 0.1);
> >> > +	}
> >> > +
> >> > +	igt_debug("first * num_xecores: %#lx, pec[2] %#lx\n",
> >> > +		  first * intel_xe_perf->devinfo.n_eu_sub_slices, pec[2]);
> >> > +	/* test_event1_cycles should be close to (test_event1_cycles_xecore* * num_xecores) */
> >> > +	assert_within_epsilon(first * intel_xe_perf->devinfo.n_eu_sub_slices, pec[2], 0.1);
> >> > +}
> >> > +
> >> > +static void pec_sanity_check_two(const u32 *report0, const u32 *report1,
> >> > +				 struct intel_xe_perf_metric_set *set)
> >>
> >> I would just s/pec_sanity_check_two/pec_sanity_check/ to validate 2 reports
> >> and drop the "pec_sanity_check_one" altogether. We only care about delta
> >> between 2 counters.
> >
> > Not sure I agree with this. Because checks in check_one and check_two are
> > really independent. And checks in both functions seem to work (checked with
> > checking all reports for non_zero_reason). And check_one allows a way to
> > check each report independently. Can you explain what problem you see with
> > check_one?
> >
> > Even if we remove check_one, I still want to retain the unused function, so
> > I would want to add a '__attribute__ ((unused))' and retain it. Just in
> > case someone wants to use it later. I at least want to get into git. And
> > then maybe remove it later, if needed.
> >
>
> I haven't come across any use case where reports are validated
> independently other than checking if the counters are zero/non-zero.
> If you think that it adds value, you could retain it.

I've added a patch to drop it in v2. This way it will be in git in case we
want to get it out later on. Thanks.

> >> > +{
> >> > +	u64 tick_delta = oa_tick_delta(report1, report0, set->perf_oa_format);
> >> > +	int xecore_idx[] = {3, 4, 5, 6, 21, 22, 23, 24};
> >> > +	u64 *pec0 = (u64 *)(report0 + 8);
> >> > +	u64 *pec1 = (u64 *)(report1 + 8);
> >> > +
> >> > +	igt_debug("tick delta = %#lx\n", tick_delta);
> >> > +
> >> > +	/* Difference in test_event1_cycles_xecore* values should be close to tick_delta */
> >> > +	for (int i = 0; i < ARRAY_SIZE(xecore_idx); i++) {
> >>
> >> Maybe, within the loop you can have,
> >>
> >> n = xecore_idx[i];
> >>
> >> and that can be used in the below code, for ex:
> >>
> >>		igt_debug("pec1[%d] - pec0[%d] %#lx, tick delta %#lx\n", n, pec1[n] - pec0[n], tick_delta);
> >
> > OK, this is probably a little bit cleaner.
> >
> >>
> >> > +		igt_debug("n %d: pec1[n] - pec0[n] %#lx, tick delta %#lx\n",
> >> > +			  xecore_idx[i], pec1[xecore_idx[i]] - pec0[xecore_idx[i]], tick_delta);
> >>
> >>
> >> > +		/* 0 value for pec[xecore_idx[i]] indicates missing xecore */
> >> > +		if (pec1[xecore_idx[i]] && pec0[xecore_idx[i]])
> >> > +			assert_within_epsilon(pec1[xecore_idx[i]] - pec0[xecore_idx[i]],
> >> > +					      tick_delta, 0.1);
> >> > +		/* Same test_event1_cycles_xecore* should be present in all reports */
> >> > +		if (pec1[xecore_idx[i]])
> >> > +			igt_assert(pec0[xecore_idx[i]]);
> >> > +	}
> >> > +
> >> > +	igt_debug("pec1[2] - pec0[2] %#lx, tick_delta * num_xecores: %#lx\n",
> >> > +		  pec1[2] - pec0[2], tick_delta * intel_xe_perf->devinfo.n_eu_sub_slices);
> >> > +	/* Difference in test_event1_cycles should be close to (tick_delta * num_xecores) */
> >> > +	assert_within_epsilon(pec1[2] - pec0[2],
> >> > +			      tick_delta * intel_xe_perf->devinfo.n_eu_sub_slices, 0.1);
> >> > +}