[PATCH i-g-t 3/6] tests/intel/xe_oa: Sanity check PEC report data for TestOa metric set

Umesh Nerlige Ramappa umesh.nerlige.ramappa at intel.com
Wed Apr 16 16:20:07 UTC 2025


On Tue, Apr 15, 2025 at 09:57:42AM -0700, Dixit, Ashutosh wrote:
>On Mon, 14 Apr 2025 16:16:14 -0700, Umesh Nerlige Ramappa wrote:
>>
>> On Tue, Apr 08, 2025 at 11:12:07AM -0700, Ashutosh Dixit wrote:
>> > Implement sanity checking for Xe2 PEC OA reports. Previously there was
>> > sanity checking only for Xe1 OA reports, but no sanity checking for Xe2 PEC
>> > OA reports.
>> >
>> > Signed-off-by: Ashutosh Dixit <ashutosh.dixit at intel.com>
>> > ---
>> > tests/intel/xe_oa.c | 121 +++++++++++++++++++++++++++++++++++++++++++-
>> > 1 file changed, 119 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/tests/intel/xe_oa.c b/tests/intel/xe_oa.c
>> > index 2440101900..eaf97ae0df 100644
>> > --- a/tests/intel/xe_oa.c
>> > +++ b/tests/intel/xe_oa.c
>> > @@ -966,6 +966,115 @@ accumulator_print(struct accumulator *accumulator, const char *title)
>> >		igt_debug("\tC%u = %"PRIu64"\n", i, deltas[idx++]);
>> > }
>> >
>> > +
>> > +/*
>> > + * pec_sanity_check_reports() uses the following properties of the TestOa
>> > + * metric set with the "576B_PEC64LL" or XE_OA_FORMAT_PEC64u64 format. See
>> > + * e.g. lib/xe/oa-configs/oa-lnl.xml.
>> > + *
>> > + * If pec[] is the array of pec qwords following the report header (Bspec
>> > + * 60942) then we have:
>> > + *
>> > + *	pec[2]  : test_event1_cycles
>> > + *	pec[3]  : test_event1_cycles_xecore0
>> > + *	pec[4]  : test_event1_cycles_xecore1
>> > + *	pec[5]  : test_event1_cycles_xecore2
>> > + *	pec[6]  : test_event1_cycles_xecore3
>> > + *	pec[21] : test_event1_cycles_xecore4
>> > + *	pec[22] : test_event1_cycles_xecore5
>> > + *	pec[23] : test_event1_cycles_xecore6
>> > + *	pec[24] : test_event1_cycles_xecore7
>> > + *
>> > + * test_event1_cycles_xecore* increment with every clock, so they increment
>> > + * the same as gpu_ticks in report headers in successive reports. And
>> > + * test_event1_cycles increment by 'gpu_ticks * num_xecores'.
>> > + *
>> > + * These equations are not exact due to fluctuations, but are precise when
>> > + * averaged over long periods.
>> > + */
>> > +static void pec_sanity_check_one(const u32 *report)
>> > +{
>> > +	int xecore_idx[] = {3, 4, 5, 6, 21, 22, 23, 24};
>> > +	u64 first, *pec = (u64 *)(report + 8);
>> > +
>> > +	igt_debug("\ttest_event1_cycles: %#lx\n", pec[2]);
>> > +	for (int i = 0; i < ARRAY_SIZE(xecore_idx); i++)
>> > +		igt_debug("\ttest_event1_cycles_xecore %d: %#lx\n", i, pec[xecore_idx[i]]);
>> > +
>> > +	/* Compare against the first non-zero test_event1_cycles_xecore* */
>> > +	for (int i = 0; i < ARRAY_SIZE(xecore_idx); i++) {
>> > +		first = pec[xecore_idx[i]];
>> > +		if (first)
>> > +			break;
>> > +	}
>> > +
>> > +	/* test_event1_cycles_xecore* should be within an epsilon of each other */
>> > +	for (int i = 0; i < ARRAY_SIZE(xecore_idx); i++) {
>> > +		igt_debug("n %d: pec[n] %#lx, first %#lx\n",
>> > +			  xecore_idx[i], pec[xecore_idx[i]], first);
>> > +		/* 0 value for pec[xecore_idx[i]] indicates missing xecore */
>> > +		if (pec[xecore_idx[i]])
>> > +			assert_within_epsilon(pec[xecore_idx[i]], first, 0.1);
>> > +	}
>> > +
>> > +	igt_debug("first * num_xecores: %#lx, pec[2] %#lx\n",
>> > +		  first * intel_xe_perf->devinfo.n_eu_sub_slices, pec[2]);
>> > +	/* test_event1_cycles should be close to (test_event1_cycles_xecore* * num_xecores) */
>> > +	assert_within_epsilon(first * intel_xe_perf->devinfo.n_eu_sub_slices, pec[2], 0.1);
>> > +}
>> > +
>> > +static void pec_sanity_check_two(const u32 *report0, const u32 *report1,
>> > +				 struct intel_xe_perf_metric_set *set)
>>
>> I would just s/pec_sanity_check_two/pec_sanity_check/ to validate 2 reports
>> and drop the "pec_sanity_check_one" altogether. We only care about delta
>> between 2 counters.
>
>Not sure I agree with this, because the checks in check_one and check_two
>are really independent. The checks in both functions seem to work (verified
>by checking all reports for non_zero_reason), and check_one provides a way
>to check each report independently. Can you explain what problem you see
>with check_one?
>
>Even if we remove the call to check_one, I still want to retain the unused
>function by marking it '__attribute__ ((unused))', just in case someone
>wants to use it later. At least I want to get it into git, and then maybe
>remove it later if needed.
>

I haven't come across any use case where reports are validated 
independently other than checking if the counters are zero/non-zero.  

If you think that it adds value, you could retain it.
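For reference, here is a minimal sketch of keeping a caller-less static helper in the tree as proposed above. It relies on the GCC/Clang `__attribute__ ((unused))` extension, which suppresses the -Wunused-function ("defined but not used") warning that -Wall enables. The function names and checks are hypothetical placeholders, not the patch's code.

```c
#include <stdint.h>

/*
 * Hypothetical per-report check kept without a caller.
 * __attribute__ ((unused)) silences GCC/Clang's -Wunused-function
 * ("defined but not used") warning for this static function.
 */
static __attribute__ ((unused)) int check_one_report(const uint64_t *pec)
{
	return pec[2] != 0;	/* placeholder: aggregate counter is populated */
}

/* Hypothetical two-report check that is still called, so needs no attribute. */
static int check_two_reports(const uint64_t *pec0, const uint64_t *pec1)
{
	return pec1[2] >= pec0[2];	/* placeholder: counter did not go backwards */
}

int run_checks(const uint64_t *pec0, const uint64_t *pec1)
{
	return check_two_reports(pec0, pec1);
}
```

Removing the attribute from check_one_report() and compiling with -Wall should bring the warning back, which is a quick way to confirm that the attribute is what keeps the build clean.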

Thanks,
Umesh

>> > +{
>> > +	u64 tick_delta = oa_tick_delta(report1, report0, set->perf_oa_format);
>> > +	int xecore_idx[] = {3, 4, 5, 6, 21, 22, 23, 24};
>> > +	u64 *pec0 = (u64 *)(report0 + 8);
>> > +	u64 *pec1 = (u64 *)(report1 + 8);
>> > +
>> > +	igt_debug("tick delta = %#lx\n", tick_delta);
>> > +
>> > +	/* Difference in test_event1_cycles_xecore* values should be close to tick_delta */
>> > +	for (int i = 0; i < ARRAY_SIZE(xecore_idx); i++) {
>>
>> Maybe within the loop you can have
>>
>> n = xecore_idx[i];
>>
>> and use that in the code below, for example:
>>
>>		igt_debug("pec1[%d] - pec0[%d] %#lx, tick delta %#lx\n", n, n, pec1[n] - pec0[n], tick_delta);
>
>OK, this is probably a little bit cleaner.
>
>>
>> > +		igt_debug("n %d: pec1[n] - pec0[n] %#lx, tick delta %#lx\n",
>> > +			  xecore_idx[i], pec1[xecore_idx[i]] - pec0[xecore_idx[i]], tick_delta);
>>
>>
>> > +		/* 0 value for pec[xecore_idx[i]] indicates missing xecore */
>> > +		if (pec1[xecore_idx[i]] && pec0[xecore_idx[i]])
>> > +			assert_within_epsilon(pec1[xecore_idx[i]] - pec0[xecore_idx[i]],
>> > +					      tick_delta, 0.1);
>> > +		/* Same test_event1_cycles_xecore* should be present in all reports */
>> > +		if (pec1[xecore_idx[i]])
>> > +			igt_assert(pec0[xecore_idx[i]]);
>> > +	}
>> > +
>> > +	igt_debug("pec1[2] - pec0[2] %#lx, tick_delta * num_xecores: %#lx\n",
>> > +		  pec1[2] - pec0[2], tick_delta * intel_xe_perf->devinfo.n_eu_sub_slices);
>> > +	/* Difference in test_event1_cycles should be close to (tick_delta * num_xecores) */
>> > +	assert_within_epsilon(pec1[2] - pec0[2],
>> > +			      tick_delta * intel_xe_perf->devinfo.n_eu_sub_slices, 0.1);
>> > +}
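The invariants described in the patch comment can be exercised outside IGT with a small standalone sketch. This is not the patch's code. It assumes pec[] has already been located past the report header, takes num_xecores and tick_delta as parameters instead of reading intel_xe_perf->devinfo.n_eu_sub_slices and calling oa_tick_delta(), and replaces assert_within_epsilon() with a hypothetical within_eps() helper that returns a bool instead of failing the test. The two-report loop uses the local `n = xecore_idx[i]` suggested in the review.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* pec[] qword indices of test_event1_cycles_xecore0..7 (from the patch comment) */
static const int xecore_idx[] = {3, 4, 5, 6, 21, 22, 23, 24};
#define N_XECORE_IDX (sizeof(xecore_idx) / sizeof(xecore_idx[0]))

/* Hypothetical stand-in for assert_within_epsilon(): x within +/- tol of ref. */
static bool within_eps(uint64_t x, uint64_t ref, double tol)
{
	double dx = (double)x, dref = (double)ref;

	return dx >= (1.0 - tol) * dref && dx <= (1.0 + tol) * dref;
}

/* One report: per-xecore counters agree, pec[2] ~= first * num_xecores. */
static bool pec_check_one(const uint64_t *pec, unsigned int num_xecores)
{
	uint64_t first = 0;

	/* first non-zero test_event1_cycles_xecore* */
	for (size_t i = 0; i < N_XECORE_IDX && !first; i++)
		first = pec[xecore_idx[i]];

	for (size_t i = 0; i < N_XECORE_IDX; i++) {
		const int n = xecore_idx[i];

		/* 0 indicates a missing xecore, so skip it */
		if (pec[n] && !within_eps(pec[n], first, 0.1))
			return false;
	}
	return within_eps(first * num_xecores, pec[2], 0.1);
}

/* Two reports: xecore deltas ~= tick_delta, pec[2] delta ~= tick_delta * num_xecores. */
static bool pec_check_two(const uint64_t *pec0, const uint64_t *pec1,
			  uint64_t tick_delta, unsigned int num_xecores)
{
	for (size_t i = 0; i < N_XECORE_IDX; i++) {
		const int n = xecore_idx[i];

		/* the same xecores must be present in both reports */
		if (pec1[n] && !pec0[n])
			return false;
		if (pec1[n] && pec0[n] &&
		    !within_eps(pec1[n] - pec0[n], tick_delta, 0.1))
			return false;
	}
	return within_eps(pec1[2] - pec0[2], tick_delta * num_xecores, 0.1);
}
```

For example, a report with four populated xecores (indices 3 to 6) each at 1000 and pec[2] at 4000 passes pec_check_one(pec, 4). A later report with each of those xecores at 1500 and pec[2] at 6000 also passes, and the pair passes pec_check_two() with a tick_delta of 500.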


More information about the igt-dev mailing list