[PATCH i-g-t 4/8] tests/intel/xe_drm_fdinfo: Add single engine tests
Lucas De Marchi
lucas.demarchi at intel.com
Tue Jul 2 20:57:41 UTC 2024
On Tue, Jul 02, 2024 at 11:18:48AM GMT, Umesh Nerlige Ramappa wrote:
>On Mon, Jul 01, 2024 at 11:26:27AM -0700, Umesh Nerlige Ramappa wrote:
>>On Mon, Jul 01, 2024 at 12:35:24PM -0500, Lucas De Marchi wrote:
>>>On Sat, Jun 22, 2024 at 07:00:58AM GMT, Umesh Nerlige Ramappa wrote:
>>>>Add simple tests that submit work to one engine and measure utilization
>>>>per class.
>>>>
>>>>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa at intel.com>
>>>>---
>>>>tests/intel/xe_drm_fdinfo.c | 109 ++++++++++++++++++++++++++++++++++++
>>>>1 file changed, 109 insertions(+)
>>>>
>>>>diff --git a/tests/intel/xe_drm_fdinfo.c b/tests/intel/xe_drm_fdinfo.c
>>>>index 27459b7f1..852931d71 100644
>>>>--- a/tests/intel/xe_drm_fdinfo.c
>>>>+++ b/tests/intel/xe_drm_fdinfo.c
>>>>@@ -25,6 +25,15 @@
>>>>* SUBTEST: basic
>>>>* Description: Check if basic fdinfo content is present
>>>>*
>>>>+ * SUBTEST: drm-idle
>>>>+ * Description: Check that engines show no load when idle
>>>>+ *
>>>>+ * SUBTEST: drm-busy-idle
>>>>+ * Description: Check that engines show load when idle after busy
>>>>+ *
>>>>+ * SUBTEST: drm-busy-idle-isolation
>>>>+ * Description: Check that engine load does not spill over to other drm clients
>>>>+ *
>>>
>>>can we split the mem-related basic check from the utilization-related
>>>basic checks?
>>
>>will add a separate basic subtest for utilization.
>>
>>>
>>>>* SUBTEST: drm-total-resident
>>>>* Description: Create and compare total and resident memory consumption by client
>>>>*
>>>>@@ -39,11 +48,18 @@ IGT_TEST_DESCRIPTION("Read and verify drm client memory consumption and engine u
>>>>
>>>>#define BO_SIZE (65536)
>>>>
>>>>+/* flag masks */
>>>>+#define TEST_BUSY (1 << 0)
>>>>+#define TEST_TRAILING_IDLE (1 << 1)
>>>>+#define TEST_ISOLATION (1 << 2)
>>>
>>>should we document the mapping to the subtests?
>>
>>If you mean something like what the xe_exec_basic (below) is doing,
>>then I can try that.
>>
>>{ "basic", 0 },
>>{ "basic-defer-mmap", DEFER_ALLOC },
>>{ "basic-defer-bind", DEFER_ALLOC | DEFER_BIND },
>>
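yeah, something along those lines. For the new subtests that could be
roughly (just a sketch, matching the flag combinations used in igt_main
below):

	{ "drm-idle", 0 },
	{ "drm-busy-idle", TEST_BUSY | TEST_TRAILING_IDLE },
	{ "drm-busy-idle-isolation",
	  TEST_BUSY | TEST_TRAILING_IDLE | TEST_ISOLATION },

so the flag-to-subtest mapping is documented right next to the names.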
>>>
>>>>+
>>>>struct pceu_cycles {
>>>> uint64_t cycles;
>>>> uint64_t total_cycles;
>>>>};
>>>>
>>>>+const unsigned long batch_duration_ns = 500e6;
>>>
>>>please use * NSEC_PER_SEC
>>>
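i.e. something like this (assuming NSEC_PER_SEC is available in this
file, otherwise pull in the right header):

	const unsigned long batch_duration_ns = 0.5 * NSEC_PER_SEC;

same 500ms, just with the unit spelled out.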
>>>>+
>>>>static const char *engine_map[] = {
>>>> "rcs",
>>>> "bcs",
>>>>@@ -76,6 +92,27 @@ static void read_engine_cycles(int xe, struct pceu_cycles *pceu)
>>>> pceu[class].total_cycles = info.total_cycles[class];
>>>> }
>>>>}
>>>>+
>>>>+/*
>>>>+ * Helper for cases where we assert on time spent sleeping (directly or
>>>>+ * indirectly), so make it more robust by ensuring the system sleep time
>>>>+ * is within test tolerance to start with.
>>>>+ */
>>>>+static unsigned int measured_usleep(unsigned int usec)
>>>>+{
>>>>+ struct timespec ts = { };
>>>>+ unsigned int slept;
>>>>+
>>>>+ slept = igt_nsec_elapsed(&ts);
>>>>+ igt_assert(slept == 0);
>>>>+ do {
>>>>+ usleep(usec - slept);
>>>
>>>mismatch between usec and nsec here in the first sleep. I guess it's
>>>easier to convert the arg to nsec and then always have the same unit
>>>
>>>>+ slept = igt_nsec_elapsed(&ts) / 1000;
>>>
>>>NSEC_PER_USEC
>>>
>>>>+ } while (slept < usec);
>>>>+
>>>>+ return igt_nsec_elapsed(&ts);
>>>
>>>wrong unit? you receive usec as the arg and are supposed to return usec,
>>>but return nsec instead.
>>
>>Hmm, there is a lot of unit mismatch in this helper. I think I will
>>just convert everything here to nsec.
>>
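if you do end up keeping it, an all-nsec version would look roughly like
this (untested sketch; the name is made up here only to make the unit
obvious):

	static uint64_t measured_nsleep(uint64_t nsec)
	{
		struct timespec ts = { };
		uint64_t slept = igt_nsec_elapsed(&ts);

		igt_assert(slept == 0);
		do {
			/* sleep the remainder, converting to usec for usleep() */
			usleep((nsec - slept) / NSEC_PER_USEC);
			slept = igt_nsec_elapsed(&ts);
		} while (slept < nsec);

		return slept;
	}

but see below - I don't think we need the helper at all.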
>>>
>>>Also, I'm not sure why exactly this is needed. usleep() from the C
>>>library already guarantees it's going to sleep for *at least* the
>>>specified amount.
>>>
>>>usleep(3):
>>>...
>>>DESCRIPTION
>>> The usleep() function suspends execution of the calling thread
>>> for (at least) usec microseconds. The sleep may be lengthened
>>> slightly by any system activity or by the time spent processing
>>> the call or by the granularity of system timers.
>>>
>>>RETURN VALUE
>>> The usleep() function returns 0 on success. On error, -1 is
>>> returned, with errno set to indicate the error.
>>>
>>
>>I think the requirement is to know how long the delay actually was
>>because the "sleep may be lengthened". All utilization/counter
>>comparisons will be against the actual delay.
that would be a recipe for a very noisy test that doesn't withstand
minor disturbances. The exact amount of time the task slept on the CPU
doesn't translate to the same execution time on the GPU. What we can do
is to allow a margin so we guarantee it's approximately that number.
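(with the 500ms sample window used here, the 95-105% bounds in
check_results already absorb roughly 25ms worth of discrepancy, which
should be plenty)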
>
>Looks like I am not using the returned value from measured_usleep
>anyway, so I will drop this helper.
yeah, I don't really think it's needed.
>
>Umesh
>
>>
>>>
>>>>+}
>>>>+
>>>>/* Subtests */
>>>>static void test_active(int fd, struct drm_xe_engine *engine)
>>>>{
>>>>@@ -451,6 +488,66 @@ xe_spin_ctx_destroy(int fd, struct xe_spin_ctx *ctx)
>>>> free(ctx);
>>>>}
>>>>
>>>>+static void
>>>>+check_results(struct pceu_cycles *s1, struct pceu_cycles *s2,
>>>>+ int class, int width, unsigned int flags)
>>>>+{
>>>>+ double percent;
>>>>+
>>>>+ igt_debug("%s: sample 1: cycles %lu, total_cycles %lu\n",
>>>>+ engine_map[class], s1[class].cycles, s1[class].total_cycles);
>>>>+ igt_debug("%s: sample 2: cycles %lu, total_cycles %lu\n",
>>>>+ engine_map[class], s2[class].cycles, s2[class].total_cycles);
>>>>+
>>>>+ percent = ((s2[class].cycles - s1[class].cycles) * 100) /
>>>>+ ((s2[class].total_cycles + 1) - s1[class].total_cycles);
>>>>+
>>>>+ /* for parallel engines scale the busyness with width */
>>>
>>>s/parallel engines/parallel submission/ ?
>>
>>will change
>>>
>>>>+ percent = percent / width;
>>>>+
>>>>+ igt_debug("%s: percent: %f\n", engine_map[class], percent);
>>>>+
>>>>+ if (flags & TEST_BUSY)
>>>>+ igt_assert(percent >= 95 && percent <= 105);
>>>
>>>percent should never be > 100.
>>
>>Yikes... that's a miss. I did see a couple >=100 though!! and
>>modified this to check how far it goes above 100, but completely
>>forgot to debug the issue itself. Will debug.
it would be good to know when it happens... maybe it's because we read the
gpu timestamp before looping through the exec queues and
reading/aggregating each of them:

	read() {
		read_gpu_timestamp();			// 0
		for_each_exec_queue()
			read_exec_queue_timestamp();	// 0
	}

	read() {
		read_gpu_timestamp();			// 10
		<---- process preempted here,
		      lengthening the x below even more
		for_each_exec_queue()
			read_exec_queue_timestamp();	// 10 + x
	}

so, yes... it seems it could fluctuate to more than 100%
in the spinner case. In hindsight I think the margin you left is good.
>>
>>>Also, for >= 95 to be true this needs to
>>>be the only client. Should we check for number of clients and skip if
>>>we find others?
>>
>>Didn't think of that. Wondering if there is a way to enforce a
>>single client somehow in the test. If not, then I agree to skip if
>>more clients are present. Any idea how to check if there are other
>>clients?
instead of reading /proc/self/, scan /proc... Even if there's a TOCTOU,
it should be sufficient for this test. If there is more than 1 drm
client, then you can't assert the conditions. If there's 1, then you
*likely* can.
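something like this would be enough (untested sketch - count_drm_clients
and the exact parsing are made up here, it's not an existing igt helper;
needs <dirent.h>, <ctype.h> and <limits.h> on top of what the test
already includes):

	static int count_drm_clients(void)
	{
		char path[PATH_MAX], line[256];
		struct dirent *proc_de, *fd_de;
		DIR *proc_dir, *fdinfo_dir;
		int clients = 0;
		FILE *f;

		proc_dir = opendir("/proc");
		if (!proc_dir)
			return -1;

		while ((proc_de = readdir(proc_dir))) {
			/* only the <pid> directories */
			if (!isdigit(proc_de->d_name[0]))
				continue;

			snprintf(path, sizeof(path), "/proc/%s/fdinfo",
				 proc_de->d_name);
			fdinfo_dir = opendir(path);
			if (!fdinfo_dir)
				continue;

			while ((fd_de = readdir(fdinfo_dir))) {
				if (fd_de->d_name[0] == '.')
					continue;

				snprintf(path, sizeof(path), "/proc/%s/fdinfo/%s",
					 proc_de->d_name, fd_de->d_name);
				f = fopen(path, "r");
				if (!f)
					continue;

				/*
				 * an fd is a drm client if its fdinfo has a
				 * drm-client-id key; dup'd fds get counted
				 * more than once, which is fine for a skip
				 * check
				 */
				while (fgets(line, sizeof(line), f)) {
					if (!strncmp(line, "drm-client-id:", 14)) {
						clients++;
						break;
					}
				}
				fclose(f);
			}
			closedir(fdinfo_dir);
		}
		closedir(proc_dir);

		return clients;
	}

and then an igt_require(count_drm_clients() == 1) before the busy
subtests, since the test itself already has the xe fd open.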
Lucas De Marchi
>>
>>>
>>>>+ else
>>>>+ igt_assert(!percent);
>>>>+}
>>>>+
>>>>+static void
>>>>+single(int fd, struct drm_xe_engine_class_instance *hwe, int width, int count,
>>>>+ unsigned int flags)
>>>>+{
>>>>+ struct pceu_cycles pceu1[DRM_XE_ENGINE_CLASS_COMPUTE + 1];
>>>>+ struct pceu_cycles pceu2[DRM_XE_ENGINE_CLASS_COMPUTE + 1];
>>>>+ struct xe_spin_ctx *ctx = NULL;
>>>>+ int local_fd = fd;
>>>>+ uint32_t vm;
>>>>+
>>>>+ if (flags & TEST_ISOLATION)
>>>>+ local_fd = drm_reopen_driver(fd);
>>>
>>>so you have 2 clients in the same process... Maybe add this info to the
>>>subtest description above.
>>
>>sure
>>
>>>
>>>>+
>>>>+ vm = xe_vm_create(local_fd, 0, 0);
>>>>+ if (flags & TEST_BUSY) {
>>>>+ ctx = xe_spin_ctx_init(local_fd, hwe, vm, width, count);
>>>>+ xe_spin_sync_start(local_fd, ctx);
>>>>+ }
>>>>+
>>>>+ read_engine_cycles(local_fd, pceu1);
>>>>+ measured_usleep(batch_duration_ns / 1000);
>>>>+ if (flags & TEST_TRAILING_IDLE)
>>>>+ xe_spin_sync_end(local_fd, ctx);
>>>>+ read_engine_cycles(local_fd, pceu2);
>>>>+
>>>>+ check_results(pceu1, pceu2, hwe->engine_class, width, flags);
>>>>+
>>>>+ xe_spin_sync_end(local_fd, ctx);
>>>>+ xe_spin_ctx_destroy(local_fd, ctx);
>>>>+ xe_vm_destroy(local_fd, vm);
>>>>+
>>>>+ if (flags & TEST_ISOLATION)
>>>>+ close(local_fd);
>>>
>>>humn... it doesn't seem this is actually testing any isolation? It's
>>>just reopening the fd and testing the same thing on the duplicated fd.
>>>
>>>what I'd expect of an isolation test is, e.g.:
>>>
>>>/proc/X/fdinfo/Y
>>> drm-client: A
>>> drm-rcs-cycles: 0
>>>
>>>when executing on another fd:
>>>
>>>/proc/X/fdinfo/Z
>>> drm-client: B
>>> drm-rcs-cyles: 10
>>>
>>>
>>>so, check that read_engine_cycles() returns !0 for the fd where you
>>>submitted the spin and 0 for the one where you didn't.
>>
>>will do
>>
>>Thanks,
>>Umesh
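for the isolation flavor, e.g. roughly like this (untested sketch,
reusing the helpers from this patch):

	static void
	isolation(int fd, struct drm_xe_engine_class_instance *hwe,
		  int width, int count)
	{
		struct pceu_cycles before[2][DRM_XE_ENGINE_CLASS_COMPUTE + 1];
		struct pceu_cycles after[2][DRM_XE_ENGINE_CLASS_COMPUTE + 1];
		int other_fd = drm_reopen_driver(fd);
		struct xe_spin_ctx *ctx;
		uint32_t vm;

		vm = xe_vm_create(fd, 0, 0);
		ctx = xe_spin_ctx_init(fd, hwe, vm, width, count);
		xe_spin_sync_start(fd, ctx);

		read_engine_cycles(fd, before[0]);
		read_engine_cycles(other_fd, before[1]);
		usleep(batch_duration_ns / 1000);
		xe_spin_sync_end(fd, ctx);
		read_engine_cycles(fd, after[0]);
		read_engine_cycles(other_fd, after[1]);

		/* the client that submitted the spin shows the busyness... */
		check_results(before[0], after[0], hwe->engine_class, width,
			      TEST_BUSY);
		/* ...while the reopened client must not see any of it */
		check_results(before[1], after[1], hwe->engine_class, width, 0);

		xe_spin_ctx_destroy(fd, ctx);
		xe_vm_destroy(fd, vm);
		close(other_fd);
	}

i.e. busy on the fd that submitted, 0 on the other one.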
>>>
>>>Lucas De Marchi
>>>
>>>>+}
>>>>+
>>>>igt_main
>>>>{
>>>> struct drm_xe_engine_class_instance *hwe;
>>>>@@ -485,6 +582,18 @@ igt_main
>>>> igt_subtest("basic")
>>>> basic(xe, num_classes);
>>>>
>>>>+ igt_subtest("drm-idle")
>>>>+ xe_for_each_engine(xe, hwe)
>>>>+ single(xe, hwe, 1, 1, 0);
>>>>+
>>>>+ igt_subtest("drm-busy-idle")
>>>>+ xe_for_each_engine(xe, hwe)
>>>>+ single(xe, hwe, 1, 1, TEST_BUSY | TEST_TRAILING_IDLE);
>>>>+
>>>>+ igt_subtest("drm-busy-idle-isolation")
>>>>+ xe_for_each_engine(xe, hwe)
>>>>+ single(xe, hwe, 1, 1, TEST_BUSY | TEST_TRAILING_IDLE | TEST_ISOLATION);
>>>>+
>>>> igt_describe("Create and compare total and resident memory consumption by client");
>>>> igt_subtest("drm-total-resident")
>>>> test_total_resident(xe);
>>>>--
>>>>2.34.1
>>>>