[PATCH i-g-t 4/8] tests/intel/xe_drm_fdinfo: Add single engine tests

Tue Jul 2 20:57:40 UTC 2024

On Tue, Jul 02, 2024 at 11:18:48AM -0700, Umesh Nerlige Ramappa wrote:
>On Mon, Jul 01, 2024 at 11:26:27AM -0700, Umesh Nerlige Ramappa wrote:
>>On Mon, Jul 01, 2024 at 12:35:24PM -0500, Lucas De Marchi wrote:
>>>On Sat, Jun 22, 2024 at 07:00:58AM GMT, Umesh Nerlige Ramappa wrote:
>>>>Add simple tests that submit work to one engine and measure utilization
>>>>per class.
>>>>
>>>>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa at intel.com>
>>>>---
>>>>tests/intel/xe_drm_fdinfo.c | 109 ++++++++++++++++++++++++++++++++++++
>>>>1 file changed, 109 insertions(+)
>>>>
>>>>diff --git a/tests/intel/xe_drm_fdinfo.c b/tests/intel/xe_drm_fdinfo.c
>>>>index 27459b7f1..852931d71 100644
>>>>--- a/tests/intel/xe_drm_fdinfo.c
>>>>+++ b/tests/intel/xe_drm_fdinfo.c
>>>>@@ -25,6 +25,15 @@
>>>>* SUBTEST: basic
>>>>* Description: Check if basic fdinfo content is present
>>>>*
>>>>+ * SUBTEST: drm-idle
>>>>+ * Description: Check that engines show no load when idle
>>>>+ *
>>>>+ * SUBTEST: drm-busy-idle
>>>>+ * Description: Check that engines show load when idle after busy
>>>>+ *
>>>>+ * SUBTEST: drm-busy-idle-isolation
>>>>+ * Description: Check that engine load does not spill over to other drm clients
>>>>+ *
>>>
>>>can we split the mem-related basic check from the utilization-related
>>>basic checks?
>>
>>will add a separate basic subtest for utilization.
>>
>>>
>>>>* SUBTEST: drm-total-resident
>>>>* Description: Create and compare total and resident memory consumption by client
>>>>*
>>>>@@ -39,11 +48,18 @@ IGT_TEST_DESCRIPTION("Read and verify drm client memory consumption and engine u
>>>>
>>>>#define BO_SIZE (65536)
>>>>
>>>>+/* flag masks */
>>>>+#define TEST_BUSY		(1 << 0)
>>>>+#define TEST_TRAILING_IDLE	(1 << 1)
>>>>+#define TEST_ISOLATION		(1 << 2)
>>>
>>>shoud we document the mapping to the subtests?
>>
>>If you mean something like what the xe_exec_basic (below) is doing, 
>>then I can try that.
>>
>>{ "basic", 0 },
>>{ "basic-defer-mmap", DEFER_ALLOC },
>>{ "basic-defer-bind", DEFER_ALLOC | DEFER_BIND },
>>
>>>
>>>>+
>>>>struct pceu_cycles {
>>>>	uint64_t cycles;
>>>>	uint64_t total_cycles;
>>>>};
>>>>
>>>>+const unsigned long batch_duration_ns = 500e6;
>>>
>>>please use * NSEC_PER_SEC
>>>
>>>>+
>>>>static const char *engine_map[] = {
>>>>	"rcs",
>>>>	"bcs",
>>>>@@ -76,6 +92,27 @@ static void read_engine_cycles(int xe, struct pceu_cycles *pceu)
>>>>		pceu[class].total_cycles = info.total_cycles[class];
>>>>	}
>>>>}
>>>>+
>>>>+/*
>>>>+ * Helper for cases where we assert on time spent sleeping (directly or
>>>>+ * indirectly), so make it more robust by ensuring the system sleep time
>>>>+ * is within test tolerance to start with.
>>>>+ */
>>>>+static unsigned int measured_usleep(unsigned int usec)
>>>>+{
>>>>+	struct timespec ts = { };
>>>>+	unsigned int slept;
>>>>+
>>>>+	slept = igt_nsec_elapsed(&ts);
>>>>+	igt_assert(slept == 0);
>>>>+	do {
>>>>+		usleep(usec - slept);
>>>
>>>mismatch between usec and nsec here in the first sleep. I guess it's
>>>easier to convert the arg to nsec and then always have the same unit
>>>
>>>>+		slept = igt_nsec_elapsed(&ts) / 1000;
>>>
>>>NSEC_PER_USEC
>>>
>>>>+	} while (slept < usec);
>>>>+
>>>>+	return igt_nsec_elapsed(&ts);
>>>
>>>wrong unit? you receive usec as arg and are supposed to return usec, but
>>>rather return nsec.
>>
>>Hmm, there is a lot of unit mismatch in this helper. I think I will 
>>just convert everything here to nsec.
>>
>>>
>>>Also, I'm not sure why exactly this is needed. usleep() from the C
>>>library already guarantees it's going to sleep for *at least* the
>>>specified amount.
>>>
>>>usleep(3):
>>>...
>>>DESCRIPTION
>>>     The usleep() function suspends execution of the calling thread
>>>     for (at least) usec microseconds.  The sleep may be lengthened
>>>     slightly by any system activity or by the time spent processing
>>>     the call or by the granularity of system timers.
>>>
>>>RETURN VALUE
>>>     The usleep() function returns 0 on success.  On error, -1 is
>>>     returned, with errno set to indicate the error.
>>>
>>
>>I think the requirement is to know how long the delay actually was  
>>because the "sleep may be lengthened". All utilization/counter 
>>comparisons will be against the actual delay.
>
>Looks like I am not using the returned value from measured_usleep 
>anyways, so will drop this helper.
>
>Umesh
>
>>
>>>
>>>>+}
>>>>+
>>>>/* Subtests */
>>>>static void test_active(int fd, struct drm_xe_engine *engine)
>>>>{
>>>>@@ -451,6 +488,66 @@ xe_spin_ctx_destroy(int fd, struct xe_spin_ctx *ctx)
>>>>	free(ctx);
>>>>}
>>>>
>>>>+static void
>>>>+check_results(struct pceu_cycles *s1, struct pceu_cycles *s2,
>>>>+	      int class, int width, unsigned int flags)
>>>>+{
>>>>+	double percent;
>>>>+
>>>>+	igt_debug("%s: sample 1: cycles %lu, total_cycles %lu\n",
>>>>+		  engine_map[class], s1[class].cycles, s1[class].total_cycles);
>>>>+	igt_debug("%s: sample 2: cycles %lu, total_cycles %lu\n",
>>>>+		  engine_map[class], s2[class].cycles, s2[class].total_cycles);
>>>>+
>>>>+	percent = ((s2[class].cycles - s1[class].cycles) * 100) /
>>>>+		  ((s2[class].total_cycles + 1) - s1[class].total_cycles);
>>>>+
>>>>+	/* for parallel engines scale the busyness with width */
>>>
>>>s/parallel engines/parallel submission/ ?
>>
>>will change
>>>
>>>>+	percent = percent / width;
>>>>+
>>>>+	igt_debug("%s: percent: %f\n", engine_map[class], percent);
>>>>+
>>>>+	if (flags & TEST_BUSY)
>>>>+		igt_assert(percent >= 95 && percent <= 105);
>>>
>>>percent should never be > 100.
>>
>>Yikes... that's a miss. I did see a couple >=100 though!! and 
>>modified this to check how far it goes above 100, but completely 
>>forgot to debug the issue itself. Will debug.

Update: > 100% looks like a possibility because KMD is doing this:

Sample 1:
Read RING_TIMESTAMP
Read CTX_TIMESTAMP

Sample 2:
Read RING_TIMESTAMP
Read CTX_TIMESTAMP

The delta of CTX_TIMESTAMPS can be larger than the RING_TIMESTAMPS. If 
we take a Sample 3 immediately after sample 2, just to get an updated 
RING_TIMESTAMP, then we can ensure < 100%. Is that acceptable to let 
user handle this or should we ensure this is guaranteed in the KMD?

Thanks,
Umesh

>>
>>>Also, for >= 95 to be true this needs to
>>>be the only client.  Should we check for number of clients and skip if
>>>we find others?
>>
>>Didn't think of that. Wondering if there is a way to enforce a 
>>single client somehow in the test. If not, then I agree to skip if 
>>more clients are present. Any idea how to check if there are other 
>>clients?
>>
>>>
>>>>+	else
>>>>+		igt_assert(!percent);
>>>>+}
>>>>+
>>>>+static void
>>>>+single(int fd, struct drm_xe_engine_class_instance *hwe, int width, int count,
>>>>+       unsigned int flags)
>>>>+{
>>>>+	struct pceu_cycles pceu1[DRM_XE_ENGINE_CLASS_COMPUTE + 1];
>>>>+	struct pceu_cycles pceu2[DRM_XE_ENGINE_CLASS_COMPUTE + 1];
>>>>+	struct xe_spin_ctx *ctx = NULL;
>>>>+	int local_fd = fd;
>>>>+	uint32_t vm;
>>>>+
>>>>+	if (flags & TEST_ISOLATION)
>>>>+		local_fd = drm_reopen_driver(fd);
>>>
>>>so you have 2 clients in the same process... Maybe add this info to the
>>>subtest description above.
>>
>>sure
>>
>>>
>>>>+
>>>>+	vm = xe_vm_create(local_fd, 0, 0);
>>>>+	if (flags & TEST_BUSY) {
>>>>+		ctx = xe_spin_ctx_init(local_fd, hwe, vm, width, count);
>>>>+		xe_spin_sync_start(local_fd, ctx);
>>>>+	}
>>>>+
>>>>+	read_engine_cycles(local_fd, pceu1);
>>>>+	measured_usleep(batch_duration_ns / 1000);
>>>>+	if (flags & TEST_TRAILING_IDLE)
>>>>+		xe_spin_sync_end(local_fd, ctx);
>>>>+	read_engine_cycles(local_fd, pceu2);
>>>>+
>>>>+	check_results(pceu1, pceu2, hwe->engine_class, width, flags);
>>>>+
>>>>+	xe_spin_sync_end(local_fd, ctx);
>>>>+	xe_spin_ctx_destroy(local_fd, ctx);
>>>>+	xe_vm_destroy(local_fd, vm);
>>>>+
>>>>+	if (flags & TEST_ISOLATION)
>>>>+		close(local_fd);
>>>
>>>humn... it doesn't seem this is actually testing any isolation? It's
>>>just reopening the fd and testing the same thing on the duplicated fd.
>>>
>>>what I'd expect of an isolation test is, e.g.:
>>>
>>>/proc/X/fdinfo/Y
>>>	drm-client: A
>>>	drm-rcs-cycles: 0
>>>
>>>when executing on another fd:
>>>
>>>/proc/X/fdinfo/Z
>>>	drm-client: B
>>>	drm-rcs-cyles: 10
>>>
>>>
>>>so, check that read_engine_cycles() returns !0 for fd where you
>>>submitted the spin and 0 for where you didn't.
>>
>>will do
>>
>>Thanks,
>>Umesh
>>>
>>>Lucas De Marchi
>>>
>>>>+}
>>>>+
>>>>igt_main
>>>>{
>>>>	struct drm_xe_engine_class_instance *hwe;
>>>>@@ -485,6 +582,18 @@ igt_main
>>>>	igt_subtest("basic")
>>>>		basic(xe, num_classes);
>>>>
>>>>+	igt_subtest("drm-idle")
>>>>+		xe_for_each_engine(xe, hwe)
>>>>+			single(xe, hwe, 1, 1, 0);
>>>>+
>>>>+	igt_subtest("drm-busy-idle")
>>>>+		xe_for_each_engine(xe, hwe)
>>>>+			single(xe, hwe, 1, 1, TEST_BUSY | TEST_TRAILING_IDLE);
>>>>+
>>>>+	igt_subtest("drm-busy-idle-isolation")
>>>>+		xe_for_each_engine(xe, hwe)
>>>>+			single(xe, hwe, 1, 1, TEST_BUSY | TEST_TRAILING_IDLE | TEST_ISOLATION);
>>>>+
>>>>	igt_describe("Create and compare total and resident memory consumption by client");
>>>>	igt_subtest("drm-total-resident")
>>>>		test_total_resident(xe);
>>>>-- 
>>>>2.34.1
>>>>