[PATCH i-g-t v2 05/10] tests/intel/xe_drm_fdinfo: Add single engine tests
Lucas De Marchi
lucas.demarchi at intel.com
Thu Jul 11 21:34:53 UTC 2024
On Thu, Jul 11, 2024 at 12:13:22PM GMT, Umesh Nerlige Ramappa wrote:
>On Thu, Jul 11, 2024 at 08:20:17AM -0500, Lucas De Marchi wrote:
>>On Tue, Jul 02, 2024 at 05:25:27PM GMT, Umesh Nerlige Ramappa wrote:
>>>Add simple tests that submit work to one engine and measure utilization
>>>per class.
>>>
>>>v2:
>>>- Drop measured_usleep since return value is not used
>>>- s/parallel engines/parallel submission/ in comment
>>>- Use NSEC_PER_SEC for batch_duration_ns
>>>- Percent should not be > 100
>>
>>but as we chatted later in the review, it's actually possible to have it
>>greater than 100, so I think we need to leave some room
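>>
>>e.g. (just a sketch, keeping the patch's 95 lower bound and the 105
>>upper bound you had in v1 - the exact margin is a judgment call):
>>
>>	igt_assert(percent >= 95 && percent <= 105);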
>>
>>>- Check utilization for both clients for isolation case
>>>
>>>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa at intel.com>
>>>---
>>>tests/intel/xe_drm_fdinfo.c | 97 +++++++++++++++++++++++++++++++++++++
>>>1 file changed, 97 insertions(+)
>>>
>>>diff --git a/tests/intel/xe_drm_fdinfo.c b/tests/intel/xe_drm_fdinfo.c
>>>index 037f25e53..410c885e7 100644
>>>--- a/tests/intel/xe_drm_fdinfo.c
>>>+++ b/tests/intel/xe_drm_fdinfo.c
>>>@@ -28,6 +28,15 @@
>>>* SUBTEST: basic-engine-utilization
>>>* Description: Check if basic fdinfo content is present for engine utilization
>>>*
>>>+ * SUBTEST: drm-idle
>>>+ * Description: Check that engines show no load when idle
>>>+ *
>>>+ * SUBTEST: drm-busy-idle
>>>+ * Description: Check that engines show load when idle after busy
>>>+ *
>>>+ * SUBTEST: drm-busy-idle-isolation
>>>+ * Description: Check that engine load does not spill over to other drm clients
>>>+ *
>>>* SUBTEST: drm-total-resident
>>>* Description: Create and compare total and resident memory consumption by client
>>>*
>>>@@ -42,11 +51,18 @@ IGT_TEST_DESCRIPTION("Read and verify drm client memory consumption and engine u
>>>
>>>#define BO_SIZE (65536)
>>>
>>>+/* flag masks */
>>>+#define TEST_BUSY (1 << 0)
>>>+#define TEST_TRAILING_IDLE (1 << 1)
>>>+#define TEST_ISOLATION (1 << 2)
>>>+
>>>struct pceu_cycles {
>>> uint64_t cycles;
>>> uint64_t total_cycles;
>>>};
>>>
>>>+const unsigned long batch_duration_ns = (1 * NSEC_PER_SEC) / 2;
>>>+
>>>static const char *engine_map[] = {
>>> "rcs",
>>> "bcs",
>>>@@ -464,8 +480,77 @@ spin_ctx_destroy(int fd, struct spin_ctx *ctx)
>>> free(ctx);
>>>}
>>>
>>>+static void
>>>+check_results(struct pceu_cycles *s1, struct pceu_cycles *s2,
>>>+ int class, int width, unsigned int flags)
>>>+{
>>>+ double percent;
>>>+
>>>+ igt_debug("%s: sample 1: cycles %lu, total_cycles %lu\n",
>>>+ engine_map[class], s1[class].cycles, s1[class].total_cycles);
>>>+ igt_debug("%s: sample 2: cycles %lu, total_cycles %lu\n",
>>>+ engine_map[class], s2[class].cycles, s2[class].total_cycles);
>>>+
>>>+ percent = ((s2[class].cycles - s1[class].cycles) * 100) /
>>>+ ((s2[class].total_cycles + 1) - s1[class].total_cycles);
>>>+
>>>+ /* for parallel submission scale the busyness with width */
>>>+ percent = percent / width;
>>>+
>>>+ igt_debug("%s: percent: %f\n", engine_map[class], percent);
>>>+
>>>+ if (flags & TEST_BUSY)
>>>+ igt_assert(percent >= 95 && percent <= 100);
>>
>><= 105 like you had before seems good, otherwise we may have too much
>>noise in CI. What are the typical numbers you're seeing?
>
>I see 101 whenever it fails. Also, I see it only when I run batches on
>all classes (all_busy_all/most_busy_all etc.). The difference between
>the numerator and denominator deltas is close to 5ms though. I'm still
>not convinced we should see more than 100, because we are ending the
>spinners (and waiting for fences and all) before taking sample 2.
>Still debugging this one a little more.
>
>>
>>>+ else
>>>+ igt_assert(!percent);
>>>+}
>>>+
>>>+static void
>>>+single(int fd, struct drm_xe_engine_class_instance *hwe, int width, int count,
>>>+ unsigned int flags)
>>>+{
>>>+ struct pceu_cycles pceu1[2][DRM_XE_ENGINE_CLASS_COMPUTE + 1];
>>>+ struct pceu_cycles pceu2[2][DRM_XE_ENGINE_CLASS_COMPUTE + 1];
>>
>>nit: either name it pceu_start/pceu_end:
>>
>>pceu_start[0] is the pceu for client0 on start
>>pceu_start[1] is the pceu for client1 on start
>>pceu_end[0] is the pceu for client0 on end
>>pceu_end[1] is the pceu for client1 on end
>>
>>or pceu_client0/pceu_client1 if grouping by client
>>
>>pceu_client0[0] is the pceu for client0 on start
>>pceu_client0[1] is the pceu for client0 on end
>>pceu_client1[0] is the pceu for client1 on start
>>pceu_client1[1] is the pceu for client1 on end
>>
>>... this makes it less prone to typos.
>>
>>But see below as I think 2 samples are not sufficient.
>>
>>>+ struct spin_ctx *ctx = NULL;
>>>+ uint32_t vm;
>>>+ int new_fd;
>>>+
>>>+ if (flags & TEST_ISOLATION)
>>>+ new_fd = drm_reopen_driver(fd);
>>>+
>>>+ vm = xe_vm_create(fd, 0, 0);
>>>+ if (flags & TEST_BUSY) {
>>>+ ctx = spin_ctx_init(fd, hwe, vm, width, count);
>>>+ spin_sync_start(fd, ctx);
>>>+ }
>>>+
>>>+ read_engine_cycles(fd, pceu1[0]);
>>>+ if (flags & TEST_ISOLATION)
>>>+ read_engine_cycles(new_fd, pceu1[1]);
>>>+
>>>+ usleep(batch_duration_ns / 1000);
>>
>>NSEC_PER_USEC
>>
>>>+ if (flags & TEST_TRAILING_IDLE)
>>>+ spin_sync_end(fd, ctx);
>>>+
>>>+ read_engine_cycles(fd, pceu2[0]);
>>
>>... and here we have another source of noise.... when doing
>>TEST_TRAILING_IDLE, should we read the first sample after the spin
>>ended, sleep, and read the second sample?
>
>That would always make utilization = 0% (numerator delta is 0). We
>should take sample 1 when the spinner has started and sample 2 when it
>has ended (for PCEU).
>
>TEST_TRAILING_IDLE is only valid when TEST_BUSY is also specified.
>Maybe I should enforce that in the test (see the sketch after the
>flows below). This is the flow for the common flag configs:
>
>1) flag = TEST_BUSY | TEST_TRAILING_IDLE
>- wait till spinner starts
>- take sample 1
>- usleep
>- end the batch
>- take sample 2.
>- check_results (expected 100% util)
I think the way the test is written, the expectation is wrong.
Reductio ad absurdum:

1) flag = TEST_BUSY | TEST_TRAILING_IDLE
- wait till spinner starts
- take sample 1
- usleep
- end the batch
  <--- long preempt here, emulated with usleep(500000)
- take sample 2
- check_results

So we'd still expect the same ballpark number of delta ticks, but
utilization overall would go down, as now you are ending the batch and
sleeping a little longer before taking the second sample. In the
example above I'd expect utilization to be 50%.
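
Rough numbers for that scenario (just arithmetic to make the 50%
concrete, assuming the patch's 500ms batch_duration_ns plus the extra
500ms sleep):

	uint64_t busy_ns  = 500000000;               /* time the spinner ran     */
	uint64_t total_ns = 500000000 + 500000000;   /* span between the samples */
	double percent = busy_ns * 100.0 / total_ns; /* 50.0 */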
>
>2) flag = TEST_BUSY
>- wait till spinner starts
>- take sample 1
>- usleep
>- take sample 2.
>- end the batch
>- check_results (expected 100% util)
>
>3) flag = 0 means truly IDLE since there is no spinner.
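>
>Something like this at the top of single() would enforce it (untested
>sketch):
>
>	/* TEST_TRAILING_IDLE only makes sense on top of TEST_BUSY */
>	if (flags & TEST_TRAILING_IDLE)
>		igt_assert(flags & TEST_BUSY);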
>
>>
>>but then how are we testing both at the same time here? I can't see
>>any igt_assert() for the trailing idle test, and it will actually
>>impact the busy test since it will wait for the spin to end... Am I
>>missing anything here?
>
>Case (2) above does not exist for PCEU tests because CTX_TIMESTAMP is
>only updated when the context switches out, which means we must end
>the spinner before taking the second sample (or preempt it, but ending
>is simpler). This series only implements tests for cases (1) and (3).
True, we are only reading the value from the context image, but does
the hardware update it only on a context switch? What I had understood
was that the value is updated while the context is running. Otherwise I
think it's utterly broken:
Considering:

  s* are the samples
  c* are the context timestamps
  g* are the gpu timestamps
  -  not running
  |  running

     g0  g1                   g2             g3 g4
     c0  c1                   c2             c3 c4
ctx0 ----||||||||||||||||||||||||||||||||||||-------
         ^                    ^                 ^
         s0                   s1                s3

For simplicity, assuming g0 == c0 == 0:

in s0: g1 == 3
       c1 == 0

in s1: g2 == 25
       c2 == 0
       -> utilization returns 0%

in s3: g4 == 40
       c4 == c3 == 35
       -> utilization returns 35 / (40 - 25) == 233%
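
In other words, plugging the numbers from the s1..s3 window above into
the percent formula check_results() uses (plain arithmetic, not real
counter reads):

	uint64_t c2 = 0, g2 = 25;   /* sample s1: context still running */
	uint64_t c4 = 35, g4 = 40;  /* sample s3: after the switch-out  */
	double percent = (double)(c4 - c2) * 100 / (g4 - g2); /* 233.3 */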
how can we get this right if the value is not updated for running
contexts?
Lucas De Marchi
>
>Case (2) is applicable to a completely different feature which is
>still WIP: per-engine utilization via the kernel perf framework.
>
>>I think we need to add a third sample when testing trailing idle
>>and ensure:
>>
>>considering the indexing as sample[client][idx]
>>
>> sample[0][1] - sample[0][0] == 100% if BUSY else 0%
>> sample[0][2] - sample[0][1] == 0% -> idle on trailing test
>>
>>and for isolation:
>>
>> sample[1][1] - sample[1][0] == 0%
>> sample[1][2] - sample[1][1] == 0%
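>>
>>roughly this flow for client 0 (untested sketch, reusing the helpers
>>from this patch, with sample[client][idx] indexed as above and each
>>entry being a per-class array like pceu1/pceu2):
>>
>>	read_engine_cycles(fd, sample[0][0]);	/* s0: spinner running */
>>	usleep(batch_duration_ns / NSEC_PER_USEC);
>>	spin_sync_end(fd, ctx);
>>	read_engine_cycles(fd, sample[0][1]);	/* s1: end of busy window */
>>	usleep(batch_duration_ns / NSEC_PER_USEC);
>>	read_engine_cycles(fd, sample[0][2]);	/* s2: trailing idle window */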
>>
>>>+ if (flags & TEST_ISOLATION)
>>>+ read_engine_cycles(new_fd, pceu2[1]);
>>>+
>>>+ check_results(pceu1[0], pceu2[0], hwe->engine_class, width, flags);
>
>This check_results will check for TEST_BUSY case.
>
>>>+
>>>+ if (flags & TEST_ISOLATION) {
>>>+ check_results(pceu1[1], pceu2[1], hwe->engine_class, width, 0);
>
>This check_results will check for idle case (flags = 0).
>
>>>+ close(new_fd);
>>>+ }
>>>+
>>>+ spin_sync_end(fd, ctx);
>>>+ spin_ctx_destroy(fd, ctx);
>>
>>if !BUSY you are not even calling spin_ctx_init()
>
>:) yeah, sorry about the readability here. spin_sync_end and
>spin_ctx_destroy bail out if the ctx is NULL, which is the case for
>!BUSY. I can add an explicit check to not call these instead.
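>
>i.e. both helpers already start with (paraphrasing the existing code,
>not a new change):
>
>	if (!ctx)
>		return;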
>
>Thanks,
>Umesh
>>
>>Lucas De Marchi
>>
>>>+ xe_vm_destroy(fd, vm);
>>>+}
>>>+
>>>igt_main
>>>{
>>>+ struct drm_xe_engine_class_instance *hwe;
>>> int xe;
>>>
>>> igt_fixture {
>>>@@ -484,6 +569,18 @@ igt_main
>>> igt_subtest("basic-engine-utilization")
>>> basic_engine_utilization(xe);
>>>
>>>+ igt_subtest("drm-idle")
>>>+ xe_for_each_engine(xe, hwe)
>>>+ single(xe, hwe, 1, 1, 0);
>>>+
>>>+ igt_subtest("drm-busy-idle")
>>>+ xe_for_each_engine(xe, hwe)
>>>+ single(xe, hwe, 1, 1, TEST_BUSY | TEST_TRAILING_IDLE);
>>>+
>>>+ igt_subtest("drm-busy-idle-isolation")
>>>+ xe_for_each_engine(xe, hwe)
>>>+ single(xe, hwe, 1, 1, TEST_BUSY | TEST_TRAILING_IDLE | TEST_ISOLATION);
>>>+
>>> igt_describe("Create and compare total and resident memory consumption by client");
>>> igt_subtest("drm-total-resident")
>>> test_total_resident(xe);
>>>--
>>>2.38.1
>>>