[PATCH 06/13] tests/intel/xe_oa: Rewrite the polling small buf test
Dixit, Ashutosh
ashutosh.dixit at intel.com
Thu Feb 27 03:45:16 UTC 2025
On Tue, 25 Feb 2025 14:51:39 -0800, Umesh Nerlige Ramappa wrote:
>
Hi Umesh,
> On Mon, Feb 24, 2025 at 08:26:56PM -0800, Dixit, Ashutosh wrote:
> > On Mon, 24 Feb 2025 14:56:05 -0800, Umesh Nerlige Ramappa wrote:
> >>
> >> On Mon, Feb 24, 2025 at 12:11:37PM -0800, Dixit, Ashutosh wrote:
> >> > On Tue, 18 Feb 2025 12:28:04 -0800, Umesh Nerlige Ramappa wrote:
> >> >>
> >> >
> >> >> Use mmio reads as a side-channel to determine if reports are available
> >> >> and ensure that poll will return with POLLIN set. Then provide a small
> >> >> buffer to force ENOSPC error. Then poll with a timeout of 0 to check if
> >> >> POLLIN is still set.
> >> >
> >> > Will need a reason for doing this here. But see below.
> >> >
> >> >>
> >> >> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa at intel.com>
> >> >> ---
> >> >> tests/intel/xe_oa.c | 64 +++++++++++++++++++++++++--------------------
> >> >> 1 file changed, 35 insertions(+), 29 deletions(-)
> >> >>
> >> >> diff --git a/tests/intel/xe_oa.c b/tests/intel/xe_oa.c
> >> >> index aaf92308a..5792ffec2 100644
> >> >> --- a/tests/intel/xe_oa.c
> >> >> +++ b/tests/intel/xe_oa.c
> >> >> @@ -2216,7 +2216,6 @@ static void test_polling(uint64_t requested_oa_period,
> >> >> */
> >> >> static void test_polling_small_buf(void)
> >> >> {
> >> >> - int oa_exponent = max_oa_exponent_for_period_lte(40 * 1000); /* 40us */
> >> >> uint64_t properties[] = {
> >> >> DRM_XE_OA_PROPERTY_OA_UNIT_ID, 0,
> >> >>
> >> >> @@ -2226,50 +2225,57 @@ static void test_polling_small_buf(void)
> >> >> /* OA unit configuration */
> >> >> DRM_XE_OA_PROPERTY_OA_METRIC_SET, default_test_set->perf_oa_metrics_set,
> >> >> DRM_XE_OA_PROPERTY_OA_FORMAT, __ff(default_test_set->perf_oa_format),
> >> >> - DRM_XE_OA_PROPERTY_OA_PERIOD_EXPONENT, oa_exponent,
> >> >> + DRM_XE_OA_PROPERTY_OA_PERIOD_EXPONENT, oa_exponent_default,
> >> >> DRM_XE_OA_PROPERTY_OA_DISABLED, true,
> >> >> };
> >> >> struct intel_xe_oa_open_prop param = {
> >> >> .num_properties = ARRAY_SIZE(properties) / 2,
> >> >> .properties_ptr = to_user_pointer(properties),
> >> >> };
> >> >> - uint32_t test_duration = 80 * 1000 * 1000;
> >> >> - int sample_size = get_oa_format(default_test_set->perf_oa_format).size;
> >> >> - int n_expected_reports = test_duration / oa_exponent_to_ns(oa_exponent);
> >> >> - int n_expect_read_bytes = n_expected_reports * sample_size;
> >> >> - struct timespec ts = {};
> >> >> - int n_bytes_read = 0;
> >> >> - uint32_t n_polls = 0;
> >> >> + int report_size = get_oa_format(default_test_set->perf_oa_format).size;
> >> >> + u32 oa_tail, prev_tail;
> >> >> + struct pollfd pollfd;
> >> >> + uint8_t buf[10];
> >> >> + int ret, i = 0;
> >> >> +
> >> >> + intel_register_access_init(&mmio_data,
> >> >> + igt_device_get_pci_device(drm_fd), 0);
> >> >>
> >> >> stream_fd = __perf_open(drm_fd, ¶m, true /* prevent_pm */);
> >> >> set_fd_flags(stream_fd, O_CLOEXEC | O_NONBLOCK);
> >> >> - do_ioctl(stream_fd, DRM_XE_OBSERVATION_IOCTL_ENABLE, 0);
> >> >> -
> >> >> - while (igt_nsec_elapsed(&ts) < test_duration) {
> >> >> - struct pollfd pollfd = { .fd = stream_fd, .events = POLLIN };
> >> >>
> >> >> - ppoll(&pollfd, 1, NULL, NULL);
> >> >> - if (pollfd.revents & POLLIN) {
> >> >> - uint8_t buf[1024];
> >> >> - int ret;
> >> >> +#define OAG_OATAILPTR (0xdb04)
> >> >> + /* Save the current tail */
> >> >> + prev_tail = oa_tail = intel_register_read(&mmio_data, OAG_OATAILPTR);
> >> >>
> >> >> - ret = read(stream_fd, buf, sizeof(buf));
> >> >> - if (ret >= 0)
> >> >> - n_bytes_read += ret;
> >> >> - }
> >> >> + /* Kickstart the capture */
> >> >> + do_ioctl(stream_fd, DRM_XE_OBSERVATION_IOCTL_ENABLE, 0);
> >> >>
> >> >> - n_polls++;
> >> >> + /* Wait for 5 reports */
> >> >
> >> > Wait for 5 reports or 10 ms ?
> >> >
> >> >
> >> >> + while ((oa_tail - prev_tail) < (5 * report_size)) {
> >> >> + usleep(1000);
> >> >> + oa_tail = intel_register_read(&mmio_data, OAG_OATAILPTR);
> >> >> + if (i++ > 10)
> >> >
> >> > So on slow platforms we might not get any reports in 10 ms? The idea here
> >> > should be to not have any timing dependence? So if we want to wait for 5
> >> > reports, just wait for 5 reports?
> >>
> >> Oh, I think the loop was stuck while debugging something, so had added a
> >> counter to bail out in 10 iterations. I will remove that. We only need to
> >> wait for 5 reports.
> >
> > Ok.
> >
> >>
> >> >
> >> > We tried doing this for the mmap OA buffer: see
> >> > mmap_wait_for_periodic_reports(), the function waits indefinitely.
> >>
> >> You mean this:
> >>
> >> while (num_periodic_reports < n) {
> >> usleep(4 * n * period_us);
> >> num_periodic_reports = 0;
> >> for (reports = (uint32_t *)oa_vaddr;
> >> reports[0] && oa_timestamp(reports, fmt) && oa_report_is_periodic(reports);
> >> reports += get_oa_format(fmt).size) {
> >> num_periodic_reports++;
> >> }
> >> }
> >>
> >> Well.. if your reports start coming in fast enough, then you would just
> >> spin in the inner for loop. Maybe break the inner for loop when
> >> num_periodic_reports >= n;
> >
> > Ok, but I was just pointing out that this sort of code is beginning to get
> > in. Sai Teja did this.
> >
> >>
> >> >
> >> > So if this is done I am not sure if the intel_register_read() approach is
> >> > needed (but I didn't think of doing that :). But I guess we can use it to
> >> > see when there are N reports available.
> >> >
> >> > Longer term it would be nice to have a centralized function
> >> > wait_for_n_reports(int n) or something like that which different tests can
> >> > use.
> >>
> >> Agree, except that some tests will read the actual reports, while others
> >> just want to take a peek at how many reports are available without reading
> >> them. Since mmap is also a feature under test, I took the easier
> >> approach. We can always refine it if we find something better.
> >
> > Can have similar peek_for_n_reports(n), which would have to use your
> > register read method.
> >
> >>
> >> >
> >> >> + break;
> >> >> }
> >> >>
> >> >> - igt_info("Read %d expected %d (%.2f%% of the expected number), polls=%u\n",
> >> >> - n_bytes_read, n_expect_read_bytes,
> >> >> - n_bytes_read * 100.0f / n_expect_read_bytes,
> >> >> - n_polls);
> >> >> + intel_register_access_fini(&mmio_data);
> >> >>
> >> >> - __perf_close(stream_fd);
> >> >> + /* Just read one report and expect ENOSPC */
> >> >> + pollfd.fd = stream_fd;
> >> >> + pollfd.events = POLLIN;
> >> >> + poll(&pollfd, 1, 1000);
> >> >> + igt_assert(pollfd.revents & POLLIN);
> >> >
> >> > Is the assumption here that the kernel timer is firing every 5 ms (so if
> >> > we've waited for 10 ms POLLIN must be set since the timer is firing every 5
> >> > ms)? I am not sure if that 5 ms is uapi. Or is it? Actually I was thinking
> >> > of changing that 5 ms time or changing the timer to a delayed work.
> >>
> >> But here I am waiting 1000ms in the poll above. That should be sufficient
> >> for POLLIN to be set. If not, we could set the timeout to a large value (a
> >> few seconds).
> >
> > Oh ok, sorry I missed the 1000 ms.
> >
> >>
> >> >
> >> >> + errno = 0;
> >> >> + ret = read(stream_fd, buf, sizeof(buf));
> >> >> + igt_assert_eq(ret, -1);
> >> >> + igt_assert_eq(errno, ENOSPC);
> >> >
> >> > This part looks ok, it's uapi.
> >> >
> >>
> >> Note:
> >> ENOSPC is returned only if the buffer is small enough that not even one
> >> report will fit in. Initially I had a 600 byte buffer, but I did not get
> >> ENOSPC. Instead I got 576 in ret which I think is the correct behavior.
> >
> > OK.
> >
> >> >>
> >> >> - igt_assert(abs(n_expect_read_bytes - n_bytes_read) <
> >> >> - 0.20 * n_expect_read_bytes);
> >> >> + /* Poll with 0 timeout and expect POLLIN flag to be set */
> >> >> + poll(&pollfd, 1, 0);
> >> >> + igt_assert(pollfd.revents & POLLIN);
> >> >> +
> >> >> + __perf_close(stream_fd);
> >> >
> >> > How about just reading N reports using a small buffer for this test,
> >> > however long it takes? N can 5 or 10.
> >>
> >> Not sure I understand. You mean at this stage of the test, read 5/10
> >> reports? OR just alter the entire test somehow to do something different?
> >>
> >> I thought the test was specifically testing that POLLIN is still set after
> >> an ENOSPC error, so I have written it for that case alone. The 0 timeout
> >> will bypass the wait in the poll so that we only get the state of POLLIN.
> >
> > The original test, as I see it, seems to be just doing a lot of regular
> > reads with a small buffer (1K instead of 16 MB say).
> >
> > But what you are doing here is also probably ok/better than the
> > original. Let me think some more about it.
>
> Just a heads up, I think prev_tail should be read after oa_stream is
> enabled, because BUFFER/TAIL/HEAD is configured on enable. Maybe I will
> read the prev_tail after enabling OA or something like that.
>
> In the long run, I might just use the whitelisted TAIL reg to do that, so
> that it's part of the uapi and will hopefully work the way I want it
> to. Right now taking the path of least resistance..
Let's go with this approach, I think it's pretty good. We can see later if
we can think of anything better.
Also, as we were discussing, maybe instead of using mmio read's to detect
presence of data, we can just use DRM_XE_OA_PROPERTY_WAIT_NUM_REPORTS?
Thanks.
--
Ashutosh
More information about the igt-dev
mailing list