[igt-dev] [PATCH i-g-t] tests/i915/perf: verify reason field in OA reports is never 0
Lionel Landwerlin
lionel.g.landwerlin at intel.com
Wed Nov 25 11:06:54 UTC 2020
On 25/11/2020 12:21, Lionel Landwerlin wrote:
> On 25/11/2020 02:41, Umesh Nerlige Ramappa wrote:
>> On Thu, Nov 19, 2020 at 12:46:30PM +0200, Lionel Landwerlin wrote:
>>> We're about to remove the filtering in i915 on 0 reason fields because
>>> we assume this was a possibility, but it turned out to be a corruption
>>> in the tail pointer register that made us read cleared data.
>>>
>>> This test is here to verify that our assumption hold that the HW never
>>> produces such reports.
>>>
>>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
>>> ---
>>> tests/i915/perf.c | 77 +++++++++++++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 77 insertions(+)
>>>
>>> diff --git a/tests/i915/perf.c b/tests/i915/perf.c
>>> index caeabd623..08c81c5c2 100644
>>> --- a/tests/i915/perf.c
>>> +++ b/tests/i915/perf.c
>>> @@ -2539,6 +2539,76 @@ test_buffer_fill(void)
>>> __perf_close(stream_fd);
>>> }
>>
>> I think this logic of reading all the reports and doing a sanity
>> check could be reused for some other tests. That could help us
>> refactor the perf OA test code. I queues something similar to trybot.
>> https://patchwork.freedesktop.org/series/84231/
>>
>> If you feel we can go ahead with that ^, I will post it here. If not,
>> here are the comments.
>
>
> Fair point :)
>
>
>>
>>>
>>> +static void
>>> +test_non_zero_reason(void)
>>> +{
>>> + /* ~10 micro second period */
>>> + int oa_exponent = max_oa_exponent_for_period_lte(10000);
>>> + uint64_t properties[] = {
>>> + /* Include OA reports in samples */
>>> + DRM_I915_PERF_PROP_SAMPLE_OA, true,
>>> +
>>> + /* OA unit configuration */
>>> + DRM_I915_PERF_PROP_OA_METRICS_SET,
>>> test_set->perf_oa_metrics_set,
>>> + DRM_I915_PERF_PROP_OA_FORMAT, test_set->perf_oa_format,
>>> + DRM_I915_PERF_PROP_OA_EXPONENT, oa_exponent,
>>> + };
>>> + struct drm_i915_perf_open_param param = {
>>> + .flags = I915_PERF_FLAG_FD_CLOEXEC,
>>> + .num_properties = sizeof(properties) / 16,
>>> + .properties_ptr = to_user_pointer(properties),
>>> + };
>>> + struct drm_i915_perf_record_header *header;
>>> + uint32_t buf_size = 5 * 65536 * (256 + sizeof(struct
>>> drm_i915_perf_record_header));
>>
>> 3 times instead of 5 could be good enough. Thoughts?
>
>
> I mostly looked at time budget
>
>
>>
>>> + uint8_t *buf = malloc(buf_size);
>>> + uint32_t total_len = 0;
>>> + int len;
>>
>> igt_assert(buf);
>>
>> just in case we didn't get the required memory.
>
>
> Done.
>
>
>>
>>> +
>>> + igt_debug("Ready to read about %u bytes\n", buf_size);
>>> +
>>> + load_helper_init();
>>> + load_helper_run(HIGH);
>>> +
>>> + stream_fd = __perf_open(drm_fd, ¶m, true /* prevent_pm */);
>>> +
>>> + while (total_len < (buf_size - sizeof(struct
>>> drm_i915_perf_record_header)) &&
>>> + ((len = read(stream_fd, &buf[total_len], buf_size -
>>> total_len)) > 0 ||
>>> + (len -1 && errno == EINTR))) {
>>
>> Did you mean?
>>
>> (len == -1 && errno == EINTR)
>
>
> Oh dear... Thanks!
>
>
>>
>>> + if (len > 0)
>>> + total_len += len;
>>> + }
>>> +
>>> + __perf_close(stream_fd);
>>> +
>>> + load_helper_stop();
>>> + load_helper_fini();
>>> +
>>> + igt_debug("Got %u bytes\n", total_len);
>>> +
>>> + for (uint32_t offset = 0; offset < total_len; offset +=
>>> header->size) {
>>> + header = (void *) (buf + offset);
>>> +
>>> + switch (header->type) {
>>> + case DRM_I915_PERF_RECORD_OA_REPORT_LOST:
>> Based on some trial runs from my test, I see 40 us or 100 us
>> oa_exponent would likely result in no reports lost. On the other
>> hand, setting the HEAD/TAIL wrap bits in OASTATUS likely happens only
>> when REPORT LOST occurs. Should we check that at least one report was
>> lost here?
>>
>> reports_lost++;
>
>
> I see a lot of report lost actually, even with ~1ms sampling.
>
> As far as I can tell, we really can't predict when this is going to
> happen so I rather not rely on this.
>
>
>>
>>> + break;
>>> + case DRM_I915_PERF_RECORD_SAMPLE: {
>>> + const uint32_t *report = (void *) (header + 1);
>>> + uint32_t reason = (report[0] >> OAREPORT_REASON_SHIFT) &
>>> + OAREPORT_REASON_MASK;
>>> +
>>> + igt_assert_neq(reason, 0);
>>
>> why not sanity_check_reports?
>
>
> Hey, this is raising an load of interesting question... Lots of
> failures...
So it seems we might need to backport the register reset kernel patch,
because we actually report invalid data to userspace right now...
>
>
>>
>> Thanks,
>> Umesh
>>
>>> + break;
>>> + }
>>> + case DRM_I915_PERF_RECORD_OA_BUFFER_LOST:
>>> + igt_assert(!"unexpected overflow");
>>> + break;
>>> + }
>>> + }
>>> +
>>> +
>>> + free(buf);
>>> +}
>>> +
>>> static void
>>> test_enable_disable(void)
>>> {
>>> @@ -4882,6 +4952,13 @@ igt_main
>>> igt_subtest("buffer-fill")
>>> test_buffer_fill();
>>>
>>> + igt_describe("Test that reason field in OA reports is never 0
>>> on Gen8+");
>>> + igt_subtest("non-zero-reason") {
>>> + /* Reason field is only available on Gen8+ */
>>> + igt_require(intel_gen(devid) >= 8);
>>> + test_non_zero_reason();
>>> + }
>>> +
>>> igt_subtest("disabled-read-error")
>>> test_disabled_read_error();
>>> igt_subtest("non-sampling-read-error")
>>> --
>>> 2.29.2
>>>
>
More information about the igt-dev
mailing list