[igt-dev] [PATCH i-g-t] tests/i915/perf: verify reason field in OA reports is never 0

Lionel Landwerlin lionel.g.landwerlin at intel.com
Wed Nov 25 11:06:54 UTC 2020


On 25/11/2020 12:21, Lionel Landwerlin wrote:
> On 25/11/2020 02:41, Umesh Nerlige Ramappa wrote:
>> On Thu, Nov 19, 2020 at 12:46:30PM +0200, Lionel Landwerlin wrote:
>>> We're about to remove the filtering in i915 on 0 reason fields because
>>> we assume this was a possibility, but it turned out to be a corruption
>>> in the tail pointer register that made us read cleared data.
>>>
>>> This test is here to verify that our assumption hold that the HW never
>>> produces such reports.
>>>
>>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
>>> ---
>>> tests/i915/perf.c | 77 +++++++++++++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 77 insertions(+)
>>>
>>> diff --git a/tests/i915/perf.c b/tests/i915/perf.c
>>> index caeabd623..08c81c5c2 100644
>>> --- a/tests/i915/perf.c
>>> +++ b/tests/i915/perf.c
>>> @@ -2539,6 +2539,76 @@ test_buffer_fill(void)
>>>     __perf_close(stream_fd);
>>> }
>>
>> I think this logic of reading all the reports and doing a sanity 
>> check could be reused for some other tests. That could help us 
>> refactor the perf OA test code. I queues something similar to trybot.
>> https://patchwork.freedesktop.org/series/84231/
>>
>> If you feel we can go ahead with that ^, I will post it here. If not, 
>> here are the comments.
>
>
> Fair point :)
>
>
>>
>>>
>>> +static void
>>> +test_non_zero_reason(void)
>>> +{
>>> +    /* ~10 micro second period */
>>> +    int oa_exponent = max_oa_exponent_for_period_lte(10000);
>>> +    uint64_t properties[] = {
>>> +        /* Include OA reports in samples */
>>> +        DRM_I915_PERF_PROP_SAMPLE_OA, true,
>>> +
>>> +        /* OA unit configuration */
>>> +        DRM_I915_PERF_PROP_OA_METRICS_SET, 
>>> test_set->perf_oa_metrics_set,
>>> +        DRM_I915_PERF_PROP_OA_FORMAT, test_set->perf_oa_format,
>>> +        DRM_I915_PERF_PROP_OA_EXPONENT, oa_exponent,
>>> +    };
>>> +    struct drm_i915_perf_open_param param = {
>>> +        .flags = I915_PERF_FLAG_FD_CLOEXEC,
>>> +        .num_properties = sizeof(properties) / 16,
>>> +        .properties_ptr = to_user_pointer(properties),
>>> +    };
>>> +    struct drm_i915_perf_record_header *header;
>>> +    uint32_t buf_size = 5 * 65536 * (256 + sizeof(struct 
>>> drm_i915_perf_record_header));
>>
>> 3 times instead of 5 could be good enough. Thoughts?
>
>
> I mostly looked at time budget
>
>
>>
>>> +    uint8_t *buf = malloc(buf_size);
>>> +    uint32_t total_len = 0;
>>> +    int len;
>>
>> igt_assert(buf);
>>
>> just in case we didn't get the required memory.
>
>
> Done.
>
>
>>
>>> +
>>> +    igt_debug("Ready to read about %u bytes\n", buf_size);
>>> +
>>> +    load_helper_init();
>>> +    load_helper_run(HIGH);
>>> +
>>> +    stream_fd = __perf_open(drm_fd, &param, true /* prevent_pm */);
>>> +
>>> +    while (total_len < (buf_size - sizeof(struct 
>>> drm_i915_perf_record_header)) &&
>>> +           ((len = read(stream_fd, &buf[total_len], buf_size - 
>>> total_len)) > 0 ||
>>> +        (len -1 && errno == EINTR))) {
>>
>> Did you mean?
>>
>> (len == -1 && errno == EINTR)
>
>
> Oh dear... Thanks!
>
>
>>
>>> +        if (len > 0)
>>> +            total_len += len;
>>> +    }
>>> +
>>> +    __perf_close(stream_fd);
>>> +
>>> +    load_helper_stop();
>>> +    load_helper_fini();
>>> +
>>> +    igt_debug("Got %u bytes\n", total_len);
>>> +
>>> +    for (uint32_t offset = 0; offset < total_len; offset += 
>>> header->size) {
>>> +        header = (void *) (buf + offset);
>>> +
>>> +        switch (header->type) {
>>> +        case DRM_I915_PERF_RECORD_OA_REPORT_LOST:
>> Based on some trial runs from my test, I see 40 us or 100 us 
>> oa_exponent would likely result in no reports lost. On the other 
>> hand, setting the HEAD/TAIL wrap bits in OASTATUS likely happens only 
>> when REPORT LOST occurs. Should we check that at least one report was 
>> lost here?
>>
>>             reports_lost++;
>
>
> I see a lot of report lost actually, even with ~1ms sampling.
>
> As far as I can tell, we really can't predict when this is going to 
> happen so I rather not rely on this.
>
>
>>
>>> +            break;
>>> +        case DRM_I915_PERF_RECORD_SAMPLE: {
>>> +            const uint32_t *report = (void *) (header + 1);
>>> +            uint32_t reason = (report[0] >> OAREPORT_REASON_SHIFT) &
>>> +                OAREPORT_REASON_MASK;
>>> +
>>> +            igt_assert_neq(reason, 0);
>>
>> why not sanity_check_reports?
>
>
> Hey, this is raising an load of interesting question... Lots of 
> failures...


So it seems we might need to backport the register reset kernel patch, 
because we actually report invalid data to userspace right now...


>
>
>>
>> Thanks,
>> Umesh
>>
>>> +            break;
>>> +        }
>>> +        case DRM_I915_PERF_RECORD_OA_BUFFER_LOST:
>>> +            igt_assert(!"unexpected overflow");
>>> +            break;
>>> +        }
>>> +    }
>>> +
>>> +
>>> +    free(buf);
>>> +}
>>> +
>>> static void
>>> test_enable_disable(void)
>>> {
>>> @@ -4882,6 +4952,13 @@ igt_main
>>>     igt_subtest("buffer-fill")
>>>         test_buffer_fill();
>>>
>>> +    igt_describe("Test that reason field in OA reports is never 0 
>>> on Gen8+");
>>> +    igt_subtest("non-zero-reason") {
>>> +        /* Reason field is only available on Gen8+ */
>>> +        igt_require(intel_gen(devid) >= 8);
>>> +        test_non_zero_reason();
>>> +    }
>>> +
>>>     igt_subtest("disabled-read-error")
>>>         test_disabled_read_error();
>>>     igt_subtest("non-sampling-read-error")
>>> -- 
>>> 2.29.2
>>>
>



More information about the igt-dev mailing list