[Mesa-dev] [PATCH] i965: oa-counters: keep on reading reports until delimiting timestamp

Robert Bragg robert at sixbynine.org
Tue Apr 4 17:45:08 UTC 2017


On Tue, Apr 4, 2017 at 10:20 AM, Lionel Landwerlin <lionel.g.landwerlin at intel.com> wrote:

> On 03/04/17 21:04, Robert Bragg wrote:
>
> On Mar 30, 2017 16:16, "Lionel Landwerlin" <lionel.g.landwerlin at intel.com> wrote:
>
> While exercising report reads under moderate load, we might have to
> wait for all the reports to land in the OA buffer, otherwise we might
> miss some reports. That means we need to keep on reading the OA stream
> until the last report we read has a timestamp at least as recent as
> the timestamp recorded by the MI_REPORT_PERF_COUNT at the end of the
> performance query.
>
>
> The expectation is that since we have received the second
> MI_REPORT_PERF_COUNT report, any periodic/automatic reports that
> could possibly relate to a query must have been written to the
> OABUFFER. Since we read() as much as is available from i915 perf when
> we come to finish processing a query, we shouldn't miss anything.
>
> We shouldn't synchronously wait in a busy loop until we've found a report
> with a timestamp >= our end MI_RPC, because that could mean waiting a
> really significant amount of time (e.g. 50+ milliseconds on HSW). N.B. the
> periodic samples are just there to account for counter overflow. On Gen8+
> the loop would likely block until the next context switch happens.
>
>
> Right, so if we don't get all the reports by the time of the second
> MI_REPORT_PERF_COUNT, we might be dealing with a kernel issue.
>

Okay, so the realisation we've come to after talking about this offline
is that the kernel's handling of the OA unit's tail pointer race
condition breaks this assumption :-/

Even though any necessary reports must have landed in the OABUFFER, the
kernel will throttle/delay access to the most recent reports in case the HW
tail pointer has got ahead of what's visible to the CPU in memory.

To handle this without blocking in a busy loop as above, I guess we
probably need to drive our i915 perf reads() from the point where the
application asks whether a query's results are ready, via
brw_is_perf_query_ready().

It also isn't enough for brw_wait_perf_query to rely on
drm_intel_bo_wait_rendering() with the MI_RPC bo.
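
For reference, something like the following (a completely untested
sketch) is what I have in mind. The obj->oa.last_periodic_timestamp and
obj->oa.end_timestamp fields are hypothetical bookkeeping we'd need to
add, and read_oa_samples() stands in for the existing helper that
drains whatever the (non-blocking) stream fd will currently give us:

static bool
brw_is_perf_query_ready(struct gl_context *ctx,
                        struct gl_perf_query_object *o)
{
   struct brw_context *brw = brw_context(ctx);
   struct brw_perf_query_object *obj = brw_perf_query(o);

   if (o->Ready)
      return true;

   /* The MI_RPC commands have to have landed first... */
   if (drm_intel_bo_busy(obj->oa.bo))
      return false;

   /* ...but given the kernel's tail pointer aging that alone isn't
    * enough: opportunistically drain whatever i915 perf will currently
    * let us read (the fd is non-blocking, so this returns on EAGAIN
    * instead of waiting). */
   if (!read_oa_samples(brw))
      return false;

   /* Only report ready once we've seen a periodic sample at least as
    * recent as the end MI_RPC report. */
   return obj->oa.last_periodic_timestamp >= obj->oa.end_timestamp;
}

brw_wait_perf_query() could then poll this (or do a blocking read of
the stream fd) instead of only relying on
drm_intel_bo_wait_rendering().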

We might want to bump the periodic sampling period down to a small
multiple of 5 milliseconds to at least reduce the worst-case latency
for results becoming ready. It wouldn't be that worthwhile sampling at
a higher frequency than that, since the kernel only does its tail
pointer age checks at 200Hz currently.
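
For reference, assuming the sampling period really is
2^(exponent + 1) / timestamp_frequency (matching i915's
oa_exponent_to_ns()), and assuming HSW's 12.5MHz OA timestamp
frequency, picking the exponent could look something like this
(untested sketch):

#include <stdint.h>

/* Find the smallest OA sampling exponent whose period is at least
 * min_period_ns; the period doubles with each exponent step. */
static uint32_t
pick_oa_exponent(uint64_t timestamp_frequency, uint64_t min_period_ns)
{
   uint32_t exponent = 0;

   while (1000000000ULL * (2ULL << exponent) / timestamp_frequency <
          min_period_ns)
      exponent++;

   return exponent;
}

E.g. pick_oa_exponent(12500000, 5000000) == 15, giving a ~5.24ms
period, which lines up with the kernel's 200Hz (5ms) tail pointer age
checks.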


>
> I'm still suspicious of the code in mesa. I think in some cases (in
> particular with many applications running) it might loop forever.
>

Hopefully it's ok, but maybe there's an issue; can you elaborate on
what details look suspicious to you?


>
> Also the igt tests, using roughly the same loop (while (read() &&
> errno != EINTR)), are pretty reliable at not finding all the reports
> we need from the stream.
> So I would expect the same thing from mesa.
>

Right, I wrote the igt code first and when that looked to be working I
ported it to Mesa.

The issue with the tail pointer race handling by the kernel sounds like a
likely explanation for seeing issues here in both mesa and igt.



>
> Was this proposal based on code inspection, or did you maybe hit a
> problem correctly measuring something?
>
> Maybe, if it's based on inspection, we can find somewhere to put a
> comment to clarify the assumptions above?
>
> Br,
> - Robert
>
>
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
> Cc: Robert Bragg <robert at sixbynine.org>
> ---
>  src/mesa/drivers/dri/i965/brw_performance_query.c | 51 ++++++++++++++---------
>  1 file changed, 32 insertions(+), 19 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_performance_query.c b/src/mesa/drivers/dri/i965/brw_performance_query.c
> index 4a94e4b3cc..076c59e633 100644
> --- a/src/mesa/drivers/dri/i965/brw_performance_query.c
> +++ b/src/mesa/drivers/dri/i965/brw_performance_query.c
> @@ -647,38 +647,50 @@ discard_all_queries(struct brw_context *brw)
>  }
>
>  static bool
> -read_oa_samples(struct brw_context *brw)
> +read_oa_samples_until(struct brw_context *brw,
> +                      uint32_t start_timestamp,
> +                      uint32_t end_timestamp)
>  {
> -   while (1) {
> +   uint32_t last_timestamp = start_timestamp;
> +
> +   while (last_timestamp < end_timestamp) {
>        struct brw_oa_sample_buf *buf = get_free_sample_buf(brw);
> +      uint32_t offset = 0;
>        int len;
>
>        while ((len = read(brw->perfquery.oa_stream_fd, buf->buf,
> -                         sizeof(buf->buf))) < 0 && errno == EINTR)
> +                         sizeof(buf->buf))) < 0 &&
> +             (errno == EINTR || errno == EAGAIN))
>           ;
>
>        if (len <= 0) {
>           exec_list_push_tail(&brw->perfquery.free_sample_buffers, &buf->link);
>
> -         if (len < 0) {
> -            if (errno == EAGAIN)
> -               return true;
> -            else {
> -               DBG("Error reading i915 perf samples: %m\n");
> -               return false;
> -            }
> -         } else {
> +         if (len < 0)
> +            DBG("Error reading i915 perf samples: %m\n");
> +         else
>              DBG("Spurious EOF reading i915 perf samples\n");
> -            return false;
> -         }
> +
> +         return false;
>        }
>
>        buf->len = len;
>        exec_list_push_tail(&brw->perfquery.sample_buffers, &buf->link);
> +
> +      /* Go through the reports and update the last timestamp. */
> +      while (offset < buf->len) {
> +         const struct drm_i915_perf_record_header *header =
> +            (const struct drm_i915_perf_record_header *) &buf->buf[offset];
> +         uint32_t *report = (uint32_t *) (header + 1);
> +
> +         if (header->type == DRM_I915_PERF_RECORD_SAMPLE)
> +            last_timestamp = report[1];
> +
> +         offset += header->size;
> +      }
>     }
>
> -   unreachable("not reached");
> -   return false;
> +   return true;
>  }
>
>  /**
> @@ -709,10 +721,6 @@ accumulate_oa_reports(struct brw_context *brw,
>
>     assert(o->Ready);
>
> -   /* Collect the latest periodic OA reports from i915 perf */
> -   if (!read_oa_samples(brw))
> -      goto error;
> -
>     drm_intel_bo_map(obj->oa.bo, false);
>     query_buffer = obj->oa.bo->virtual;
>
> @@ -728,6 +736,11 @@ accumulate_oa_reports(struct brw_context *brw,
>        goto error;
>     }
>
> +   /* Collect the latest periodic OA reports from i915 perf */
> +   if (!read_oa_samples_until(brw, start[1], end[1]))
> +      goto error;
> +
> +
>     /* See if we have any periodic reports to accumulate too... */
>
>     /* N.B. The oa.samples_head was set when the query began and
> --
> 2.11.0