[Mesa-dev] [PATCH] i965: oa-counters: keep on reading reports until delimiting timestamp

Lionel Landwerlin lionel.g.landwerlin at intel.com
Wed Apr 5 21:42:56 UTC 2017


On 04/04/17 18:45, Robert Bragg wrote:
>
> On Tue, Apr 4, 2017 at 10:20 AM, Lionel Landwerlin
> <lionel.g.landwerlin at intel.com> wrote:
>
>     On 03/04/17 21:04, Robert Bragg wrote:
>>
>>
>>     On Mar 30, 2017 16:16, "Lionel Landwerlin"
>>     <lionel.g.landwerlin at intel.com> wrote:
>>
>>         While exercising report reads under moderate load, we might
>>         have to wait for all the reports to land in the OA buffer,
>>         otherwise we might miss some reports. That means we need to
>>         keep on reading the OA stream until the last report we read
>>         has a timestamp equal to or newer than the timestamp
>>         recorded by the MI_REPORT_PERF_COUNT at the end of the
>>         performance query.
>>
>>
>>     The expectation is that, since we have received the second
>>     MI_REPORT_PERF_COUNT report, any periodic/automatic reports
>>     that could possibly relate to a query must have been written
>>     to the OABUFFER. Since we read() as much as is available from
>>     i915 perf when we come to finish processing a query, we
>>     shouldn't miss anything.
>>
>>     We shouldn't synchronously wait in a busy loop until we've found
>>     a report that has a timestamp >= our end MI_RPC because that
>>     could be a really significant amount of time (e.g. 50+
>>     milliseconds on HSW). NB. the periodic samples are just to
>>     account for counter overflow. On Gen8+ the loop would likely
>>     block until the next context switch happens.
>
>     Right, so if we don't get all the reports until the second
>     MI_REPORT_PERF_COUNT, we might be dealing with a kernel issue.
>
>
> Okay, so the realisation we've come to talking about this offline is 
> that the kernel's handling of the OA unit's tail pointer race 
> condition breaks this assumption :-/
>
> Even though any necessary reports must have landed in the OABUFFER, 
> the kernel will throttle/delay access to the most recent reports in 
> case the HW tail pointer has got ahead of what's visible to the CPU in 
> memory.
>
> To handle this without blocking in a busy loop as above I guess we 
> probably need to drive our i915 perf reads() when the application asks 
> if a query's results are ready via brw_is_perf_query_ready().
>
> It also isn't enough for brw_wait_perf_query to rely on 
> drm_intel_bo_wait_rendering() with the MI_RPC bo.
>
> We might want to bump the periodic sampling frequency to a small 
> multiple of 5 milliseconds to at least reduce the worst-case latency 
> for results becoming ready. It wouldn't be that worthwhile sampling 
> with a higher frequency since the kernel only does tail pointer age 
> checks at 200Hz currently.
>
>
>     I'm still suspicious of the code in mesa. I think in some cases
>     (in particular with many applications running) it might loop forever.
>
>
> Hope it's ok, but maybe there's an issue; can you elaborate on what
> details look suspicious to you?

Missed your question :/

Basically we might end up reading forever if too many context switches
happen (experienced with igt while a bunch of other applications were
using the GPU).

It feels like another good argument for checking the timestamps.

>
>
>     Also the igt tests using roughly the same loop (while (read() &&
>     errno != EINTR)) are pretty reliable at not finding all the
>     reports we need from the stream.
>     So I would expect the same thing from mesa.
>
>
> Right, I wrote the igt code first and when that looked to be working I 
> ported it to Mesa.
>
> The issue with the tail pointer race handling by the kernel sounds 
> like a likely explanation for seeing issues here in both mesa and igt.
>
>
>
>>
>>     Was this proposal based on a code inspection or did you maybe hit
>>     a problem correctly measuring something?
>>
>>     Maybe if based on inspection we can find somewhere to put a
>>     comment to clarify the assumptions above?
>>
>>     Br,
>>     - Robert
>>
>>
>>         Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
>>         Cc: Robert Bragg <robert at sixbynine.org>
>>         ---
>>          src/mesa/drivers/dri/i965/brw_performance_query.c | 51 ++++++++++++++---------
>>          1 file changed, 32 insertions(+), 19 deletions(-)
>>
>>         diff --git a/src/mesa/drivers/dri/i965/brw_performance_query.c b/src/mesa/drivers/dri/i965/brw_performance_query.c
>>         index 4a94e4b3cc..076c59e633 100644
>>         --- a/src/mesa/drivers/dri/i965/brw_performance_query.c
>>         +++ b/src/mesa/drivers/dri/i965/brw_performance_query.c
>>         @@ -647,38 +647,50 @@ discard_all_queries(struct brw_context *brw)
>>          }
>>
>>          static bool
>>         -read_oa_samples(struct brw_context *brw)
>>         +read_oa_samples_until(struct brw_context *brw,
>>         +                      uint32_t start_timestamp,
>>         +                      uint32_t end_timestamp)
>>          {
>>         -   while (1) {
>>         +   uint32_t last_timestamp = start_timestamp;
>>         +
>>         +   while (last_timestamp < end_timestamp) {
>>                struct brw_oa_sample_buf *buf = get_free_sample_buf(brw);
>>         +      uint32_t offset = 0;
>>                int len;
>>
>>                while ((len = read(brw->perfquery.oa_stream_fd, buf->buf,
>>         -                         sizeof(buf->buf))) < 0 && errno == EINTR)
>>         +                         sizeof(buf->buf))) < 0 &&
>>         +             (errno == EINTR || errno == EAGAIN))
>>                   ;
>>
>>                if (len <= 0) {
>>                   exec_list_push_tail(&brw->perfquery.free_sample_buffers, &buf->link);
>>
>>         -         if (len < 0) {
>>         -            if (errno == EAGAIN)
>>         -               return true;
>>         -            else {
>>         -               DBG("Error reading i915 perf samples: %m\n");
>>         -               return false;
>>         -            }
>>         -         } else {
>>         +         if (len < 0)
>>         +            DBG("Error reading i915 perf samples: %m\n");
>>         +         else
>>                      DBG("Spurious EOF reading i915 perf samples\n");
>>         -            return false;
>>         -         }
>>         +
>>         +         return false;
>>                }
>>
>>                buf->len = len;
>>          exec_list_push_tail(&brw->perfquery.sample_buffers, &buf->link);
>>         +
>>         +      /* Go through the reports and update the last timestamp. */
>>         +      while (offset < buf->len) {
>>         +         const struct drm_i915_perf_record_header *header =
>>         +            (const struct drm_i915_perf_record_header *) &buf->buf[offset];
>>         +         uint32_t *report = (uint32_t *) (header + 1);
>>         +
>>         +         if (header->type == DRM_I915_PERF_RECORD_SAMPLE)
>>         +            last_timestamp = report[1];
>>         +
>>         +         offset += header->size;
>>         +      }
>>             }
>>
>>         -   unreachable("not reached");
>>         -   return false;
>>         +   return true;
>>          }
>>
>>          /**
>>         @@ -709,10 +721,6 @@ accumulate_oa_reports(struct brw_context *brw,
>>
>>             assert(o->Ready);
>>
>>         -   /* Collect the latest periodic OA reports from i915 perf */
>>         -   if (!read_oa_samples(brw))
>>         -      goto error;
>>         -
>>            drm_intel_bo_map(obj->oa.bo, false);
>>             query_buffer = obj->oa.bo->virtual;
>>
>>         @@ -728,6 +736,11 @@ accumulate_oa_reports(struct brw_context *brw,
>>                goto error;
>>             }
>>
>>         +   /* Collect the latest periodic OA reports from i915 perf */
>>         +   if (!read_oa_samples_until(brw, start[1], end[1]))
>>         +      goto error;
>>         +
>>            /* See if we have any periodic reports to accumulate too... */
>>
>>             /* N.B. The oa.samples_head was set when the query began and
>>         --
>>         2.11.0
>>
>>
>
>
