<html> <head> <meta content="text/html; charset=utf-8" http-equiv="Content-Type"> </head> <body bgcolor="#FFFFFF" text="#000000"> <div class="moz-cite-prefix">On 04/04/17 18:45, Robert Bragg wrote:<br> </div> <blockquote cite="mid:CAMou1-0aUnyNH4GCsxooOgG1rGASYt=cSTXpLTVKKKU2rDm-rQ@mail.gmail.com" type="cite"> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> <div dir="ltr"><br> <div class="gmail_extra">On Tue, Apr 4, 2017 at 10:20 AM, Lionel Landwerlin <span dir="ltr"><<a moz-do-not-send="true" href="mailto:lionel.g.landwerlin@intel.com" target="_blank">lionel.g.landwerlin@intel.com</a>></span> wrote:<br> <div class="gmail_quote"> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <div bgcolor="#FFFFFF"><span class="gmail-"> <div class="gmail-m_-5216121689716086367moz-cite-prefix">On 03/04/17 21:04, Robert Bragg wrote:<br> </div> <blockquote type="cite"> <div dir="auto"> <div><br> <div class="gmail_extra"><br> <div class="gmail_quote">On Mar 30, 2017 16:16, "Lionel Landwerlin" <<a moz-do-not-send="true" href="mailto:lionel.g.landwerlin@intel.com" target="_blank">lionel.g.landwerlin@intel.com</a><wbr>> wrote:<br type="attribution"> <blockquote class="gmail-m_-5216121689716086367quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">While exercising reading report with moderate load, we might have to<br> wait for all the reports to land in the OA buffer, otherwise we might<br> miss some reports. That means we need to keep on reading the OA stream<br> until the last report we read has a timestamp older that the timestamp<br> recorded by the MI_REPORT_PERF_COUNT at the end of the performance<br> query.<br> </blockquote> </div> </div> </div> <div dir="auto"><br> </div> <div dir="auto">The expectation is that since we have received the second MI_REPORT_PERF_COUNT report that implies any periodic/automatic reports that the could possibly relate to a query must have been written to the OABUFFER. Since we read() as much as is available from i915 perf when we come to finish processing a query then we shouldn't miss anything.</div> <div dir="auto"><br> </div> <div dir="auto">We shouldn't synchronously wait in a busy loop until we've found a report that has a timestamp >= our end MI_RPC because that could be a really significant amount of time (e.g. 50+ milliseconds on HSW). NB. the periodic samples are just to account for counter overflow. On Gen8+ the loop would likely block until the next context switch happens.</div> </div> </blockquote> <br> </span> Right, so if we don't get all the reports until the second MI_REPORT_PERF_COUNT, we might be dealing with a kernel issue.<br> </div> </blockquote> <div><br> </div> <div>Okey, so the realisation we've come to talking about this offline is that the kernel's handling of the OA unit's tail pointer race condition breaks this assumption :-/<br> <br> </div> <div>Even though any necessary reports must have landed in the OABUFFER, the kernel will throttle/delay access to the most recent reports in case the HW tail pointer has got ahead of what's visible to the CPU in memory.<br> <br> </div> <div>To handle this without blocking in a busy loop as above I guess we probably need to drive our i915 perf reads() when the application asks if a query's results are ready via brw_is_perf_query_ready().<br> <br> It also isn't enough for brw_wait_perf_query to rely on drm_intel_bo_wait_rendering() with the MI_RPC bo.<br> <br> </div> <div>We might want to bump the periodic sampling frequency to a small multiple of 5 milliseconds to at least reduce the worst-case latency for results becoming ready. It wouldn't be that worthwhile sampling with a higher frequency since the kernel only does tail pointer age checks at 200Hz currently.<br> </div> <div> </div> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <div bgcolor="#FFFFFF"> <br> I'm still suspicious of the code in mesa. I think in some cases (in particular with many applications running) it might loop forever.<br> </div> </blockquote> <div><br> </div> <div>Hope it's ok, but maybe there's an issue, can you maybe elaborate on what details looks suspicious to you?<br> </div> </div> </div> </div> </blockquote> <br> Missed your question :/<br> <br> Basically we might end up reading forever if too many context switches happen.<br> (experienced with igt with a bunch of other applications using the GPU)<br> <br> It feels like another good argument for checking the timestamps.<br> <br> <blockquote cite="mid:CAMou1-0aUnyNH4GCsxooOgG1rGASYt=cSTXpLTVKKKU2rDm-rQ@mail.gmail.com" type="cite"> <div dir="ltr"> <div class="gmail_extra"> <div class="gmail_quote"> <div> <br> </div> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <div bgcolor="#FFFFFF"> <br> Also the igt tests using roughly the same loop while (read() && errno != EINTR) are pretty reliable at not finding all the reports we need from to the stream.<br> So I would expect the same thing from mesa.</div> </blockquote> <div><br> </div> <div>Right, I wrote the igt code first and when that looked to be working I ported it to Mesa.<br> <br> </div> <div>The issue with the tail pointer race handling by the kernel sounds like a likely explanation for seeing issues here in both mesa and igt.<br> <br> </div> <div> </div> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <div bgcolor="#FFFFFF"> <div> <div class="gmail-h5"><br> <br> <blockquote type="cite"> <div dir="auto"> <div dir="auto"><br> </div> <div dir="auto">Was this proposal based on a code inspection or did you maybe hit a problem correctly measuring something?</div> <div dir="auto"><br> </div> <div dir="auto">Maybe if based on inspection we can find somewhere to put a comment to clarify the assumptions above?</div> <div dir="auto"><br> </div> <div dir="auto">Br,</div> <div dir="auto">- Robert</div> <div dir="auto"><br> </div> <div dir="auto"> <div class="gmail_extra"> <div class="gmail_quote"> <blockquote class="gmail-m_-5216121689716086367quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <br> Signed-off-by: Lionel Landwerlin <<a moz-do-not-send="true" href="mailto:lionel.g.landwerlin@intel.com" target="_blank">lionel.g.landwerlin@intel.com</a><wbr>><br> Cc: Robert Bragg <<a moz-do-not-send="true" href="mailto:robert@sixbynine.org" target="_blank">robert@sixbynine.org</a>><br> ---<br> src/mesa/drivers/dri/i965/<wbr>brw_performance_query.c | 51 ++++++++++++++---------<br> 1 file changed, 32 insertions(+), 19 deletions(-)<br> <br> diff --git a/src/mesa/drivers/dri/i965/br<wbr>w_performance_query.c b/src/mesa/drivers/dri/i965/br<wbr>w_performance_query.c<br> index 4a94e4b3cc..076c59e633 100644<br> --- a/src/mesa/drivers/dri/i965/br<wbr>w_performance_query.c<br> +++ b/src/mesa/drivers/dri/i965/br<wbr>w_performance_query.c<br> @@ -647,38 +647,50 @@ discard_all_queries(struct brw_context *brw)<br> }<br> <br> static bool<br> -read_oa_samples(struct brw_context *brw)<br> +read_oa_samples_until(struct brw_context *brw,<br> + uint32_t start_timestamp,<br> + uint32_t end_timestamp)<br> {<br> - while (1) {<br> + uint32_t last_timestamp = start_timestamp;<br> +<br> + while (last_timestamp < end_timestamp) {<br> struct brw_oa_sample_buf *buf = get_free_sample_buf(brw);<br> + uint32_t offset;<br> int len;<br> <br> while ((len = read(brw->perfquery.oa_stream_<wbr>fd, buf->buf,<br> - sizeof(buf->buf))) < 0 && errno == EINTR)<br> + sizeof(buf->buf))) < 0 &&<br> + (errno == EINTR || errno == EAGAIN))<br> ;<br> <br> if (len <= 0) {<br> exec_list_push_tail(&brw->perf<wbr>query.free_sample_buffers, &buf->link);<br> <br> - if (len < 0) {<br> - if (errno == EAGAIN)<br> - return true;<br> - else {<br> - DBG("Error reading i915 perf samples: %m\n");<br> - return false;<br> - }<br> - } else {<br> + if (len < 0)<br> + DBG("Error reading i915 perf samples: %m\n");<br> + else<br> DBG("Spurious EOF reading i915 perf samples\n");<br> - return false;<br> - }<br> +<br> + return false;<br> }<br> <br> buf->len = len;<br> exec_list_push_tail(&brw->per<wbr>fquery.sample_buffers, &buf->link);<br> +<br> + /* Go through the reports and update the last timestamp. */<br> + while (offset < buf->len) {<br> + const struct drm_i915_perf_record_header *header =<br> + (const struct drm_i915_perf_record_header *) &buf->buf[offset];<br> + uint32_t *report = (uint32_t *) (header + 1);<br> +<br> + if (header->type == DRM_I915_PERF_RECORD_SAMPLE)<br> + last_timestamp = report[1];<br> +<br> + offset += header->size;<br> + }<br> }<br> <br> - unreachable("not reached");<br> - return false;<br> + return true;<br> }<br> <br> /**<br> @@ -709,10 +721,6 @@ accumulate_oa_reports(struct brw_context *brw,<br> <br> assert(o->Ready);<br> <br> - /* Collect the latest periodic OA reports from i915 perf */<br> - if (!read_oa_samples(brw))<br> - goto error;<br> -<br> drm_intel_bo_map(obj-><a moz-do-not-send="true" href="http://oa.bo" rel="noreferrer" target="_blank">oa.bo</a>, false);<br> query_buffer = obj->oa.bo->virtual;<br> <br> @@ -728,6 +736,11 @@ accumulate_oa_reports(struct brw_context *brw,<br> goto error;<br> }<br> <br> + /* Collect the latest periodic OA reports from i915 perf */<br> + if (!read_oa_samples_until(brw, start[1], end[1]))<br> + goto error;<br> +<br> +<br> /* See if we have any periodic reports to accumulate too... */<br> <br> /* N.B. The oa.samples_head was set when the query began and<br> <font color="#888888">--<br> 2.11.0<br> <br> </font></blockquote> </div> <br> </div> </div> </div> </blockquote> <p><br> </p> </div> </div> </div> </blockquote> </div> <br> </div> </div> </blockquote> <p><br> </p> </body> </html>