<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 04/04/17 18:45, Robert Bragg wrote:<br>
</div>
<blockquote
cite="mid:CAMou1-0aUnyNH4GCsxooOgG1rGASYt=cSTXpLTVKKKU2rDm-rQ@mail.gmail.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<div dir="ltr"><br>
<div class="gmail_extra">On Tue, Apr 4, 2017 at 10:20 AM, Lionel
Landwerlin <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:lionel.g.landwerlin@intel.com"
target="_blank">lionel.g.landwerlin@intel.com</a>></span>
wrote:<br>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"><span class="gmail-">
<div
class="gmail-m_-5216121689716086367moz-cite-prefix">On
03/04/17 21:04, Robert Bragg wrote:<br>
</div>
<blockquote type="cite">
<div dir="auto">
<div><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Mar 30, 2017
16:16, "Lionel Landwerlin" <<a
moz-do-not-send="true"
href="mailto:lionel.g.landwerlin@intel.com"
target="_blank">lionel.g.landwerlin@intel.com</a><wbr>>
wrote:<br type="attribution">
<blockquote
class="gmail-m_-5216121689716086367quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">While
exercising reading report with moderate
load, we might have to<br>
wait for all the reports to land in the OA
buffer, otherwise we might<br>
miss some reports. That means we need to
keep on reading the OA stream<br>
until the last report we read has a
timestamp older that the timestamp<br>
recorded by the MI_REPORT_PERF_COUNT at
the end of the performance<br>
query.<br>
</blockquote>
</div>
</div>
</div>
<div dir="auto"><br>
</div>
<div dir="auto">The expectation is that since we
have received the second MI_REPORT_PERF_COUNT
report that implies any periodic/automatic
reports that the could possibly relate to a
query must have been written to the OABUFFER.
Since we read() as much as is available from
i915 perf when we come to finish processing a
query then we shouldn't miss anything.</div>
<div dir="auto"><br>
</div>
<div dir="auto">We shouldn't synchronously wait in
a busy loop until we've found a report that has
a timestamp >= our end MI_RPC because that
could be a really significant amount of time
(e.g. 50+ milliseconds on HSW). NB. the periodic
samples are just to account for counter
overflow. On Gen8+ the loop would likely block
until the next context switch happens.</div>
</div>
</blockquote>
<br>
</span> Right, so if we don't get all the reports until
the second MI_REPORT_PERF_COUNT, we might be dealing
with a kernel issue.<br>
</div>
</blockquote>
<div><br>
</div>
<div>Okey, so the realisation we've come to talking about
this offline is that the kernel's handling of the OA
unit's tail pointer race condition breaks this assumption
:-/<br>
<br>
</div>
<div>Even though any necessary reports must have landed in
the OABUFFER, the kernel will throttle/delay access to the
most recent reports in case the HW tail pointer has got
ahead of what's visible to the CPU in memory.<br>
<br>
</div>
<div>To handle this without blocking in a busy loop as above
I guess we probably need to drive our i915 perf reads()
when the application asks if a query's results are ready
via brw_is_perf_query_ready().<br>
<br>
It also isn't enough for brw_wait_perf_query to rely on
drm_intel_bo_wait_rendering() with the MI_RPC bo.<br>
<br>
</div>
<div>We might want to bump the periodic sampling frequency
to a small multiple of 5 milliseconds to at least reduce
the worst-case latency for results becoming ready. It
wouldn't be that worthwhile sampling with a higher
frequency since the kernel only does tail pointer age
checks at 200Hz currently.<br>
</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"> <br>
I'm still suspicious of the code in mesa. I think in
some cases (in particular with many applications
running) it might loop forever.<br>
</div>
</blockquote>
<div><br>
</div>
<div>Hope it's ok, but maybe there's an issue, can you maybe
elaborate on what details looks suspicious to you?<br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
Missed your question :/<br>
<br>
Basically we might end up reading forever if too many context
switches happen.<br>
(experienced with igt with a bunch of other applications using the
GPU)<br>
<br>
It feels like another good argument for checking the timestamps.<br>
<br>
<blockquote
cite="mid:CAMou1-0aUnyNH4GCsxooOgG1rGASYt=cSTXpLTVKKKU2rDm-rQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> <br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"> <br>
Also the igt tests using roughly the same loop while
(read() && errno != EINTR) are pretty reliable
at not finding all the reports we need from to the
stream.<br>
So I would expect the same thing from mesa.</div>
</blockquote>
<div><br>
</div>
<div>Right, I wrote the igt code first and when that looked
to be working I ported it to Mesa.<br>
<br>
</div>
<div>The issue with the tail pointer race handling by the
kernel sounds like a likely explanation for seeing issues
here in both mesa and igt.<br>
<br>
</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<div>
<div class="gmail-h5"><br>
<br>
<blockquote type="cite">
<div dir="auto">
<div dir="auto"><br>
</div>
<div dir="auto">Was this proposal based on a
code inspection or did you maybe hit a problem
correctly measuring something?</div>
<div dir="auto"><br>
</div>
<div dir="auto">Maybe if based on inspection we
can find somewhere to put a comment to clarify
the assumptions above?</div>
<div dir="auto"><br>
</div>
<div dir="auto">Br,</div>
<div dir="auto">- Robert</div>
<div dir="auto"><br>
</div>
<div dir="auto">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote
class="gmail-m_-5216121689716086367quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex"> <br>
Signed-off-by: Lionel Landwerlin <<a
moz-do-not-send="true"
href="mailto:lionel.g.landwerlin@intel.com"
target="_blank">lionel.g.landwerlin@intel.com</a><wbr>><br>
Cc: Robert Bragg <<a
moz-do-not-send="true"
href="mailto:robert@sixbynine.org"
target="_blank">robert@sixbynine.org</a>><br>
---<br>
src/mesa/drivers/dri/i965/<wbr>brw_performance_query.c
| 51 ++++++++++++++---------<br>
1 file changed, 32 insertions(+), 19
deletions(-)<br>
<br>
diff --git
a/src/mesa/drivers/dri/i965/br<wbr>w_performance_query.c
b/src/mesa/drivers/dri/i965/br<wbr>w_performance_query.c<br>
index 4a94e4b3cc..076c59e633 100644<br>
--- a/src/mesa/drivers/dri/i965/br<wbr>w_performance_query.c<br>
+++ b/src/mesa/drivers/dri/i965/br<wbr>w_performance_query.c<br>
@@ -647,38 +647,50 @@
discard_all_queries(struct brw_context
*brw)<br>
}<br>
<br>
static bool<br>
-read_oa_samples(struct brw_context
*brw)<br>
+read_oa_samples_until(struct
brw_context *brw,<br>
+ uint32_t
start_timestamp,<br>
+ uint32_t
end_timestamp)<br>
{<br>
- while (1) {<br>
+ uint32_t last_timestamp =
start_timestamp;<br>
+<br>
+ while (last_timestamp <
end_timestamp) {<br>
struct brw_oa_sample_buf *buf =
get_free_sample_buf(brw);<br>
+ uint32_t offset;<br>
int len;<br>
<br>
while ((len =
read(brw->perfquery.oa_stream_<wbr>fd,
buf->buf,<br>
-
sizeof(buf->buf))) < 0 &&
errno == EINTR)<br>
+
sizeof(buf->buf))) < 0 &&<br>
+ (errno == EINTR || errno
== EAGAIN))<br>
;<br>
<br>
if (len <= 0) {<br>
exec_list_push_tail(&brw->perf<wbr>query.free_sample_buffers,
&buf->link);<br>
<br>
- if (len < 0) {<br>
- if (errno == EAGAIN)<br>
- return true;<br>
- else {<br>
- DBG("Error reading i915
perf samples: %m\n");<br>
- return false;<br>
- }<br>
- } else {<br>
+ if (len < 0)<br>
+ DBG("Error reading i915
perf samples: %m\n");<br>
+ else<br>
DBG("Spurious EOF reading
i915 perf samples\n");<br>
- return false;<br>
- }<br>
+<br>
+ return false;<br>
}<br>
<br>
buf->len = len;<br>
exec_list_push_tail(&brw->per<wbr>fquery.sample_buffers,
&buf->link);<br>
+<br>
+ /* Go through the reports and
update the last timestamp. */<br>
+ while (offset < buf->len) {<br>
+ const struct
drm_i915_perf_record_header *header =<br>
+ (const struct
drm_i915_perf_record_header *)
&buf->buf[offset];<br>
+ uint32_t *report = (uint32_t
*) (header + 1);<br>
+<br>
+ if (header->type ==
DRM_I915_PERF_RECORD_SAMPLE)<br>
+ last_timestamp = report[1];<br>
+<br>
+ offset += header->size;<br>
+ }<br>
}<br>
<br>
- unreachable("not reached");<br>
- return false;<br>
+ return true;<br>
}<br>
<br>
/**<br>
@@ -709,10 +721,6 @@
accumulate_oa_reports(struct brw_context
*brw,<br>
<br>
assert(o->Ready);<br>
<br>
- /* Collect the latest periodic OA
reports from i915 perf */<br>
- if (!read_oa_samples(brw))<br>
- goto error;<br>
-<br>
drm_intel_bo_map(obj-><a
moz-do-not-send="true"
href="http://oa.bo" rel="noreferrer"
target="_blank">oa.bo</a>, false);<br>
query_buffer =
obj->oa.bo->virtual;<br>
<br>
@@ -728,6 +736,11 @@
accumulate_oa_reports(struct brw_context
*brw,<br>
goto error;<br>
}<br>
<br>
+ /* Collect the latest periodic OA
reports from i915 perf */<br>
+ if (!read_oa_samples_until(brw,
start[1], end[1]))<br>
+ goto error;<br>
+<br>
+<br>
/* See if we have any periodic
reports to accumulate too... */<br>
<br>
/* N.B. The oa.samples_head was set
when the query began and<br>
<font color="#888888">--<br>
2.11.0<br>
<br>
</font></blockquote>
</div>
<br>
</div>
</div>
</div>
</blockquote>
<p><br>
</p>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
<p><br>
</p>
</body>
</html>