<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div class="moz-cite-prefix">On 04/03/2021 11:54, Chris Wilson
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:161485169130.28586.8322916604277505617@build.alporthouse.com">
<blockquote type="cite" style="color: #007cff;">
<blockquote type="cite" style="color: #007cff;">
<blockquote type="cite" style="color: #007cff;">
<pre class="moz-quote-pre" wrap="">Actually if we want the best accuracy we can just deal with the lower dword.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">Accuracy of what? The lower dword read perhaps, or the accuracy of the
sample point for the combined reads for the timestamp, which is closer
to an external observer (cpu_clock() implies reference to an external
observer).
The two clock samples are not even necessarily closely related due to the
nmi adjustments. If you wanted an unadjusted elapsed time for the read
you can use local_clock() then return the chosen cpu_clock() before plus
the elapsed delta from around the read as the estimated error.
cpu_ts[1] = local_clock();
cpu_ts[0] = cpu_clock();
lower = intel_uncore_read_fw(uncore, lower_reg);
cpu_ts[1] = local_clock() - cpu_ts[1];
-Chris
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">Thanks,
I meant the accuracy of having 2 samples GPU/CPU as close as possible.
Not having to account for another register read in there is nice.
My testing was also mostly done with CLOCK_MONOTONIC_RAW, which doesn't
seem to be adjusted like CLOCK_MONOTONIC, so maybe that's why I didn't see
the issue.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">_RAW is still adjusted for skews, just not coupled into the ntp feedback.
That is less obvious than the other clocks, and why it's preferred for
comparing against other HW sources. But two reads of _RAW are only
monotonic, not necessarily on the same time base. local_clock() is
tsc/arat, so counting the CPU cycles between the two reads with the
frequency (at least on x86) held constant (and arat should be frequency
invariant).
If we want much better accuracy, we are supposed to use cyclecounter_t
and the system_device_crosststamp.
-Chris
</pre>
</blockquote>
<p>Thanks for the pointers.</p>
<p>I think people are mostly trying to map what's coming out of OA
or queries from the various command streamers back to perf/ftrace.</p>
<p>As far as I know, perf will only let you select a clockid.</p>
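<p>For that use case, something along these lines might be enough. This is
only a rough userspace sketch, assuming Linux (for CLOCK_MONOTONIC_RAW):
read_lower_dword() is a made-up stand-in for the register read, and the
struct and function names are hypothetical. It brackets the read with the
selected clockid and reports the elapsed time around the read as the
estimated error, in the spirit of the local_clock() snippet quoted above.</p>
<pre>
```c
#define _GNU_SOURCE
#include "assert.h"
#include "stdint.h"
#include "time.h"

/* Hypothetical stand-in for reading the lower dword of the GPU timestamp. */
static uint32_t read_lower_dword(void)
{
	static uint32_t fake_counter;

	return ++fake_counter;
}

/* Read the given clock and return it as nanoseconds. */
static uint64_t clock_ns(clockid_t id)
{
	struct timespec ts;

	clock_gettime(id, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

struct correlated_sample {
	uint64_t cpu_ns;    /* CPU time sampled just before the read */
	uint64_t delta_ns;  /* time elapsed around the read (error estimate) */
	uint32_t gpu_lower; /* value returned by the (fake) register read */
};

/* Sample the CPU clock, do a single register read, sample the clock again. */
static struct correlated_sample sample_clock_pair(clockid_t id)
{
	struct correlated_sample s;
	uint64_t end;

	s.cpu_ns = clock_ns(id);
	s.gpu_lower = read_lower_dword();
	end = clock_ns(id);

	/* Both clock ids used here are monotonic, so this should hold. */
	assert(end >= s.cpu_ns);
	s.delta_ns = end - s.cpu_ns;
	return s;
}
```
</pre>
<p>Calling sample_clock_pair(CLOCK_MONOTONIC_RAW) would then give a CPU
timestamp in the same clockid that perf was told to use, plus a bound on
how far the GPU sample can be from it.</p>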
<p><br>
</p>
<p>So maybe cyclecounter_t is not that useful atm.</p>
<p><br>
</p>
<p>-Lionel<br>
</p>
</body>
</html>