[Intel-gfx] [PATCH] i915/query: Correlate engine and cpu timestamps with better accuracy
Lionel Landwerlin
lionel.g.landwerlin at intel.com
Thu Mar 4 10:27:36 UTC 2021
On 04/03/2021 11:54, Chris Wilson wrote:
>>>> Actually if we want the best accuracy we can just deal with the lower dword.
>>> Accuracy of what? The lower dword read perhaps, or the accuracy of the
>>> sample point for the combined reads for the timestamp, which is closer
>>> to an external observer (cpu_clock() implies reference to an external
>>> observer).
>>>
>>> The two clock samples are not even necessarily closely related due to the
>>> nmi adjustments. If you wanted an unadjusted elapsed time for the read
>>> you can use local_clock() then return the chosen cpu_clock() before plus
>>> the elapsed delta from around the read as the estimated error.
>>>
>>> cpu_ts[1] = local_clock();
>>> cpu_ts[0] = cpu_clock();
>>> lower = intel_uncore_read_fw(uncore, lower_reg);
>>> cpu_ts[1] = local_clock() - cpu_ts[1];
>>> -Chris
>> Thanks,
>>
>>
>> I meant the accuracy of having 2 samples GPU/CPU as close as possible.
>>
>> Not having to account for another register read in there is nice.
>>
>>
>> My testing was also mostly done with CLOCK_MONOTONIC_RAW which doesn't
>> seem to be adjusted like CLOCK_MONOTONIC so maybe that's why I didn't see
>> the issue.
> _RAW is still adjusted for skews, just not coupled into the ntp feedback.
> That is less obvious than the other clocks, and why it's preferred for
> comparing against other HW sources. But two reads of _RAW are only
> monotonic, not necessarily on the same time base. local_clock() is
> tsc/arat, so counting the CPU cycles between the two reads with the
> frequency (at least on x86) held constant (and arat should be frequency
> invariant).
>
> If we want much better accuracy, we are supposed to use cyclecounter_t
> and the system_device_crosststamp.
> -Chris
Thanks for the pointers.
I think people are mostly trying to map what's coming out of OA or
queries from the various command streamers back to perf/ftrace.
As far as I know perf will only let you select a clockid.
So maybe cyclecounter_t is not that useful atm.
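
For reference, the bracketing idea from the snippet quoted above can be
sketched in userspace roughly as follows. This is only an illustration under
assumptions, not the patch: clock_gettime() stands in for
cpu_clock()/local_clock(), CLOCK_MONOTONIC_RAW is assumed to be a reasonable
stand-in for the unadjusted delta clock, and struct ts_sample, now_ns(),
sample_correlated() and fake_read_lower() are all hypothetical names (the
last one replaces the real register read).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical callback standing in for the lower-dword register read. */
typedef uint32_t (*read_lower_fn)(void *ctx);

struct ts_sample {
	uint64_t cpu_ns;    /* CPU timestamp taken just before the device read */
	uint64_t delta_ns;  /* elapsed time around the read, i.e. estimated error */
	uint32_t gpu_lower; /* lower dword of the device timestamp */
};

/* Read a POSIX clock and flatten it to nanoseconds. */
static uint64_t now_ns(clockid_t id)
{
	struct timespec ts;

	clock_gettime(id, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Sample the caller's chosen clock and the device counter back to back.
 * The delta is measured on a separate clock (assumption: _RAW is the
 * closest userspace analogue of an unadjusted local clock).
 */
static struct ts_sample sample_correlated(clockid_t id, read_lower_fn rd,
					  void *ctx)
{
	struct ts_sample s;
	uint64_t start;

	assert(rd != NULL);

	start = now_ns(CLOCK_MONOTONIC_RAW);
	s.cpu_ns = now_ns(id);
	s.gpu_lower = rd(ctx);
	s.delta_ns = now_ns(CLOCK_MONOTONIC_RAW) - start;

	return s;
}

/* Hypothetical reader for experimenting without hardware. */
static uint32_t fake_read_lower(void *ctx)
{
	return *(uint32_t *)ctx;
}
```

Calling sample_correlated(CLOCK_MONOTONIC, fake_read_lower, &value) returns
the CPU time on the selected clockid together with the window in which the
device value was read, which is about as much as a consumer limited to a
clockid can make use of.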
-Lionel