[Intel-gfx] [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events

Sagar Arun Kamble sagar.a.kamble at intel.com
Fri Dec 22 06:06:36 UTC 2017



On 12/7/2017 6:18 AM, Robert Bragg wrote:
>
>
> On Wed, Nov 15, 2017 at 12:13 PM, Sagar Arun Kamble 
> <sagar.a.kamble at intel.com <mailto:sagar.a.kamble at intel.com>> wrote:
>
>     We can compute system time corresponding to GPU timestamp by taking a
>     reference point (CPU monotonic time, GPU timestamp) and then adding
>     delta time computed using timecounter/cyclecounter support in kernel.
>     We have to configure cyclecounter with the GPU timestamp frequency.
>     Earlier approach that was based on cross-timestamp is not needed. It
>     was being used to approximate the frequency based on invalid
>     assumptions
>     (possibly drift was being seen in the time due to precision issue).
>     The precision of time from GPU clocks is already in ns and timecounter
>     takes care of it as verified over variable durations.
>
>
> Hi Sagar,
>
> I have some doubts about this analysis...
>
> The intent behind Sourab's original approach was to be able to 
> determine the frequency at runtime empirically because the constants 
> we have aren't particularly accurate. Without a perfectly stable 
> frequency that's known very precisely then an interpolated correlation 
> will inevitably drift. I think the nature of HW implies we can't 
> expect to have either of those. Then the general idea had been to try 
> and use existing kernel infrastructure for a problem which isn't 
> unique to GPU clocks.
Hi Robert,

Testing on SKL shows that timestamps drift by only about 10us for sampling 
done in the kernel over about 30 minutes.
Verified with changes from 
https://github.com/sakamble/i915-timestamp-support/commits/drm-tip
Note that since we are sampling the counter through debugfs, the read 
overhead likely adds to the drift, so some adjustment might be needed.
But with OA reports we only have to worry about the initial timecounter 
setup, where we need an accurate pair of system time and GPU timestamp 
counter values.
I think the timestamp clock is highly stable and we don't need logic to 
determine the frequency at runtime. I will try to get confirmation from 
the HW team as well.

If we do need to determine the frequency, Sourab's approach needs to be 
refined:
1. It can be implemented entirely in i915, because all we need are pairs 
of system time and GPU clock readings over different durations.
2. The crosstimestamp framework usage in that approach is incorrect: 
ideally we should be passing the ART counter and the GPU counter, but 
instead we were hacking it to pass the TSC clock.
Quoting Thomas from https://patchwork.freedesktop.org/patch/144298/:
get_device_system_crosststamp() is for timestamps taken via a clock 
which is directly correlated with the timekeeper clocksource.

ART and TSC are correlated via: TSC = (ART * scale) + offset
get_device_system_crosststamp() invokes the device function which reads 
ART, which is converted to CLOCK_MONOTONIC_RAW by the conversion above,
and then uses interpolation to map the CLOCK_MONOTONIC_RAW value to 
CLOCK_MONOTONIC.
The device function does not know anything about TSC. All it knows about 
is ART.

I am not aware whether the GPU timestamp clock is correlated with TSC 
the way ART is for ethernet drivers, or whether i915 can read ART the 
way those drivers do.
3. I have seen precision issues in the calculations in 
i915_perf_clock_sync_work, and its usage of MONOTONIC_RAW, which might 
cause jumps in time.
>
> That's not to say that a more limited, simpler solution based on 
> frequent re-correlation wouldn't be more than welcome if tracking an 
> accurate frequency is too awkward for now
Periodically adjusting the timecounter time could be another option if 
we confirm that the GPU timestamp frequency is stable.
> , but I think some things need to be considered in that case:
>
> - It would be good to quantify the kind of drift seen in practice to 
> know how frequently it's necessary to re-synchronize. It sounds like 
> you've done this ("as verified over variable durations") so I'm 
> curious what kind of drift you saw. I'd imagine you would see a 
> significant drift over, say, one second and it might not take much 
> longer for the drift to even become clearly visible to the user when 
> plotted in a UI. For reference I once updated the arb_timer_query test 
> in piglit to give some insight into this drift 
> (https://lists.freedesktop.org/archives/piglit/2016-September/020673.html) 
> and at least from what I wrote back then it looks like I was seeing a 
> drift of a few milliseconds per second on SKL. I vaguely recall it 
> being much worse given the frequency constants we had for Haswell.
>
On SKL I have seen a very small drift of less than 10us over a period 
of 30 minutes.
Verified with changes from 
https://github.com/sakamble/i915-timestamp-support/commits/drm-tip

The 36-bit counter will overflow in about 95 minutes at 12 MHz, and the 
timecounter framework treats a counter value whose delta from 
timecounter init exceeds half the total time covered by the counter as 
a time in the past, so the current approach works for less than about 
45 minutes.
We will need to add overflow-watchdog support, as other drivers do, 
which simply reinitializes the timecounter before that limit is reached.

> - What guarantees will be promised about monotonicity of correlated 
> system timestamps? Will it be guaranteed that sequential reports must 
> have monotonically increasing timestamps? That might be fiddly if the 
> gpu + system clock are periodically re-correlated, so it might be good 
> to be clear in documentation that the correlation is best-effort only 
> for the sake of implementation simplicity. That would still be good 
> for a lot of UIs I think and there's freedom for the driver to start 
> simple and potentially improve later by measuring the gpu clock 
> frequency empirically.
>
If we rely on the timecounter alone, without correlation to determine 
the frequency, then setting the init time to MONOTONIC system time 
should take care of the monotonicity of the correlated times.

Regards,
Sagar
> Currently only one correlated pair of timestamps is read when enabling 
> the stream and so a relatively long time is likely to pass before the 
> stream is disabled (seconds, minutes while a user is running a system 
> profiler) . It seems very likely to me that these clocks are going to 
> drift significantly without introducing some form of periodic 
> re-synchronization based on some understanding of the drift that's seen.
>
> Br,
> - Robert
>
>
>
>     This series adds base timecounter/cyclecounter changes and changes to
>     get GPU and CPU timestamps in OA samples.
>
>     Sagar Arun Kamble (1):
>       drm/i915/perf: Add support to correlate GPU timestamp with
>     system time
>
>     Sourab Gupta (3):
>       drm/i915/perf: Add support for collecting 64 bit timestamps with OA
>         reports
>       drm/i915/perf: Extract raw GPU timestamps from OA reports
>       drm/i915/perf: Send system clock monotonic time in perf samples
>
>      drivers/gpu/drm/i915/i915_drv.h  |  11 ++++
>      drivers/gpu/drm/i915/i915_perf.c | 124
>     ++++++++++++++++++++++++++++++++++++++-
>      drivers/gpu/drm/i915/i915_reg.h  |   6 ++
>      include/uapi/drm/i915_drm.h      |  14 +++++
>      4 files changed, 154 insertions(+), 1 deletion(-)
>
>     --
>     1.9.1
>
>     _______________________________________________
>     Intel-gfx mailing list
>     Intel-gfx at lists.freedesktop.org
>     <mailto:Intel-gfx at lists.freedesktop.org>
>     https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>     <https://lists.freedesktop.org/mailman/listinfo/intel-gfx>
>
>
