[Intel-gfx] [PATCH] i915/query: Correlate engine and cpu timestamps with better accuracy
Lionel Landwerlin
lionel.g.landwerlin at intel.com
Thu Mar 4 08:28:59 UTC 2021
On 04/03/2021 02:09, Chris Wilson wrote:
> Quoting Umesh Nerlige Ramappa (2021-03-03 21:28:00)
>> Perf measurements rely on CPU and engine timestamps to correlate
>> events of interest across these time domains. Current mechanisms get
>> these timestamps separately and the calculated delta between these
>> timestamps lack enough accuracy.
>>
>> To improve the accuracy of these time measurements to within a few us,
>> add a query that returns the engine and cpu timestamps captured as
>> close to each other as possible.
>>
>> v2: (Tvrtko)
>> - document clock reference used
>> - return cpu timestamp always
>> - capture cpu time just before lower dword of cs timestamp
>>
>> v3: (Chris)
>> - use uncore-rpm
>> - use __query_cs_timestamp helper
>>
>> v4: (Lionel)
>> - Kernel perf subsytem allows users to specify the clock id to be used
>> in perf_event_open. This clock id is used by the perf subsystem to
>> return the appropriate cpu timestamp in perf events. Similarly, let
>> the user pass the clockid to this query so that cpu timestamp
>> corresponds to the clock id requested.
>>
>> v5: (Tvrtko)
>> - Use normal ktime accessors instead of fast versions
>> - Add more uApi documentation
>>
>> v6: (Lionel)
>> - Move switch out of spinlock
>>
>> v7: (Chris)
>> - cs_timestamp is a misnomer, use cs_cycles instead
>> - return the cs cycle frequency as well in the query
>>
>> v8:
>> - Add platform and engine specific checks
>>
>> v9: (Lionel)
>> - Return 2 cpu timestamps in the query - captured before and after the
>> register read
>>
>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa at intel.com>
>> ---
>> drivers/gpu/drm/i915/i915_query.c | 144 ++++++++++++++++++++++++++++++
>> include/uapi/drm/i915_drm.h | 47 ++++++++++
>> 2 files changed, 191 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
>> index fed337ad7b68..acca22ee6014 100644
>> --- a/drivers/gpu/drm/i915/i915_query.c
>> +++ b/drivers/gpu/drm/i915/i915_query.c
>> @@ -6,6 +6,8 @@
>>
>> #include <linux/nospec.h>
>>
>> +#include "gt/intel_engine_pm.h"
>> +#include "gt/intel_engine_user.h"
>> #include "i915_drv.h"
>> #include "i915_perf.h"
>> #include "i915_query.h"
>> @@ -90,6 +92,147 @@ static int query_topology_info(struct drm_i915_private *dev_priv,
>> return total_length;
>> }
>>
>> +typedef u64 (*__ktime_func_t)(void);
>> +static __ktime_func_t __clock_id_to_func(clockid_t clk_id)
>> +{
>> + /*
>> + * Use logic same as the perf subsystem to allow user to select the
>> + * reference clock id to be used for timestamps.
>> + */
>> + switch (clk_id) {
>> + case CLOCK_MONOTONIC:
>> + return &ktime_get_ns;
>> + case CLOCK_MONOTONIC_RAW:
>> + return &ktime_get_raw_ns;
>> + case CLOCK_REALTIME:
>> + return &ktime_get_real_ns;
>> + case CLOCK_BOOTTIME:
>> + return &ktime_get_boottime_ns;
>> + case CLOCK_TAI:
>> + return &ktime_get_clocktai_ns;
>> + default:
>> + return NULL;
>> + }
>> +}
>> +
>> +static inline int
>> +__read_timestamps(struct intel_uncore *uncore,
>> + i915_reg_t lower_reg,
>> + i915_reg_t upper_reg,
>> + u64 *cs_ts,
>> + u64 *cpu_ts,
>> + __ktime_func_t cpu_clock)
>> +{
>> + u32 upper, lower, old_upper, loop = 0;
>> +
>> + upper = intel_uncore_read_fw(uncore, upper_reg);
>> + do {
>> + cpu_ts[0] = cpu_clock();
>> + lower = intel_uncore_read_fw(uncore, lower_reg);
>> + cpu_ts[1] = cpu_clock();
>> + old_upper = upper;
>> + upper = intel_uncore_read_fw(uncore, upper_reg);
> Both register reads comprise the timestamp returned to userspace, so
> presumably you want cpu_ts[] to wrap both.
>
> do {
> old_upper = upper;
>
> cpu_ts[0] = cpu_clock();
> lower = intel_uncore_read_fw(uncore, lower_reg);
> upper = intel_uncore_read_fw(uncore, upper_reg);
> cpu_ts[1] = cpu_clock();
> } while (upper != old_upper && loop++ < 2);
Actually if we want the best accuracy we can just deal with the lower dword.
We can check the upper one hasn't changed outside of the 2 cpu_clock()
calls.
-Lionel
More information about the Intel-gfx
mailing list