[Intel-gfx] [PATCH 2/2] drm/i915/pmu: Connect engine busyness stats from GuC to pmu
Matthew Brost
matthew.brost at intel.com
Wed Oct 27 20:02:43 UTC 2021
On Tue, Oct 26, 2021 at 05:48:21PM -0700, Umesh Nerlige Ramappa wrote:
> With GuC handling scheduling, i915 is not aware of the time that a
> context is scheduled in and out of the engine. Since i915 pmu relies on
> this info to provide engine busyness to the user, GuC shares this info
> with i915 for all engines using shared memory. For each engine, this
> info contains:
>
> - total busyness: total time that the context was running (total)
> - id: id of the running context (id)
> - start timestamp: timestamp when the context started running (start)
>
> At the time of sampling the engine busyness (now), if the id is valid
> (!= ~0) and start is non-zero, then the context is considered to be
> active and the engine busyness is calculated using the equation below:
>
> engine busyness = total + (now - start)
>
> All times are obtained from the gt clock base. For inactive contexts,
> engine busyness is just equal to the total.
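The rule above works out to roughly the following (a minimal sketch with
illustrative names, not the patch code; all inputs are in gt clocks):

	/* id == ~0 means no context is currently running on the engine */
	static u64 sample_busyness(u32 total, u32 id, u32 start, u32 now)
	{
		if (id != ~0u && start != 0)
			return (u64)total + (now - start); /* wrap-safe u32 delta */

		return total;
	}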
>
> The start and total values provided by GuC are 32 bits and wrap around
> in a few minutes. Since perf pmu provides busyness as 64 bit
> monotonically increasing values, there is a need for this implementation
> to account for overflows and extend the time to 64 bits before returning
> busyness to the user. In order to do that, a worker runs periodically
> with a period of 1/8th the time it takes for the timestamp to wrap. As an
> example, that would be once in 27 seconds for a gt clock frequency of
> 19.2 MHz.
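Sanity-checking that example: a full u32 wrap at 19.2 MHz takes
2^32 / 19.2e6 ~= 223.7 seconds, and one eighth of that is ~27.9 seconds,
so "once in 27 seconds" checks out.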
>
> Note:
> There might be an over-accounting of busyness due to the fact that GuC
> may be updating the total and start values while kmd is reading them.
> (i.e. kmd may read the updated total and the stale start). In such a
> case, the user may see a higher busyness value followed by smaller ones,
> which would eventually catch up to the higher value.
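For anyone wanting to eyeball this from userspace, the result is exposed
through the existing i915 PMU engine events, e.g. something like:

	perf stat -e i915/rcs0-busy/ -I 1000

(assuming the usual per-engine -busy event naming; substitute the engine
of interest).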
>
> v2: (Tvrtko)
> - Include details in commit message
> - Move intel engine busyness function into execlist code
> - Use union inside engine->stats
> - Use natural type for ping delay jiffies
> - Drop active_work condition checks
> - Use for_each_engine if iterating all engines
> - Drop seq locking, use spinlock at GuC level to update engine stats
> - Document worker specific details
>
> v3: (Tvrtko/Umesh)
> - Demarcate GuC and execlist stat objects with comments
> - Document known over-accounting issue in commit
> - Provide a consistent view of GuC state
> - Add hooks to gt park/unpark for GuC busyness
> - Stop/start worker in gt park/unpark path
> - Drop inline
> - Move spinlock and worker inits to GuC initialization
> - Drop helpers that are called only once
>
> v4: (Tvrtko/Matt/Umesh)
> - Drop addressed opens from commit message
> - Get runtime pm in ping, remove from the park path
> - Use cancel_delayed_work_sync in disable_submission path
> - Update stats during reset prepare
> - Skip ping if reset in progress
> - Explicitly name execlists and GuC stats objects
> - Since disable_submission is called from many places, move resetting
> stats to intel_guc_submission_reset_prepare
>
> v5: (Tvrtko)
> - Add a trylock helper that does not sleep and synchronize PMU event
> callbacks and worker with gt reset
>
> v6: (CI BAT failures)
> - DUTs using execlist submission failed to boot since __gt_unpark is
> called during i915 load. This ends up calling the GuC busyness unpark
> hook and results in kick-starting an uninitialized worker. Let
> park/unpark hooks check if GuC submission has been initialized.
> - drop cant_sleep() from trylock helper since rcu_read_lock takes care
> of that.
>
> v7: (CI) Fix igt@i915_selftest@live@gt_engines
> - For GuC mode of submission the engine busyness is derived from gt time
> domain. Use gt time elapsed as reference in the selftest.
> - Increase busyness calculation to 10ms duration to ensure batch runs
> longer and falls within the busyness tolerances in selftest.
>
> v8:
> - Use ktime_get in selftest as before
> - intel_reset_trylock_no_wait results in a lockdep splat that is not
> trivial to fix since the PMU callback runs in irq context and the
> reset paths are tightly knit into the driver. The test that uncovers
> this is igt@perf_pmu@faulting-read. Drop intel_reset_trylock_no_wait,
> instead use the reset_count to synchronize with gt reset during pmu
> callback. For the ping, continue to use intel_reset_trylock since ping
> is not run in irq context.
>
> - GuC PM timestamp does not tick when GuC is idle. This can potentially
> result in wrong busyness values when a context is active on the
> engine, but GuC is idle. Use the RING TIMESTAMP as GPU timestamp to
> process the GuC busyness stats. This works since both GuC timestamp and
> RING timestamp are synced with the same clock.
>
> - The busyness stats may get updated after the batch starts running.
> This delay causes the busyness reported over a 100us window to fall
> below 95% in the selftest. The only option at this time is to wait for
> GuC busyness to change from idle to active before we sample busyness
> over a 100us period.
>
> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa at intel.com>
> Acked-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> ---
> drivers/gpu/drm/i915/gt/intel_engine_cs.c | 28 +-
> drivers/gpu/drm/i915/gt/intel_engine_types.h | 33 ++-
> .../drm/i915/gt/intel_execlists_submission.c | 34 +++
> drivers/gpu/drm/i915/gt/intel_gt_pm.c | 2 +
> drivers/gpu/drm/i915/gt/selftest_engine_pm.c | 33 +++
> .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h | 1 +
> drivers/gpu/drm/i915/gt/uc/intel_guc.h | 30 ++
> drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 21 ++
> drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h | 5 +
> drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 13 +
> .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 277 ++++++++++++++++++
> .../gpu/drm/i915/gt/uc/intel_guc_submission.h | 2 +
> drivers/gpu/drm/i915/i915_reg.h | 2 +
> 13 files changed, 453 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index 2de396e34d83..332756036007 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -1915,23 +1915,6 @@ void intel_engine_dump(struct intel_engine_cs *engine,
> intel_engine_print_breadcrumbs(engine, m);
> }
>
> -static ktime_t __intel_engine_get_busy_time(struct intel_engine_cs *engine,
> - ktime_t *now)
> -{
> - struct intel_engine_execlists_stats *stats = &engine->stats.execlists;
> - ktime_t total = stats->total;
> -
> - /*
> - * If the engine is executing something at the moment
> - * add it to the total.
> - */
> - *now = ktime_get();
> - if (READ_ONCE(stats->active))
> - total = ktime_add(total, ktime_sub(*now, stats->start));
> -
> - return total;
> -}
> -
> /**
> * intel_engine_get_busy_time() - Return current accumulated engine busyness
> * @engine: engine to report on
> @@ -1941,16 +1924,7 @@ static ktime_t __intel_engine_get_busy_time(struct intel_engine_cs *engine,
> */
> ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine, ktime_t *now)
> {
> - struct intel_engine_execlists_stats *stats = &engine->stats.execlists;
> - unsigned int seq;
> - ktime_t total;
> -
> - do {
> - seq = read_seqcount_begin(&stats->lock);
> - total = __intel_engine_get_busy_time(engine, now);
> - } while (read_seqcount_retry(&stats->lock, seq));
> -
> - return total;
> + return engine->busyness(engine, now);
> }
>
> struct intel_context *
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index 24fa7fb0e7de..5732e0d71513 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -284,6 +284,28 @@ struct intel_engine_execlists_stats {
> ktime_t start;
> };
>
> +struct intel_engine_guc_stats {
> + /**
> + * @running: Active state of the engine when busyness was last sampled.
> + */
> + bool running;
> +
> + /**
> + * @prev_total: Previous value of total runtime clock cycles.
> + */
> + u32 prev_total;
> +
> + /**
> + * @total_gt_clks: Total gt clock cycles this engine was busy.
> + */
> + u64 total_gt_clks;
> +
> + /**
> + * @start_gt_clk: GT clock time of last idle to active transition.
> + */
> + u64 start_gt_clk;
> +};
> +
> struct intel_engine_cs {
> struct drm_i915_private *i915;
> struct intel_gt *gt;
> @@ -466,6 +488,12 @@ struct intel_engine_cs {
> void (*add_active_request)(struct i915_request *rq);
> void (*remove_active_request)(struct i915_request *rq);
>
> + /*
> + * Get engine busyness and the time at which the busyness was sampled.
> + */
> + ktime_t (*busyness)(struct intel_engine_cs *engine,
> + ktime_t *now);
> +
> struct intel_engine_execlists execlists;
>
> /*
> @@ -515,7 +543,10 @@ struct intel_engine_cs {
> u32 (*get_cmd_length_mask)(u32 cmd_header);
>
> struct {
> - struct intel_engine_execlists_stats execlists;
> + union {
> + struct intel_engine_execlists_stats execlists;
> + struct intel_engine_guc_stats guc;
> + };
>
> /**
> * @rps: Utilisation at last RPS sampling.
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index bedb80057046..ca03880fa7e4 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -3293,6 +3293,38 @@ static void execlists_release(struct intel_engine_cs *engine)
> lrc_fini_wa_ctx(engine);
> }
>
> +static ktime_t __execlists_engine_busyness(struct intel_engine_cs *engine,
> + ktime_t *now)
> +{
> + struct intel_engine_execlists_stats *stats = &engine->stats.execlists;
> + ktime_t total = stats->total;
> +
> + /*
> + * If the engine is executing something at the moment
> + * add it to the total.
> + */
> + *now = ktime_get();
> + if (READ_ONCE(stats->active))
> + total = ktime_add(total, ktime_sub(*now, stats->start));
> +
> + return total;
> +}
> +
> +static ktime_t execlists_engine_busyness(struct intel_engine_cs *engine,
> + ktime_t *now)
> +{
> + struct intel_engine_execlists_stats *stats = &engine->stats.execlists;
> + unsigned int seq;
> + ktime_t total;
> +
> + do {
> + seq = read_seqcount_begin(&stats->lock);
> + total = __execlists_engine_busyness(engine, now);
> + } while (read_seqcount_retry(&stats->lock, seq));
> +
> + return total;
> +}
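For what it's worth, these two are the old __intel_engine_get_busy_time /
intel_engine_get_busy_time bodies moved over from intel_engine_cs.c
unchanged, now reached through the new engine->busyness vfunc.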
> +
> static void
> logical_ring_default_vfuncs(struct intel_engine_cs *engine)
> {
> @@ -3349,6 +3381,8 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
> engine->emit_bb_start = gen8_emit_bb_start;
> else
> engine->emit_bb_start = gen8_emit_bb_start_noarb;
> +
> + engine->busyness = execlists_engine_busyness;
> }
>
> static void logical_ring_default_irqs(struct intel_engine_cs *engine)
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> index 524eaf678790..b4a8594bc46c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> @@ -86,6 +86,7 @@ static int __gt_unpark(struct intel_wakeref *wf)
> intel_rc6_unpark(>->rc6);
> intel_rps_unpark(>->rps);
> i915_pmu_gt_unparked(i915);
> + intel_guc_busyness_unpark(gt);
>
> intel_gt_unpark_requests(gt);
> runtime_begin(gt);
> @@ -104,6 +105,7 @@ static int __gt_park(struct intel_wakeref *wf)
> runtime_end(gt);
> intel_gt_park_requests(gt);
>
> + intel_guc_busyness_park(gt);
> i915_vma_parked(gt);
> i915_pmu_gt_parked(i915);
> intel_rps_park(>->rps);
> diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
> index 75569666105d..0bfd738dbf3a 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
> @@ -214,6 +214,31 @@ static int live_engine_timestamps(void *arg)
> return 0;
> }
>
> +static int __spin_until_busier(struct intel_engine_cs *engine, ktime_t busyness)
> +{
> + ktime_t start, unused, dt;
> +
> + if (!intel_engine_uses_guc(engine))
> + return 0;
> +
> + /*
> + * In GuC mode of submission, the busyness stats may get updated after
> + * the batch starts running. Poll for a change in busyness and timeout
> + * after 500 us.
> + */
> + start = ktime_get();
> + while (intel_engine_get_busy_time(engine, &unused) == busyness) {
> + dt = ktime_get() - start;
> + if (dt > 500000) {
> + pr_err("active wait timed out %lld\n", dt);
> + ENGINE_TRACE(engine, "active wait time out %lld\n", dt);
> + return -ETIME;
> + }
> + }
> +
> + return 0;
> +}
> +
> static int live_engine_busy_stats(void *arg)
> {
> struct intel_gt *gt = arg;
> @@ -232,6 +257,7 @@ static int live_engine_busy_stats(void *arg)
> GEM_BUG_ON(intel_gt_pm_is_awake(gt));
> for_each_engine(engine, gt, id) {
> struct i915_request *rq;
> + ktime_t busyness, dummy;
> ktime_t de, dt;
> ktime_t t[2];
>
> @@ -274,12 +300,19 @@ static int live_engine_busy_stats(void *arg)
> }
> i915_request_add(rq);
>
> + busyness = intel_engine_get_busy_time(engine, &dummy);
> if (!igt_wait_for_spinner(&spin, rq)) {
> intel_gt_set_wedged(engine->gt);
> err = -ETIME;
> goto end;
> }
>
> + err = __spin_until_busier(engine, busyness);
> + if (err) {
> + GEM_TRACE_DUMP();
> + goto end;
> + }
> +
> ENGINE_TRACE(engine, "measuring busy time\n");
> preempt_disable();
> de = intel_engine_get_busy_time(engine, &t[0]);
> diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
> index ba10bd374cee..fe5d7d261797 100644
> --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
> +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
> @@ -144,6 +144,7 @@ enum intel_guc_action {
> INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600,
> INTEL_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC = 0x4601,
> INTEL_GUC_ACTION_RESET_CLIENT = 0x5507,
> + INTEL_GUC_ACTION_SET_ENG_UTIL_BUFF = 0x550A,
> INTEL_GUC_ACTION_LIMIT
> };
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index 31cf9fb48c7e..1cb46098030d 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -138,6 +138,8 @@ struct intel_guc {
> u32 ads_regset_size;
> /** @ads_golden_ctxt_size: size of the golden contexts in the ADS */
> u32 ads_golden_ctxt_size;
> + /** @ads_engine_usage_size: size of engine usage in the ADS */
> + u32 ads_engine_usage_size;
>
> /** @lrc_desc_pool: object allocated to hold the GuC LRC descriptor pool */
> struct i915_vma *lrc_desc_pool;
> @@ -172,6 +174,34 @@ struct intel_guc {
>
> /** @send_mutex: used to serialize the intel_guc_send actions */
> struct mutex send_mutex;
> +
> + /**
> + * @timestamp: GT timestamp object that stores a copy of the timestamp
> + * and adjusts it for overflow using a worker.
> + */
> + struct {
> + /**
> + * @lock: Lock protecting the below fields and the engine stats.
> + */
> + spinlock_t lock;
> +
> + /**
> + * @gt_stamp: 64 bit extended value of the GT timestamp.
> + */
> + u64 gt_stamp;
> +
> + /**
> + * @ping_delay: Period for polling the GT timestamp for
> + * overflow.
> + */
> + unsigned long ping_delay;
> +
> + /**
> + * @work: Periodic work to adjust GT timestamp, engine and
> + * context usage for overflows.
> + */
> + struct delayed_work work;
> + } timestamp;
> };
>
> static inline struct intel_guc *log_to_guc(struct intel_guc_log *log)
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> index 621c893a009f..1a1edae67e4e 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> @@ -26,6 +26,8 @@
> * | guc_policies |
> * +---------------------------------------+
> * | guc_gt_system_info |
> + * +---------------------------------------+
> + * | guc_engine_usage |
> * +---------------------------------------+ <== static
> * | guc_mmio_reg[countA] (engine 0.0) |
> * | guc_mmio_reg[countB] (engine 0.1) |
> @@ -47,6 +49,7 @@ struct __guc_ads_blob {
> struct guc_ads ads;
> struct guc_policies policies;
> struct guc_gt_system_info system_info;
> + struct guc_engine_usage engine_usage;
> /* From here on, location is dynamic! Refer to above diagram. */
> struct guc_mmio_reg regset[0];
> } __packed;
> @@ -628,3 +631,21 @@ void intel_guc_ads_reset(struct intel_guc *guc)
>
> guc_ads_private_data_reset(guc);
> }
> +
> +u32 intel_guc_engine_usage_offset(struct intel_guc *guc)
> +{
> + struct __guc_ads_blob *blob = guc->ads_blob;
> + u32 base = intel_guc_ggtt_offset(guc, guc->ads_vma);
> + u32 offset = base + ptr_offset(blob, engine_usage);
> +
> + return offset;
> +}
> +
> +struct guc_engine_usage_record *intel_guc_engine_usage(struct intel_engine_cs *engine)
> +{
> + struct intel_guc *guc = &engine->gt->uc.guc;
> + struct __guc_ads_blob *blob = guc->ads_blob;
> + u8 guc_class = engine_class_to_guc_class(engine->class);
> +
> + return &blob->engine_usage.engines[guc_class][ilog2(engine->logical_mask)];
> +}
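As I read it, engines[] is indexed by GuC class and logical instance, and
ilog2(engine->logical_mask) recovers the logical instance number since a
physical engine has exactly one bit set in its logical_mask.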
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
> index 3d85051d57e4..e74c110facff 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
> @@ -6,8 +6,11 @@
> #ifndef _INTEL_GUC_ADS_H_
> #define _INTEL_GUC_ADS_H_
>
> +#include <linux/types.h>
> +
> struct intel_guc;
> struct drm_printer;
> +struct intel_engine_cs;
>
> int intel_guc_ads_create(struct intel_guc *guc);
> void intel_guc_ads_destroy(struct intel_guc *guc);
> @@ -15,5 +18,7 @@ void intel_guc_ads_init_late(struct intel_guc *guc);
> void intel_guc_ads_reset(struct intel_guc *guc);
> void intel_guc_ads_print_policy_info(struct intel_guc *guc,
> struct drm_printer *p);
> +struct guc_engine_usage_record *intel_guc_engine_usage(struct intel_engine_cs *engine);
> +u32 intel_guc_engine_usage_offset(struct intel_guc *guc);
>
> #endif
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> index 722933e26347..7072e30e99f4 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> @@ -294,6 +294,19 @@ struct guc_ads {
> u32 reserved[15];
> } __packed;
>
> +/* Engine usage stats */
> +struct guc_engine_usage_record {
> + u32 current_context_index;
> + u32 last_switch_in_stamp;
> + u32 reserved0;
> + u32 total_runtime;
> + u32 reserved1[4];
> +} __packed;
> +
> +struct guc_engine_usage {
> + struct guc_engine_usage_record engines[GUC_MAX_ENGINE_CLASSES][GUC_MAX_INSTANCES_PER_CLASS];
Again, as I mentioned on the previous patch, I'd define this
sub-structure inline. But that is just my opinion and doesn't really
matter. I believe I understand everything else this patch is doing, and
it looks good to me.
With that:
Reviewed-by: Matthew Brost <matthew.brost at intel.com>
> +} __packed;
> +
> /* GuC logging structures */
>
> enum guc_log_buffer_type {
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 38b47e73e35d..5cc49c0b3889 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -13,6 +13,7 @@
> #include "gt/intel_engine_heartbeat.h"
> #include "gt/intel_gpu_commands.h"
> #include "gt/intel_gt.h"
> +#include "gt/intel_gt_clock_utils.h"
> #include "gt/intel_gt_irq.h"
> #include "gt/intel_gt_pm.h"
> #include "gt/intel_gt_requests.h"
> @@ -21,6 +22,7 @@
> #include "gt/intel_mocs.h"
> #include "gt/intel_ring.h"
>
> +#include "intel_guc_ads.h"
> #include "intel_guc_submission.h"
>
> #include "i915_drv.h"
> @@ -1077,6 +1079,272 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
> xa_unlock_irqrestore(&guc->context_lookup, flags);
> }
>
> +/*
> + * GuC stores busyness stats for each engine at context in/out boundaries. A
> + * context 'in' logs execution start time, 'out' adds in -> out delta to total.
> + * i915/kmd accesses 'start', 'total' and 'context id' from memory shared with
> + * GuC.
> + *
> + * __i915_pmu_event_read samples engine busyness. When sampling, if context id
> + * is valid (!= ~0) and start is non-zero, the engine is considered to be
> + * active. For an active engine total busyness = total + (now - start), where
> + * 'now' is the time at which the busyness is sampled. For inactive engine,
> + * total busyness = total.
> + *
> + * All times are captured from GUCPMTIMESTAMP reg and are in gt clock domain.
> + *
> + * The start and total values provided by GuC are 32 bits and wrap around in a
> + * few minutes. Since perf pmu provides busyness as 64 bit monotonically
> + * increasing ns values, there is a need for this implementation to account for
> + * overflows and extend the GuC provided values to 64 bits before returning
> + * busyness to the user. In order to do that, a worker runs periodically
> + * with a period of 1/8th the time it takes for the timestamp to wrap (i.e. once in
> + * 27 seconds for a gt clock frequency of 19.2 MHz).
> + */
> +
> +#define WRAP_TIME_CLKS U32_MAX
> +#define POLL_TIME_CLKS (WRAP_TIME_CLKS >> 3)
> +
> +static void
> +__extend_last_switch(struct intel_guc *guc, u64 *prev_start, u32 new_start)
> +{
> + u32 gt_stamp_hi = upper_32_bits(guc->timestamp.gt_stamp);
> + u32 gt_stamp_last = lower_32_bits(guc->timestamp.gt_stamp);
> +
> + if (new_start == lower_32_bits(*prev_start))
> + return;
> +
> + if (new_start < gt_stamp_last &&
> + (new_start - gt_stamp_last) <= POLL_TIME_CLKS)
> + gt_stamp_hi++;
> +
> + if (new_start > gt_stamp_last &&
> + (gt_stamp_last - new_start) <= POLL_TIME_CLKS && gt_stamp_hi)
> + gt_stamp_hi--;
> +
> + *prev_start = ((u64)gt_stamp_hi << 32) | new_start;
> +}
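To spell out the wrap handling with numbers: say gt_stamp is 0x1fffff000
(gt_stamp_last = 0xfffff000, gt_stamp_hi = 1) and GuC logged a switch-in
just after a wrap, new_start = 0x100. Then new_start < gt_stamp_last and
the u32 subtraction underflows to 0x1100, well under POLL_TIME_CLKS, so
gt_stamp_hi is bumped to 2 and *prev_start becomes 0x200000100. The
second branch undoes the mirror-image race, where new_start was sampled
just before a wrap that gt_stamp has already crossed.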
> +
> +static void guc_update_engine_gt_clks(struct intel_engine_cs *engine)
> +{
> + struct guc_engine_usage_record *rec = intel_guc_engine_usage(engine);
> + struct intel_engine_guc_stats *stats = &engine->stats.guc;
> + struct intel_guc *guc = &engine->gt->uc.guc;
> + u32 last_switch = rec->last_switch_in_stamp;
> + u32 ctx_id = rec->current_context_index;
> + u32 total = rec->total_runtime;
> +
> + lockdep_assert_held(&guc->timestamp.lock);
> +
> + stats->running = ctx_id != ~0U && last_switch;
> + if (stats->running)
> + __extend_last_switch(guc, &stats->start_gt_clk, last_switch);
> +
> + /*
> + * Instead of adjusting the total for overflow, just add the
> + * difference from previous sample stats->total_gt_clks
> + */
> + if (total && total != ~0U) {
> + stats->total_gt_clks += (u32)(total - stats->prev_total);
> + stats->prev_total = total;
> + }
> +}
> +
> +static void guc_update_pm_timestamp(struct intel_guc *guc,
> + struct intel_engine_cs *engine,
> + ktime_t *now)
> +{
> + u32 gt_stamp_now, gt_stamp_hi;
> +
> + lockdep_assert_held(&guc->timestamp.lock);
> +
> + gt_stamp_hi = upper_32_bits(guc->timestamp.gt_stamp);
> + gt_stamp_now = intel_uncore_read(engine->uncore,
> + RING_TIMESTAMP(engine->mmio_base));
> + *now = ktime_get();
> +
> + if (gt_stamp_now < lower_32_bits(guc->timestamp.gt_stamp))
> + gt_stamp_hi++;
> +
> + guc->timestamp.gt_stamp = ((u64)gt_stamp_hi << 32) | gt_stamp_now;
> +}
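Both this and __extend_last_switch are instances of the usual "extend a
wrapping 32-bit counter to 64 bits" trick, which is only correct if the
counter is sampled at least once per wrap period (~224 s here); the
1/8-wrap ping leaves plenty of margin. The bare pattern, for reference:

	/* must be called more often than the 32-bit counter wraps */
	static u64 extend_wrapping_u32(u64 prev, u32 now)
	{
		u32 hi = upper_32_bits(prev);

		/* counter went backwards => it wrapped since last sample */
		if (now < lower_32_bits(prev))
			hi++;

		return ((u64)hi << 32) | now;
	}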
> +
> +/*
> + * Unlike the execlist mode of submission total and active times are in terms of
> + * gt clocks. The *now parameter is retained to return the cpu time at which the
> + * busyness was sampled.
> + */
> +static ktime_t guc_engine_busyness(struct intel_engine_cs *engine, ktime_t *now)
> +{
> + struct intel_engine_guc_stats stats_saved, *stats = &engine->stats.guc;
> + struct i915_gpu_error *gpu_error = &engine->i915->gpu_error;
> + struct intel_gt *gt = engine->gt;
> + struct intel_guc *guc = >->uc.guc;
> + u64 total, gt_stamp_saved;
> + unsigned long flags;
> + u32 reset_count;
> +
> + spin_lock_irqsave(&guc->timestamp.lock, flags);
> +
> + /*
> + * If a reset happened, we risk reading partially updated
> + * engine busyness from GuC, so we just use the driver stored
> + * copy of busyness. Synchronize with gt reset using reset_count.
> + */
> + reset_count = i915_reset_count(gpu_error);
> +
> + *now = ktime_get();
> +
> + /*
> + * The active busyness depends on start_gt_clk and gt_stamp.
> + * gt_stamp is updated by i915 only when gt is awake and the
> + * start_gt_clk is derived from GuC state. To get a consistent
> + * view of activity, we query the GuC state only if gt is awake.
> + */
> + stats_saved = *stats;
> + gt_stamp_saved = guc->timestamp.gt_stamp;
> + if (intel_gt_pm_get_if_awake(gt)) {
> + guc_update_engine_gt_clks(engine);
> + guc_update_pm_timestamp(guc, engine, now);
> + intel_gt_pm_put_async(gt);
> + if (i915_reset_count(gpu_error) != reset_count) {
> + *stats = stats_saved;
> + guc->timestamp.gt_stamp = gt_stamp_saved;
> + }
> + }
> +
> + total = intel_gt_clock_interval_to_ns(gt, stats->total_gt_clks);
> + if (stats->running) {
> + u64 clk = guc->timestamp.gt_stamp - stats->start_gt_clk;
> +
> + total += intel_gt_clock_interval_to_ns(gt, clk);
> + }
> +
> + spin_unlock_irqrestore(&guc->timestamp.lock, flags);
> +
> + return ns_to_ktime(total);
> +}
> +
> +static void __reset_guc_busyness_stats(struct intel_guc *guc)
> +{
> + struct intel_gt *gt = guc_to_gt(guc);
> + struct intel_engine_cs *engine;
> + enum intel_engine_id id;
> + unsigned long flags;
> + ktime_t unused;
> +
> + cancel_delayed_work_sync(&guc->timestamp.work);
> +
> + spin_lock_irqsave(&guc->timestamp.lock, flags);
> +
> + for_each_engine(engine, gt, id) {
> + guc_update_pm_timestamp(guc, engine, &unused);
> + guc_update_engine_gt_clks(engine);
> + engine->stats.guc.prev_total = 0;
> + }
> +
> + spin_unlock_irqrestore(&guc->timestamp.lock, flags);
> +}
> +
> +static void __update_guc_busyness_stats(struct intel_guc *guc)
> +{
> + struct intel_gt *gt = guc_to_gt(guc);
> + struct intel_engine_cs *engine;
> + enum intel_engine_id id;
> + ktime_t unused;
> +
> + for_each_engine(engine, gt, id) {
> + guc_update_pm_timestamp(guc, engine, &unused);
> + guc_update_engine_gt_clks(engine);
> + }
> +}
> +
> +static void guc_timestamp_ping(struct work_struct *wrk)
> +{
> + struct intel_guc *guc = container_of(wrk, typeof(*guc),
> + timestamp.work.work);
> + struct intel_uc *uc = container_of(guc, typeof(*uc), guc);
> + struct intel_gt *gt = guc_to_gt(guc);
> + intel_wakeref_t wakeref;
> + unsigned long flags;
> + int srcu, ret;
> +
> + /*
> + * Synchronize with gt reset to make sure the worker does not
> + * corrupt the engine/guc stats.
> + */
> + ret = intel_gt_reset_trylock(gt, &srcu);
> + if (ret)
> + return;
> +
> + spin_lock_irqsave(&guc->timestamp.lock, flags);
> +
> + with_intel_runtime_pm(>->i915->runtime_pm, wakeref)
> + __update_guc_busyness_stats(guc);
> +
> + spin_unlock_irqrestore(&guc->timestamp.lock, flags);
> +
> + intel_gt_reset_unlock(gt, srcu);
> +
> + mod_delayed_work(system_highpri_wq, &guc->timestamp.work,
> + guc->timestamp.ping_delay);
> +}
> +
> +static int guc_action_enable_usage_stats(struct intel_guc *guc)
> +{
> + u32 offset = intel_guc_engine_usage_offset(guc);
> + u32 action[] = {
> + INTEL_GUC_ACTION_SET_ENG_UTIL_BUFF,
> + offset,
> + 0,
> + };
> +
> + return intel_guc_send(guc, action, ARRAY_SIZE(action));
> +}
> +
> +static void guc_init_engine_stats(struct intel_guc *guc)
> +{
> + struct intel_gt *gt = guc_to_gt(guc);
> + intel_wakeref_t wakeref;
> +
> + mod_delayed_work(system_highpri_wq, &guc->timestamp.work,
> + guc->timestamp.ping_delay);
> +
> + with_intel_runtime_pm(>->i915->runtime_pm, wakeref) {
> + int ret = guc_action_enable_usage_stats(guc);
> +
> + if (ret)
> + drm_err(>->i915->drm,
> + "Failed to enable usage stats: %d!\n", ret);
> + }
> +}
> +
> +void intel_guc_busyness_park(struct intel_gt *gt)
> +{
> + struct intel_guc *guc = >->uc.guc;
> + unsigned long flags;
> +
> + if (!guc_submission_initialized(guc))
> + return;
> +
> + cancel_delayed_work(&guc->timestamp.work);
> +
> + spin_lock_irqsave(&guc->timestamp.lock, flags);
> + __update_guc_busyness_stats(guc);
> + spin_unlock_irqrestore(&guc->timestamp.lock, flags);
> +}
> +
> +void intel_guc_busyness_unpark(struct intel_gt *gt)
> +{
> + struct intel_guc *guc = >->uc.guc;
> +
> + if (!guc_submission_initialized(guc))
> + return;
> +
> + mod_delayed_work(system_highpri_wq, &guc->timestamp.work,
> + guc->timestamp.ping_delay);
> +}
> +
> static inline bool
> submission_disabled(struct intel_guc *guc)
> {
> @@ -1138,6 +1406,7 @@ void intel_guc_submission_reset_prepare(struct intel_guc *guc)
> intel_gt_park_heartbeats(guc_to_gt(guc));
> disable_submission(guc);
> guc->interrupts.disable(guc);
> + __reset_guc_busyness_stats(guc);
>
> /* Flush IRQ handler */
> spin_lock_irq(&guc_to_gt(guc)->irq_lock);
> @@ -1484,6 +1753,7 @@ static void destroyed_worker_func(struct work_struct *w);
> */
> int intel_guc_submission_init(struct intel_guc *guc)
> {
> + struct intel_gt *gt = guc_to_gt(guc);
> int ret;
>
> if (guc->lrc_desc_pool)
> @@ -1512,6 +1782,10 @@ int intel_guc_submission_init(struct intel_guc *guc)
> if (!guc->submission_state.guc_ids_bitmap)
> return -ENOMEM;
>
> + spin_lock_init(&guc->timestamp.lock);
> + INIT_DELAYED_WORK(&guc->timestamp.work, guc_timestamp_ping);
> + guc->timestamp.ping_delay = (POLL_TIME_CLKS / gt->clock_frequency + 1) * HZ;
> +
> return 0;
> }
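The ping_delay math reads right: POLL_TIME_CLKS / gt->clock_frequency is
the 1/8-wrap interval in whole seconds (27 for 19.2 MHz), the +1 rounds
the integer division up, and * HZ converts seconds to jiffies for the
delayed work, i.e. 28 * HZ in that case.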
>
> @@ -3369,7 +3643,9 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
> engine->emit_flush = gen12_emit_flush_xcs;
> }
> engine->set_default_submission = guc_set_default_submission;
> + engine->busyness = guc_engine_busyness;
>
> + engine->flags |= I915_ENGINE_SUPPORTS_STATS;
> engine->flags |= I915_ENGINE_HAS_PREEMPTION;
> engine->flags |= I915_ENGINE_HAS_TIMESLICES;
>
> @@ -3468,6 +3744,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
> void intel_guc_submission_enable(struct intel_guc *guc)
> {
> guc_init_lrc_mapping(guc);
> + guc_init_engine_stats(guc);
> }
>
> void intel_guc_submission_disable(struct intel_guc *guc)
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> index c7ef44fa0c36..5a95a9f0a8e3 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> @@ -28,6 +28,8 @@ void intel_guc_submission_print_context_info(struct intel_guc *guc,
> void intel_guc_dump_active_requests(struct intel_engine_cs *engine,
> struct i915_request *hung_rq,
> struct drm_printer *m);
> +void intel_guc_busyness_park(struct intel_gt *gt);
> +void intel_guc_busyness_unpark(struct intel_gt *gt);
>
> bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve);
>
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index d9f7a729333f..f7927f6dac6e 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -2662,6 +2662,8 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
> #define RING_WAIT (1 << 11) /* gen3+, PRBx_CTL */
> #define RING_WAIT_SEMAPHORE (1 << 10) /* gen6+ */
>
> +#define GUCPMTIMESTAMP _MMIO(0xC3E8)
> +
> /* There are 16 64-bit CS General Purpose Registers per-engine on Gen8+ */
> #define GEN8_RING_CS_GPR(base, n) _MMIO((base) + 0x600 + (n) * 8)
> #define GEN8_RING_CS_GPR_UDW(base, n) _MMIO((base) + 0x600 + (n) * 8 + 4)
> --
> 2.20.1
>