[Intel-gfx] [PATCH 02/10] drm/i915/pmu: Expose a PMU interface for perf queries

Chris Wilson chris at chris-wilson.co.uk
Fri Sep 29 12:46:00 UTC 2017


Quoting Tvrtko Ursulin (2017-09-29 13:34:52)
> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> 
> From: Chris Wilson <chris at chris-wilson.co.uk>
> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> From: Dmitry Rogozhkin <dmitry.v.rogozhkin at intel.com>
> 
> The first goal is to be able to measure GPU (and individual ring) busyness
> without having to poll registers from userspace. (Which not only requires
> holding the forcewake lock indefinitely, perturbing the system, but also
> runs the risk of hanging the machine.) As an alternative we can use the
> perf event counter interface to sample the ring registers periodically
> and send those results to userspace.
> 
> The functionality is exported to userspace via the existing perf PMU
> API and can be exercised via the existing tools. For example:
> 
>   perf stat -a -e i915/rcs0-busy/ -I 1000
> 
> will print the render engine busyness once per second. All the
> performance counters can be enumerated (perf list) and have their unit
> of measure correctly reported in sysfs.
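
(For reference, the units end up as <event>.unit attributes under the
PMU's sysfs directory, along the lines of:

	/sys/bus/event_source/devices/i915/events/rcs0-busy
	/sys/bus/event_source/devices/i915/events/rcs0-busy.unit
	/sys/bus/event_source/devices/i915/cpumask

with rcs0-busy.unit reading "ns".)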
> 
> v1-v2 (Chris Wilson):
> 
> v2: Use a common timer for the ring sampling.
> 
> v3: (Tvrtko Ursulin)
>  * Decouple uAPI from i915 engine ids.
>  * Complete uAPI defines.
>  * Refactor some code to helpers for clarity.
>  * Skip sampling disabled engines.
>  * Expose counters in sysfs.
>  * Pass in fake regs to avoid null ptr deref in perf core.
>  * Convert to class/instance uAPI.
>  * Use shared driver code for rc6 residency, power and frequency.
> 
> v4: (Dmitry Rogozhkin)
>  * Register PMU with .task_ctx_nr=perf_invalid_context
>  * Expose cpumask for the PMU with the single CPU in the mask
>  * Properly support pmu->stop(): it should call pmu->read()
>  * Properly support pmu->del(): it should call stop(event, PERF_EF_UPDATE)
>  * Introduce refcounting of event subscriptions.
>  * Make pmu.busy_stats a refcounter to avoid busy stats going away
>    when an event is deleted.
>  * Expose cpumask for i915 PMU to avoid creation of multiple events of
>    the same type followed by counter aggregation by perf-stat.
>  * Track CPUs going online/offline to migrate the perf context. Since the
>    cpumask will (most likely) initially contain CPU0, CONFIG_BOOTPARAM_HOTPLUG_CPU0
>    is needed to see the effect of the CPU status tracking.
>  * End result is that only global events are supported and perf stat
>    works correctly.
>  * Deny perf driver level sampling - it is prohibited for uncore PMU.
> 
> v5: (Tvrtko Ursulin)
> 
>  * Don't hardcode number of engine samplers.
>  * Rewrite event ref-counting for correctness and simplicity.
>  * Store initial counter value when starting already enabled events
>    to correctly report values to all listeners.
>  * Fix RC6 residency readout.
>  * Comments, GPL header.
> 
> v6:
>  * Add missing entry to v4 changelog.
>  * Fix accounting in CPU hotplug case by copying the approach from
>    arch/x86/events/intel/cstate.c. (Dmitry Rogozhkin)
> 
> v7:
>  * Log failure message only on failure.
>  * Remove CPU hotplug notification state on unregister.
> 
> v8:
>  * Fix error unwind on failed registration.
>  * Checkpatch cleanup.
> 
> v9:
>  * Drop the energy metric, it is available via intel_rapl_perf.
>    (Ville Syrjälä)
>  * Use HAS_RC6(p). (Chris Wilson)
>  * Handle unsupported non-engine events. (Dmitry Rogozhkin)
>  * Rebase for intel_rc6_residency_ns needing caller managed
>    runtime pm.
>  * Drop HAS_RC6 checks from the read callback since creating those
>    events will be rejected at init time already.
>  * Add counter units to sysfs so perf stat output is nicer.
>  * Cleanup the attribute tables for brevity and readability.
> 
> v10:
>  * Fixed queued accounting.
> 
> v11:
>  * Move intel_engine_lookup_user to intel_engine_cs.c
>  * Commit update. (Joonas Lahtinen)
> 
> v12:
>  * More accurate sampling. (Chris Wilson)
>  * Store and report frequency in MHz for better usability from
>    perf stat.
>  * Removed metrics: queued, interrupts, rc6 counters.
>  * Sample engine busyness based on the seqno difference alone, for less
>    MMIO (and forcewake) traffic on all platforms. (Chris Wilson)
> 
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin at intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin at intel.com>
> Cc: Peter Zijlstra <peterz at infradead.org>
> ---
> +static void
> +update_sample(struct i915_pmu_sample *sample, u32 unit, u32 val)
> +{
> +       /*
> +        * Since we are doing stohastical sampling for these counter,

s/stohastical/stochastic/ (and s/these counter/these counters/)

> +        * average the delta with the previous value for better accuracy.
> +        */
> +       sample->cur += div_u64((u64)(sample->prev + val) * unit, 2);

div_u64(mul_u32_u32(sample->prev + val, unit), 2)
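
(mul_u32_u32() is the helper from include/linux/math64.h; the generic
fallback is essentially

	static inline u64 mul_u32_u32(u32 a, u32 b)
	{
		return (u64)a * b;
	}

but having the explicit helper lets 32-bit targets emit a single
widening 32x32->64 multiply instead of a full 64-bit one.)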

> +       sample->prev = val;
> +}
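
(Concretely, with unit == PERIOD in ns: if the engine was sampled as
busy on the previous tick (prev == 1) and idle on this one (val == 0),
the counter is credited PERIOD/2 ns rather than 0 or PERIOD.)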
> +
> +static void engines_sample(struct drm_i915_private *dev_priv)
> +{
> +       struct intel_engine_cs *engine;
> +       enum intel_engine_id id;
> +       bool fw = false;
> +
> +       if ((dev_priv->pmu.enable & ENGINE_SAMPLE_MASK) == 0)
> +               return;
> +
> +       if (!dev_priv->gt.awake)
> +               return;
> +
> +       if (!intel_runtime_pm_get_if_in_use(dev_priv))
> +               return;
> +
> +       for_each_engine(engine, dev_priv, id) {
> +               u32 current_seqno = intel_engine_get_seqno(engine);
> +               u32 last_seqno = intel_engine_last_submit(engine);
> +               u32 val;
> +
> +               val = !i915_seqno_passed(current_seqno, last_seqno);
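
For anyone reading along: i915_seqno_passed() is the usual wrap-safe
seqno comparison, i.e. roughly

	static inline bool i915_seqno_passed(u32 seq1, u32 seq2)
	{
		return (s32)(seq1 - seq2) >= 0;
	}

so val is 1 exactly while the last submitted request has not yet been
retired; no MMIO is needed for the busy sample itself.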
> +
> +               update_sample(&engine->pmu.sample[I915_SAMPLE_BUSY], PERIOD,
> +                             val);

Personally I think
	update_sample(&engine->pmu.sample[I915_SAMPLE_BUSY],
		      PERIOD, val);
carries better visual weighting.

> +
> +               if (val && (engine->pmu.enable &
> +                   (BIT(I915_SAMPLE_WAIT) | BIT(I915_SAMPLE_SEMA)))) {
> +                       fw = grab_forcewake(dev_priv, fw);
> +
> +                       val = I915_READ_FW(RING_CTL(engine->mmio_base));
> +               } else {
> +                       val = 0;
> +               }
> +
> +               update_sample(&engine->pmu.sample[I915_SAMPLE_WAIT], PERIOD,
> +                             !!(val & RING_WAIT));
> +
> +               update_sample(&engine->pmu.sample[I915_SAMPLE_SEMA], PERIOD,
> +                             !!(val & RING_WAIT_SEMAPHORE));
> +       }
> +
> +       if (fw)
> +               intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
> +
> +       intel_runtime_pm_put(dev_priv);

Ok.

> +}
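
grab_forcewake() is not in the quoted hunk, but from the rest of the
patch it is just a lazy-acquire helper, something along the lines of

	static bool grab_forcewake(struct drm_i915_private *i915, bool fw)
	{
		if (!fw)
			intel_uncore_forcewake_get(i915, FORCEWAKE_ALL);

		return true;
	}

so forcewake is only taken once per tick, and only if some engine is
actually busy with WAIT/SEMA sampling enabled.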
> +
> +static void frequency_sample(struct drm_i915_private *dev_priv)
> +{
> +       if (dev_priv->pmu.enable &
> +           config_enabled_mask(I915_PMU_ACTUAL_FREQUENCY)) {
> +               u32 val;
> +
> +               val = dev_priv->rps.cur_freq;
> +               if (dev_priv->gt.awake &&
> +                   intel_runtime_pm_get_if_in_use(dev_priv)) {
> +                       val = intel_get_cagf(dev_priv,
> +                                            I915_READ_NOTRACE(GEN6_RPSTAT1));
> +                       intel_runtime_pm_put(dev_priv);
> +               }
> +
> +               update_sample(&dev_priv->pmu.sample[__I915_SAMPLE_FREQ_ACT], 1,
> +                             intel_gpu_freq(dev_priv, val));
> +       }
> +
> +       if (dev_priv->pmu.enable &
> +           config_enabled_mask(I915_PMU_REQUESTED_FREQUENCY)) {
> +               update_sample(&dev_priv->pmu.sample[__I915_SAMPLE_FREQ_REQ], 1,
> +                             intel_gpu_freq(dev_priv, dev_priv->rps.cur_freq));
> +       }
Ok.
> +}
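
With the MHz unit exported in sysfs, both frequency counters read
nicely together, e.g. (assuming the event names defined by this patch's
uAPI end up as below):

	perf stat -a -e i915/actual-frequency/,i915/requested-frequency/ -I 1000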

Looks good as the introductory set of counters. I trust the improvements
to the perf_event integration are good (as you can tell from the earlier
patch, I have no idea what I'm doing there ;)

Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>
-Chris


