[Intel-gfx] [PATCH 6/7] drm/i915/pmu: Add running counter
Chris Wilson
chris at chris-wilson.co.uk
Wed Jun 6 15:23:55 UTC 2018
Quoting Tvrtko Ursulin (2018-06-06 15:40:10)
> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>
> We add a PMU counter to expose the number of requests currently executing
> on the GPU.
>
> This is useful to analyze the overall load of the system.
>
> v2:
> * Rebase.
> * Drop floating point constant. (Chris Wilson)
>
> v3:
> * Change scale to 1024 for faster arithmetics. (Chris Wilson)
>
> v4:
> * Refactored for timer period accounting.
>
> v5:
> * Avoid 64-division. (Chris Wilson)
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> ---
> #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
>
> @@ -226,6 +227,13 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
> div_u64((u64)period_ns *
> I915_SAMPLE_QUEUED_DIVISOR,
> 1000000));
> +
> + if (engine->pmu.enable & BIT(I915_SAMPLE_RUNNING))
> + add_sample_mult(&engine->pmu.sample[I915_SAMPLE_RUNNING],
> + last_seqno - current_seqno,
> + div_u64((u64)period_ns *
> + I915_SAMPLE_QUEUED_DIVISOR,
> + 1000000));
Are we worried about losing precision with qd.ns?
add_sample_mult(SAMPLE, x, period_ns); here
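One way to read the suggestion above is to accumulate raw qd.ns (queue depth times nanoseconds) per sample and leave all scaling to the read side. The following is a minimal userspace sketch of that comparison, not the driver code: div_u64() is a stand-in for the kernel helper, the simulate_*() functions are hypothetical, and the 1024 divisor is assumed from the v3 changelog.

```c
#include <stdint.h>

/* Userspace stand-ins for kernel names; values are assumptions. */
#define I915_SAMPLE_QUEUED_DIVISOR 1024u	/* scale from the v3 changelog */
#define MSEC_PER_SEC 1000u
#define NSEC_PER_SEC 1000000000u

/* Stand-in for the kernel's div_u64(u64 dividend, u32 divisor). */
static uint64_t div_u64(uint64_t dividend, uint32_t divisor)
{
	return dividend / divisor;
}

/*
 * As in the quoted hunk: scale the period on every sample (one division
 * per sample, truncating), then divide by MSEC_PER_SEC on read.
 */
static uint64_t simulate_prescaled(uint32_t x, uint32_t period_ns,
				   unsigned int samples)
{
	uint64_t cur = 0;
	unsigned int i;

	for (i = 0; i < samples; i++) {
		uint32_t mul = (uint32_t)div_u64((uint64_t)period_ns *
						 I915_SAMPLE_QUEUED_DIVISOR,
						 1000000);
		cur += (uint64_t)x * mul;
	}

	return div_u64(cur, MSEC_PER_SEC);
}

/*
 * Hypothetical alternative: accumulate x * period_ns with no division
 * while sampling, and do a single scaled division on read.
 */
static uint64_t simulate_qd_ns(uint32_t x, uint32_t period_ns,
			       unsigned int samples)
{
	uint64_t cur = 0;
	unsigned int i;

	for (i = 0; i < samples; i++)
		cur += (uint64_t)x * period_ns;

	return div_u64(cur * I915_SAMPLE_QUEUED_DIVISOR, NSEC_PER_SEC);
}
```

With x = 3, a period of 5000900 ns and 1000 samples, the prescaled path reads back 15360 and the qd.ns path 15362, against an exact value of roughly 15362.76. The per-sample truncation is what costs the prescaled path its precision.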
> @@ -560,7 +569,8 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
> val = engine->pmu.sample[sample].cur;
>
> if (sample == I915_SAMPLE_QUEUED ||
> - sample == I915_SAMPLE_RUNNABLE)
> + sample == I915_SAMPLE_RUNNABLE ||
> + sample == I915_SAMPLE_RUNNING)
> val = div_u64(val, MSEC_PER_SEC); /* to qd */
and val = div_u64(val * I915_SAMPLE_QUEUED_DIVISOR, NSEC_PER_SEC);
So that gives us a limit of ~1 million qd (assuming the user cares about
intervals of roughly 1s). Up to 8 million wlog with
val = div_u64(val * I915_SAMPLE_QUEUED_DIVISOR/8, NSEC_PER_SEC/8);
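The headroom argument can be sketched in userspace as follows. This is an illustration under assumptions rather than the driver's read path: div_u64() is a stand-in for the kernel helper, the read_*() and max_val_*() helpers are hypothetical names, and the 1024 divisor is assumed from the v3 changelog. It shows only the relative 8x gain in the largest accumulated value that survives the multiplication, and that both forms return the same result while neither overflows.

```c
#include <stdint.h>

/* Userspace stand-ins for kernel names; values are assumptions. */
#define I915_SAMPLE_QUEUED_DIVISOR 1024u	/* scale from the v3 changelog */
#define NSEC_PER_SEC 1000000000u

/* Stand-in for the kernel's div_u64(u64 dividend, u32 divisor). */
static uint64_t div_u64(uint64_t dividend, uint32_t divisor)
{
	return dividend / divisor;
}

/* Read-side conversion with the full multiplier. */
static uint64_t read_full(uint64_t val)
{
	return div_u64(val * I915_SAMPLE_QUEUED_DIVISOR, NSEC_PER_SEC);
}

/* Same ratio with numerator and denominator both reduced by 8. */
static uint64_t read_reduced(uint64_t val)
{
	return div_u64(val * (I915_SAMPLE_QUEUED_DIVISOR / 8),
		       NSEC_PER_SEC / 8);
}

/* Largest accumulated value whose product still fits in 64 bits. */
static uint64_t max_val_full(void)
{
	return UINT64_MAX / I915_SAMPLE_QUEUED_DIVISOR;
}

static uint64_t max_val_reduced(void)
{
	return UINT64_MAX / (I915_SAMPLE_QUEUED_DIVISOR / 8);
}
```

Since 1024/1000000000 and 128/125000000 are the same fraction, the two forms agree for any val below max_val_full(). Above that, the full multiplier wraps around while the reduced one is still correct up to max_val_reduced().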
Anyway, just concerned to have more than one 64b division and want to
provoke you into thinking of a way of avoiding it :)
-Chris