[Intel-gfx] [PATCH 6/7] drm/i915/pmu: Add running counter

Chris Wilson chris at chris-wilson.co.uk
Wed Jun 6 15:23:55 UTC 2018


Quoting Tvrtko Ursulin (2018-06-06 15:40:10)
> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> 
> We add a PMU counter to expose the number of requests currently executing
> on the GPU.
> 
> This is useful to analyze the overall load of the system.
> 
> v2:
>  * Rebase.
>  * Drop floating point constant. (Chris Wilson)
> 
> v3:
>  * Change scale to 1024 for faster arithmetics. (Chris Wilson)
> 
> v4:
>  * Refactored for timer period accounting.
> 
> v5:
>  * Avoid 64-division. (Chris Wilson)
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> ---
>  #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
>  
> @@ -226,6 +227,13 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
>                                         div_u64((u64)period_ns *
>                                                 I915_SAMPLE_QUEUED_DIVISOR,
>                                                 1000000));
> +
> +               if (engine->pmu.enable & BIT(I915_SAMPLE_RUNNING))
> +                       add_sample_mult(&engine->pmu.sample[I915_SAMPLE_RUNNING],
> +                                       last_seqno - current_seqno,
> +                                       div_u64((u64)period_ns *
> +                                               I915_SAMPLE_QUEUED_DIVISOR,
> +                                               1000000));

Are we worried about losing precision with qd.ns?

i.e. add_sample_mult(SAMPLE, x, period_ns); here, keeping the accumulator
in raw qd.ns,

> @@ -560,7 +569,8 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
>                         val = engine->pmu.sample[sample].cur;
>  
>                         if (sample == I915_SAMPLE_QUEUED ||
> -                           sample == I915_SAMPLE_RUNNABLE)
> +                           sample == I915_SAMPLE_RUNNABLE ||
> +                           sample == I915_SAMPLE_RUNNING)
>                                 val = div_u64(val, MSEC_PER_SEC);  /* to qd */

and val = div_u64(val * I915_SAMPLE_QUEUED_DIVISOR, NSEC_PER_SEC);

So that gives us a limit of ~1 million qd (assuming the user cares about
~1s intervals). Up to 8 million, wlog, with

	val = div_u64(val * I915_SAMPLE_QUEUED_DIVISOR/8, NSEC_PER_SEC/8);

Anyway, I'm just concerned about having more than one 64b division and
want to provoke you into thinking of a way of avoiding it :)
-Chris

