[Intel-gfx] [PATCH 6/7] drm/i915/pmu: Add running counter
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Wed Jun 6 15:52:17 UTC 2018
On 06/06/2018 16:23, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-06-06 15:40:10)
>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>
>> We add a PMU counter to expose the number of requests currently executing
>> on the GPU.
>>
>> This is useful to analyze the overall load of the system.
>>
>> v2:
>> * Rebase.
>> * Drop floating point constant. (Chris Wilson)
>>
>> v3:
>> * Change scale to 1024 for faster arithmetic. (Chris Wilson)
>>
>> v4:
>> * Refactored for timer period accounting.
>>
>> v5:
>> * Avoid 64-division. (Chris Wilson)
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>> ---
>> #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
>>
>> @@ -226,6 +227,13 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
>> div_u64((u64)period_ns *
>> I915_SAMPLE_QUEUED_DIVISOR,
>> 1000000));
>> +
>> + if (engine->pmu.enable & BIT(I915_SAMPLE_RUNNING))
>> + add_sample_mult(&engine->pmu.sample[I915_SAMPLE_RUNNING],
>> + last_seqno - current_seqno,
>> + div_u64((u64)period_ns *
>> + I915_SAMPLE_QUEUED_DIVISOR,
>> + 1000000));
>
> Are we worried about losing precision with qd.ns?
>
> add_sample_mult(SAMPLE, x, period_ns); here
>
>> @@ -560,7 +569,8 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
>> val = engine->pmu.sample[sample].cur;
>>
>> if (sample == I915_SAMPLE_QUEUED ||
>> - sample == I915_SAMPLE_RUNNABLE)
>> + sample == I915_SAMPLE_RUNNABLE ||
>> + sample == I915_SAMPLE_RUNNING)
>> val = div_u64(val, MSEC_PER_SEC); /* to qd */
>
> and val = div_u64(val * I915_SAMPLE_QUEUED_DIVISOR, NSEC_PER_SEC);
Yeah that works, thanks.
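A minimal userspace sketch of the two accounting schemes discussed above, for illustration only and not the driver code. It assumes I915_SAMPLE_QUEUED_DIVISOR is 1024 (per the v3 changelog). The names sample_scaled, read_scaled, sample_raw and read_raw are hypothetical, and div_u64 here is a plain C stand-in for the kernel helper of the same name.

```c
#include <stdint.h>

#define SAMPLE_DIVISOR 1024u        /* assumption: taken from the v3 changelog */
#define NSEC_PER_SEC   1000000000u
#define MSEC_PER_SEC   1000u

/* Userspace stand-in for the kernel's div_u64(): u64 dividend, u32 divisor. */
static uint64_t div_u64(uint64_t dividend, uint32_t divisor)
{
	return dividend / divisor;
}

/* Variant A (as posted): scale the period at sample time, one division per sample. */
static void sample_scaled(uint64_t *cur, uint32_t depth, uint32_t period_ns)
{
	uint32_t mul = (uint32_t)div_u64((uint64_t)period_ns * SAMPLE_DIVISOR,
					 1000000);

	*cur += (uint64_t)depth * mul;
}

static uint64_t read_scaled(uint64_t cur)
{
	return div_u64(cur, MSEC_PER_SEC);
}

/* Variant B (as suggested): accumulate depth * ns, divide only on read. */
static void sample_raw(uint64_t *cur, uint32_t depth, uint32_t period_ns)
{
	*cur += (uint64_t)depth * period_ns;
}

static uint64_t read_raw(uint64_t cur)
{
	return div_u64(cur * SAMPLE_DIVISOR, NSEC_PER_SEC);
}
```

With three requests executing across 200 samples of 5,000,000 ns each (one second in total), both variants read back 3072, i.e. 3 * 1024. Variant B does its only division at read time rather than once per sample.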
> So that gives us a limit of ~1 million qd (assuming the user cares for
> about 1s intervals). Up to 8 million wlog with
>
> val = div_u64(val * I915_SAMPLE_QUEUED_DIVISOR/8, NSEC_PER_SEC/8);
Or keep in qd.us as for frequency. I think precision is plenty in any case.
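The effect of dividing both constants by 8 can be sketched as follows. This is a hedged illustration under the same assumption that I915_SAMPLE_QUEUED_DIVISOR is 1024; max_accum, read_full and read_reduced are hypothetical names. The reduced form yields the same quotient, while the accumulated value can grow eight times larger before the 64-bit multiplication wraps.

```c
#include <stdint.h>

#define SAMPLE_DIVISOR 1024u        /* assumption: taken from the v3 changelog */
#define NSEC_PER_SEC   1000000000u

/* Largest accumulated value for which val * mul still fits in 64 bits. */
static uint64_t max_accum(uint32_t mul)
{
	return UINT64_MAX / mul;
}

/* Read-side scaling with the full constants. */
static uint64_t read_full(uint64_t val)
{
	return (val * SAMPLE_DIVISOR) / NSEC_PER_SEC;
}

/* Same ratio with both constants divided by 8: 128 / 125000000. */
static uint64_t read_reduced(uint64_t val)
{
	return (val * (SAMPLE_DIVISOR / 8)) / (NSEC_PER_SEC / 8);
}
```

Since 1024/1000000000 and 128/125000000 are the same ratio, the two read functions agree for every value that does not overflow in read_full.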
> Anyway, just concerned to have more than one 64b division and want to
> provoke you into thinking of a way of avoiding it :)
It is an optimized 64-bit divide, or 64-divide as I faltered in the
commit message :), so not as bad as 64/64, but still your idea is very good.

Regards,

Tvrtko