[Intel-gfx] [RFC 4/6] drm/i915/pmu: Add queued counter
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Wed Jan 24 18:01:59 UTC 2018
On 22/01/2018 18:56, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-01-22 18:43:56)
>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>
>> We add a PMU counter to expose the number of requests which have been
>> submitted from userspace but are not yet runnable due to dependencies and
>> unsignaled fences.
>>
>> This is useful to analyze the overall load of the system.
>>
>> v2:
>> * Rebase for name change and re-order.
>> * Drop floating point constant. (Chris Wilson)
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>> ---
>> drivers/gpu/drm/i915/i915_pmu.c | 40 +++++++++++++++++++++++++++++----
>> drivers/gpu/drm/i915/intel_ringbuffer.h | 2 +-
>> include/uapi/drm/i915_drm.h | 9 +++++++-
>> 3 files changed, 45 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
>> index cbfca4a255ab..8eefdf09a30a 100644
>> --- a/drivers/gpu/drm/i915/i915_pmu.c
>> +++ b/drivers/gpu/drm/i915/i915_pmu.c
>> @@ -36,7 +36,8 @@
>> #define ENGINE_SAMPLE_MASK \
>> (BIT(I915_SAMPLE_BUSY) | \
>> BIT(I915_SAMPLE_WAIT) | \
>> - BIT(I915_SAMPLE_SEMA))
>> + BIT(I915_SAMPLE_SEMA) | \
>> + BIT(I915_SAMPLE_QUEUED))
>>
>> #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
>>
>> @@ -220,6 +221,11 @@ static void engines_sample(struct drm_i915_private *dev_priv)
>>
>> update_sample(&engine->pmu.sample[I915_SAMPLE_SEMA],
>> PERIOD, !!(val & RING_WAIT_SEMAPHORE));
>> +
>> + if (engine->pmu.enable & BIT(I915_SAMPLE_QUEUED))
>> + update_sample(&engine->pmu.sample[I915_SAMPLE_QUEUED],
>> + I915_SAMPLE_QUEUED_DIVISOR,
>> + atomic_read(&engine->request_stats.queued));
>
> engine->request_stats.foo works for me, and reads quite nicely.
>
>> +/* No brackets or quotes below please. */
>> +#define I915_SAMPLE_QUEUED_SCALE 0.01
>
>> + /* Divide counter value by divisor to get the real value. */
>> +#define I915_SAMPLE_QUEUED_DIVISOR (100)
>
> I'm just thinking of favouring the sampler arithmetic by using 128. As
> far as userspace is concerned, the difference is not going to be that
> noticeable, less so if you chose 256.
I'll do 1024 then, but the CPU usage in the sampling thread is so low
anyway.
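
For illustration only, here is a minimal sketch of the fixed-point arithmetic
under discussion. It is not the patch code: every sketch_* / SKETCH_* name is
hypothetical, and it assumes the sampler adds (queue depth * divisor) to a
running total on each tick, with userspace dividing the delta back out. With a
power-of-two divisor such as 1024, the multiply can reduce to a shift.

```c
#include <stdint.h>

/* Hypothetical constants: 1024 expressed as a power of two. */
#define SKETCH_QUEUED_SHIFT   10
#define SKETCH_QUEUED_DIVISOR (1u << SKETCH_QUEUED_SHIFT)

/* Hypothetical stand-in for a per-engine sample accumulator. */
struct sketch_sample {
	uint64_t cur;
};

/*
 * Sampler side: accumulate val scaled by unit. When unit is a
 * power of two the compiler can turn the multiply into a shift.
 */
static void sketch_update_sample(struct sketch_sample *s,
				 uint32_t unit, uint32_t val)
{
	s->cur += (uint64_t)val * unit;
}

/*
 * Consumer side: divide the counter delta by the divisor to get
 * the summed queue depth, then by the number of sampling ticks
 * (an assumption of this sketch) to get the average depth.
 */
static double sketch_average_queued(uint64_t counter_delta,
				    uint64_t nr_samples)
{
	if (nr_samples == 0)
		return 0.0;

	return (double)counter_delta / SKETCH_QUEUED_DIVISOR /
	       (double)nr_samples;
}
```

Feeding in two ticks with queue depths of 3 and 1 accumulates 4 * 1024, which
the consumer side turns back into an average depth of 2.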
Regards,
Tvrtko