[Intel-gfx] [RFC 4/6] drm/i915/pmu: Add queued counter

Wed Jan 24 18:01:59 UTC 2018

On 22/01/2018 18:56, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-01-22 18:43:56)
>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>
>> We add a PMU counter to expose the number of requests which have been
>> submitted from userspace but are not yet runnable due to dependencies and
>> unsignaled fences.
>>
>> This is useful to analyze the overall load of the system.
>>
>> v2:
>>   * Rebase for name change and re-order.
>>   * Drop floating point constant. (Chris Wilson)
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_pmu.c         | 40 +++++++++++++++++++++++++++++----
>>   drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +-
>>   include/uapi/drm/i915_drm.h             |  9 +++++++-
>>   3 files changed, 45 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
>> index cbfca4a255ab..8eefdf09a30a 100644
>> --- a/drivers/gpu/drm/i915/i915_pmu.c
>> +++ b/drivers/gpu/drm/i915/i915_pmu.c
>> @@ -36,7 +36,8 @@
>>   #define ENGINE_SAMPLE_MASK \
>>          (BIT(I915_SAMPLE_BUSY) | \
>>           BIT(I915_SAMPLE_WAIT) | \
>> -        BIT(I915_SAMPLE_SEMA))
>> +        BIT(I915_SAMPLE_SEMA) | \
>> +        BIT(I915_SAMPLE_QUEUED))
>>   
>>   #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
>>   
>> @@ -220,6 +221,11 @@ static void engines_sample(struct drm_i915_private *dev_priv)
>>   
>>                  update_sample(&engine->pmu.sample[I915_SAMPLE_SEMA],
>>                                PERIOD, !!(val & RING_WAIT_SEMAPHORE));
>> +
>> +               if (engine->pmu.enable & BIT(I915_SAMPLE_QUEUED))
>> +                       update_sample(&engine->pmu.sample[I915_SAMPLE_QUEUED],
>> +                                     I915_SAMPLE_QUEUED_DIVISOR,
>> +                                     atomic_read(&engine->request_stats.queued));
> 
> engine->request_stats.foo works for me, and reads quite nicely.
> 
>> +/* No brackets or quotes below please. */
>> +#define I915_SAMPLE_QUEUED_SCALE 0.01
> 
>> + /* Divide counter value by divisor to get the real value. */
>> +#define I915_SAMPLE_QUEUED_DIVISOR (100)
> 
> I'm just thinking of favouring the sampler arithmetic by using 128. As
> far as userspace is concerned, the difference is not going to be that
> noticeable, less so if you choose 256.

I'll do 1024 then, although the CPU usage in the sampling thread is very
low anyway.
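
To illustrate the trade-off being discussed, here is a minimal standalone
sketch of fixed-point sampling with a power-of-two divisor. It is not the
driver's update_sample(); the struct, the function names and the
QUEUED_DIVISOR_SHIFT macro are all hypothetical, and the averaging step only
assumes that userspace divides the accumulated counter by the divisor and by
the number of samples taken.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; not taken from the patch. */
#define QUEUED_DIVISOR_SHIFT 10
#define QUEUED_DIVISOR (1u << QUEUED_DIVISOR_SHIFT) /* 1024 */

struct queued_sample {
	uint64_t cur; /* running sum of (queued * QUEUED_DIVISOR) */
};

/* Sampler side: scaling by a power of two reduces to a shift. */
static void queued_sample_update(struct queued_sample *s, uint32_t queued)
{
	s->cur += (uint64_t)queued << QUEUED_DIVISOR_SHIFT;
}

/*
 * Reader side: recover the average queue depth over nsamples.
 * Assumes the reader knows how many samples were accumulated.
 */
static double queued_sample_average(const struct queued_sample *s,
				    uint32_t nsamples)
{
	assert(nsamples > 0);
	return (double)s->cur / QUEUED_DIVISOR / nsamples;
}
```

With one sample of 1 queued request and one of 2, the accumulator holds
3 * 1024 and the recovered average is 1.5. A divisor of 100 would give the
same result to userspace; the power of two only changes how cheaply the
sampler can apply the scale.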

Regards,

Tvrtko