[Intel-gfx] [RFC 07/11] drm/i915: Engine busy time tracking

Fri Sep 15 09:45:18 UTC 2017

On 14/09/2017 21:16, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2017-09-11 16:25:55)
>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>
>> Track total time requests have been executing on the hardware.
>>
>> We add new kernel API to allow software tracking of time GPU
>> engines are spending executing requests.
>>
>> Both per-engine and global API is added with the latter also
>> being exported for use by external users.
>>
>> v2:
>>   * Squashed with the internal API.
>>   * Dropped static key.
>>   * Made per-engine.
>>   * Store time in monotonic ktime.
>>
>> v3: Moved stats clearing to disable.
>>
>> v4:
>>   * Comments.
>>   * Don't export the API just yet.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>> ---
>>   drivers/gpu/drm/i915/intel_engine_cs.c  | 141 ++++++++++++++++++++++++++++++++
>>   drivers/gpu/drm/i915/intel_lrc.c        |   2 +
>>   drivers/gpu/drm/i915/intel_ringbuffer.h |  81 ++++++++++++++++++
>>   3 files changed, 224 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
>> index dbc7abd65f33..f7dba176989c 100644
>> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
>> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
>> @@ -232,6 +232,8 @@ intel_engine_setup(struct drm_i915_private *dev_priv,
>>          /* Nothing to do here, execute in order of dependencies */
>>          engine->schedule = NULL;
>>   
>> +       spin_lock_init(&engine->stats.lock);
>> +
>>          ATOMIC_INIT_NOTIFIER_HEAD(&engine->context_status_notifier);
>>   
>>          dev_priv->engine_class[info->class][info->instance] = engine;
>> @@ -1417,6 +1419,145 @@ bool intel_engine_can_store_dword(struct intel_engine_cs *engine)
>>          }
>>   }
>>   
>> +/**
>> + * intel_enable_engine_stats() - Enable engine busy tracking on engine
>> + * @engine: engine to enable stats collection
>> + *
>> + * Start collecting the engine busyness data for @engine.
>> + *
>> + * Returns 0 on success or a negative error code.
>> + */
>> +int intel_enable_engine_stats(struct intel_engine_cs *engine)
>> +{
>> +       unsigned long flags;
>> +
>> +       if (!i915.enable_execlists)
>> +               return -ENODEV;
>> +
>> +       spin_lock_irqsave(&engine->stats.lock, flags);
>> +       if (engine->stats.enabled == ~0)
>> +               goto busy;
>> +       engine->stats.enabled++;
>> +       spin_unlock_irqrestore(&engine->stats.lock, flags);
>> +
>> +       return 0;
>> +
>> +busy:
>> +       spin_unlock_irqrestore(&engine->stats.lock, flags);
>> +
>> +       return -EBUSY;
>> +}
>> +
>> +/**
>> + * intel_disable_engine_stats() - Disable engine busy tracking on engine
>> + * @engine: engine to disable stats collection
>> + *
>> + * Stops collecting the engine busyness data for @engine.
>> + */
>> +void intel_disable_engine_stats(struct intel_engine_cs *engine)
>> +{
>> +       unsigned long flags;
>> +
>> +       if (!i915.enable_execlists)
>> +               return;
>> +
>> +       spin_lock_irqsave(&engine->stats.lock, flags);
>> +       WARN_ON_ONCE(engine->stats.enabled == 0);
>> +       if (--engine->stats.enabled == 0) {
> 
> Saturation protection on inc, but not on dec?

On inc since it can imaginably be triggered from the outside, even 
userspace via PMU API, but dec has a WARN_ON_ONCE only with the thinking 
it can only be a programming error to hit that one.

> You might as well just use refcount_t.

I dislike needless atomics inside atomics, since I need the spinlock anyway.

>> +               engine->stats.ref = 0;
>> +               engine->stats.start = engine->stats.total = 0;
>> +       }
>> +       spin_unlock_irqrestore(&engine->stats.lock, flags);
>> +}
>> +
>> +/**
>> + * intel_enable_engines_stats() - Enable engine busy tracking on all engines
>> + * @dev_priv: i915 device private
>> + *
>> + * Start collecting the engine busyness data for all engines.
>> + *
>> + * Returns 0 on success or a negative error code.
>> + */
>> +int intel_enable_engines_stats(struct drm_i915_private *dev_priv)
>> +{
>> +       struct intel_engine_cs *engine;
>> +       enum intel_engine_id id;
>> +       int ret = 0;
>> +
>> +       if (!i915.enable_execlists)
>> +               return -ENODEV;
>> +
>> +       for_each_engine(engine, dev_priv, id) {
>> +               ret = intel_enable_engine_stats(engine);
>> +               if (WARN_ON_ONCE(ret))
>> +                       break;
> 
> Doesn't the failure here only lead to more failure? The only failure is
> counter saturation, and by not handling that failure you leak the
> earlier refs.

Oopsie, this a bug.

> 
>> +       }
>> +
>> +       return ret;
>> +}
>> +
>> +/**
>> + * intel_disable_engines_stats() - Disable engine busy tracking on all engines
>> + * @dev_priv: i915 device private
>> + *
>> + * Stops collecting the engine busyness data for all engines.
>> + */
>> +void intel_disable_engines_stats(struct drm_i915_private *dev_priv)
>> +{
>> +       struct intel_engine_cs *engine;
>> +       enum intel_engine_id id;
>> +
>> +       for_each_engine(engine, dev_priv, id)
>> +               intel_disable_engine_stats(engine);
>> +}
>> +
>> +/**
>> + * intel_engine_get_busy_time() - Return current accumulated engine busyness
>> + * @engine: engine to report on
>> + *
>> + * Returns accumulated time @engine was busy since engine stats were enabled.
>> + */
>> +ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine)
>> +{
>> +       ktime_t total;
>> +       unsigned long flags;
>> +
>> +       spin_lock_irqsave(&engine->stats.lock, flags);
>> +
>> +       total = engine->stats.total;
>> +
>> +       /*
>> +        * If the engine is executing something at the moment
>> +        * add it to the total.
>> +        */
>> +       if (engine->stats.ref)
>> +               total = ktime_add(total,
>> +                                 ktime_sub(ktime_get(), engine->stats.start));
>> +
>> +       spin_unlock_irqrestore(&engine->stats.lock, flags);
>> +
>> +       return total;
>> +}
>> +
>> +/**
>> + * intel_engines_get_busy_time() - Return current accumulated overall engine busyness
>> + * @dev_priv: i915 device private
>> + *
>> + * Returns accumulated time all engines were busy since engine stats were
>> + * enabled.
>> + */
>> +ktime_t intel_engines_get_busy_time(struct drm_i915_private *dev_priv)
>> +{
>> +       struct intel_engine_cs *engine;
>> +       enum intel_engine_id id;
>> +       ktime_t total = 0;
>> +
>> +       for_each_engine(engine, dev_priv, id)
>> +               total = ktime_add(total, intel_engine_get_busy_time(engine));
>> +
>> +       return total;
>> +}
>> +
>>   #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>>   #include "selftests/mock_engine.c"
>>   #endif
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index b61fb09024c3..00fcbde998fc 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -310,12 +310,14 @@ execlists_context_status_change(struct drm_i915_gem_request *rq,
>>   static inline void
>>   execlists_context_schedule_in(struct drm_i915_gem_request *rq)
>>   {
>> +       intel_engine_context_in(rq->engine);
>>          execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_IN);
>>   }
>>   
>>   static inline void
>>   execlists_context_schedule_out(struct drm_i915_gem_request *rq)
>>   {
>> +       intel_engine_context_out(rq->engine);
>>          execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_OUT);
>>   }
>>   
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> index cf095b9386f4..f618c5f98edf 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> @@ -463,6 +463,34 @@ struct intel_engine_cs {
>>           * certain bits to encode the command length in the header).
>>           */
>>          u32 (*get_cmd_length_mask)(u32 cmd_header);
>> +
>> +       struct {
>> +              /**
>> +               * @lock: Lock protecting the below fields.
>> +               */
>> +               spinlock_t lock;
>> +               /**
>> +                * @enabled: Reference count indicating number of listeners.
>> +                */
>> +               unsigned int enabled;
>> +               /**
>> +                * @ref: Number of contexts currently scheduled in.
>> +                */
>> +               unsigned int ref;
> 
> active?

I disliked the name as well, yes, active is much better.
> 
>> +               /**
>> +                * @start: Timestamp of the last idle to active transition.
>> +                *
>> +                * Idle is defined as ref == 0, active is ref > 0.
>> +                */
>> +               ktime_t start;
>> +               /**
>> +                * @total: Total time this engine was busy.
>> +                *
>> +                * Accumulated time not counting the most recent block in cases
>> +                * where engine is currently busy (ref > 0).
>> +                */
>> +               ktime_t total;
>> +       } stats;
>>   };
>>   
>>   static inline unsigned int
>> @@ -762,4 +790,57 @@ bool intel_engine_can_store_dword(struct intel_engine_cs *engine);
>>   struct intel_engine_cs *
>>   intel_engine_lookup_user(struct drm_i915_private *i915, u8 class, u8 instance);
>>   
>> +static inline void intel_engine_context_in(struct intel_engine_cs *engine)
>> +{
>> +       unsigned long flags;
>> +
>> +       if (READ_ONCE(engine->stats.enabled) == 0)
>> +               return;
>> +
>> +       spin_lock_irqsave(&engine->stats.lock, flags);
>> +
>> +       if (engine->stats.enabled > 0) {
>> +               if (engine->stats.ref++ == 0)
>> +                       engine->stats.start = ktime_get();
>> +               GEM_BUG_ON(engine->stats.ref == 0);
>> +       }
>> +
>> +       spin_unlock_irqrestore(&engine->stats.lock, flags);
>> +}
>> +
>> +static inline void intel_engine_context_out(struct intel_engine_cs *engine)
>> +{
>> +       unsigned long flags;
>> +
>> +       if (READ_ONCE(engine->stats.enabled) == 0)
>> +               return;
>> +
>> +       spin_lock_irqsave(&engine->stats.lock, flags);
>> +
>> +       if (engine->stats.enabled > 0) {
>> +               /*
>> +                * After turning on engine stats, context out might be the
>> +                * first event which then needs to be ignored (ref == 0).
>> +                */
>> +               if (engine->stats.ref && --engine->stats.ref == 0) {
>> +                       ktime_t last = ktime_sub(ktime_get(),
>> +                                                engine->stats.start);
> 
> s/last/this/ ? You adding in the time elapsed for the current activity.

Hm don't know. It is kind of not "this" at this point, since the context 
is out, finished. "Last" still sounds better to me.

> 
>> +
>> +                       engine->stats.total = ktime_add(engine->stats.total,
>> +                                                       last);
>> +               }
>> +       }
>> +
>> +       spin_unlock_irqrestore(&engine->stats.lock, flags);
> 
> Only slight annoyance is that we do out before we process in, so if we
> only fill slot0 every time, we end up with a pair of ktime_get()s we
> didn't need.

You mean a continuous stream of requests where all times we would really 
need is start of first, and end of the last?

I don't have immediate ideas on how to implement this. However with that 
approach we would also need to accept accounting the time spent inside 
the tasklet, and tasklet scheduling latency, as GPU busyness. However 
small this time might be.

Either way I'd prefer to leave thinking about this for some future time.

Regards,

Tvrtko