[Intel-gfx] [PATCH 8/8] drm/i915: Gate engine stats collection with a static key
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Tue Sep 26 12:42:51 UTC 2017
On 25/09/2017 18:56, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2017-09-25 16:15:43)
>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>
>> This reduces the cost of the software engine busyness tracking
>> to a single no-op instruction when there are no listeners.
>>
>> v2: Rebase and some comments.
>> v3: Rebase.
>> v4: Checkpatch fixes.
>> v5: Rebase.
>> v6: Use system_long_wq to avoid being blocked by struct_mutex
>> users.
>> v7: Fix bad conflict resolution from last rebase. (Dmitry Rogozhkin)
>> v8: Rebase.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>
> Bah, still unhappy about the global. I know in all likelihood it doesn't
> matter, but it still bugs me.
If that's the biggest problem with this patch then that's good! :)
But I think some benchmarking is also in order with data added to the
commit msg.
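
For context, the static-key gating the subject line refers to follows the usual kernel jump-label pattern; a minimal sketch, not the actual patch code (key and helper names here are hypothetical):

```c
#include <linux/jump_label.h>

/* Hypothetical key, default off: while disabled the branch is patched
 * out and the fast path costs a single no-op instruction. */
static DEFINE_STATIC_KEY_FALSE(engine_stats_key);

static inline void engine_context_in(struct intel_engine_cs *engine)
{
	/* _unlikely keeps the disabled case as the straight-line path */
	if (static_branch_unlikely(&engine_stats_key))
		engine_stats_update(engine); /* hypothetical slow path */
}

/* Called on first/last PMU listener respectively; these may sleep,
 * hence the patch moves them out of the event enable/disable paths
 * and onto a workqueue. */
void engine_stats_enable(void)  { static_branch_inc(&engine_stats_key); }
void engine_stats_disable(void) { static_branch_dec(&engine_stats_key); }
```

Since static_branch_inc()/dec() can sleep (they take the jump-label mutex and patch code), deferring them to system_long_wq as the patch does is what keeps the perf event callbacks atomic-safe.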
>> ---
>> drivers/gpu/drm/i915/i915_pmu.c | 54 +++++++++++++++--
>> drivers/gpu/drm/i915/intel_engine_cs.c | 17 ++++++
>> drivers/gpu/drm/i915/intel_ringbuffer.h | 101 ++++++++++++++++++++------------
>> 3 files changed, 130 insertions(+), 42 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
>> index 228aa50ce709..e768f33ebb3d 100644
>> --- a/drivers/gpu/drm/i915/i915_pmu.c
>> +++ b/drivers/gpu/drm/i915/i915_pmu.c
>> @@ -501,11 +501,17 @@ static void i915_pmu_enable(struct perf_event *event)
>> GEM_BUG_ON(sample >= I915_PMU_SAMPLE_BITS);
>> GEM_BUG_ON(engine->pmu.enable_count[sample] == ~0);
>> if (engine->pmu.enable_count[sample]++ == 0) {
>> + /*
>> + * Enable engine busy stats tracking if needed or
>> + * alternatively cancel the scheduled disabling of the
>> + * same.
>> + */
>> if (engine_needs_busy_stats(engine) &&
>> !engine->pmu.busy_stats) {
>> - engine->pmu.busy_stats =
>> - intel_enable_engine_stats(engine) == 0;
>> - WARN_ON_ONCE(!engine->pmu.busy_stats);
>> + engine->pmu.busy_stats = true;
>> + if (!cancel_delayed_work(&engine->pmu.disable_busy_stats))
>> + queue_work(system_long_wq,
>> + &engine->pmu.enable_busy_stats);
>> }
>> }
>> }
>> @@ -548,7 +554,15 @@ static void i915_pmu_disable(struct perf_event *event)
>> if (!engine_needs_busy_stats(engine) &&
>> engine->pmu.busy_stats) {
>> engine->pmu.busy_stats = false;
>> - intel_disable_engine_stats(engine);
>> + /*
>> + * We request a delayed disable to handle the
>> + * rapid on/off cycles on events which can
>> + * happen when tools like perf stat start in a
>> + * nicer way.
>> + */
>> + queue_delayed_work(system_long_wq,
>> + &engine->pmu.disable_busy_stats,
>> + round_jiffies_up_relative(HZ));
>
> What's preventing the perverse system from executing the enable after
> the disable? Say the enable was scheduled on another cpu that was
> stalled?
> Something like a if (cancel_work(enable)) return ?
It would be very perverse indeed considering the delayed disable, but
well spotted!
It looks like your solution would work. I could also go with a dedicated
ordered wq. At the moment I don't have a preference. Will give it a think.
>> }
>> }
>> }
>> @@ -739,9 +753,27 @@ static int i915_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
>> return 0;
>> }
>>
>> +static void __enable_busy_stats(struct work_struct *work)
>> +{
>> + struct intel_engine_cs *engine =
>> + container_of(work, typeof(*engine), pmu.enable_busy_stats);
>> +
>> + WARN_ON_ONCE(intel_enable_engine_stats(engine));
>> +}
>> +
>> +static void __disable_busy_stats(struct work_struct *work)
>> +{
>> + struct intel_engine_cs *engine =
>> + container_of(work, typeof(*engine), pmu.disable_busy_stats.work);
>> +
>> + intel_disable_engine_stats(engine);
>> +}
>> +
>> void i915_pmu_register(struct drm_i915_private *i915)
>> {
>> int ret;
>> + struct intel_engine_cs *engine;
>> + enum intel_engine_id id;
>
> What a nice Christmas tree you are growing :-p
Will make it prettier.
Regards,
Tvrtko