[Intel-gfx] [Intel-gfx 1/1] drm/i915/guc: Don't update engine busyness stats too frequently

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Tue Jun 14 07:07:04 UTC 2022


On 14/06/2022 02:10, Umesh Nerlige Ramappa wrote:
> On Sat, Jun 11, 2022 at 10:27:11AM -0700, Alan Previn wrote:
>> Using igt's gem-create and with additional patches to track object
>> creation time, it was measured that guc_update_engine_gt_clks was
>> getting called over 188 thousand times in the span of 15 seconds
>> (running the test three times).
>>
>> Get a jiffies sample on every trigger and ensure we skip sampling
>> if we are being called too soon. Use half of the ping_delay as a
>> safe threshold.
>>
>> NOTE: with this change, the number of calls went down to just 14
>> over the same span of time (matching the original intent of running
>> about once every 24 seconds, at 19.2Mhz GT freq, per engine).
>>
>> Signed-off-by: Alan Previn <alan.previn.teres.alexis at intel.com>
>> ---
>> drivers/gpu/drm/i915/gt/intel_engine_types.h      | 10 ++++++++++
>> drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c |  9 +++++++++
>> 2 files changed, 19 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
>> b/drivers/gpu/drm/i915/gt/intel_engine_types.h
>> index 2286f96f5f87..63f4ecdf1606 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
>> @@ -323,6 +323,16 @@ struct intel_engine_guc_stats {
>>      * @start_gt_clk: GT clock time of last idle to active transition.
>>      */
>>     u64 start_gt_clk;
>> +
>> +    /**
>> +     * @last_jiffies: Jiffies at last actual stats collection time
>> +     *
>> +     * We use this timestamp to ensure we don't oversample the
>> +     * stats because runtime power management events can trigger
>> +     * stats collection at much higher rates than required.
>> +     */
>> +    u64 last_jiffies;
>> +
>> };
>>
>> struct intel_engine_cs {
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> index 5a1dfacf24ea..8f8bf6e40ccb 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> @@ -1167,6 +1167,15 @@ static void guc_update_engine_gt_clks(struct 
>> intel_engine_cs *engine)
> 
> A user query will end up in guc_engine_busyness which will call 
> guc_update_engine_gt_clks. Adding this logic here will affect accuracy.
> The other place where guc_update_engine_gt_clks is called is in the ping 
> worker, but that worker runs at 1/8th the wrap around time for the gt 
> clocks (32 bit). The last I checked the wrap around was at 22 seconds.
> 
> That leaves only the gt_park path. fwiu, this path runs too frequently 
> and here we are updating the busyness stats. That is causing the 
> enormous PCI traffic (lmem accesses). Only this path needs to be fixed, 
> as in just use the same logic in the intel_guc_busyness_park() to decide 
> whether to call __update_guc_busyness_stats or not.

Not updating the driver state in park will not negatively impact 
accuracy in some scenarios? That needs to balanced against the questions 
whether or not there are real world scenarios impacted by the update 
cost or it is just for IGT.

Regards,

Tvrtko

> 
> Of course, if you are running the user query too frequently, then IMO we 
> should not fix that in i915.
> If you haven't already, please make sure the igt at perf_pmu@ tests are all 
> passing with any of these changes. There's also a selftest - 
> live_engine_busy_stats that you need to make sure passes.
> 
>>     struct intel_engine_guc_stats *stats = &engine->stats.guc;
>>     struct intel_guc *guc = &engine->gt->uc.guc;
>>     u32 last_switch, ctx_id, total;
>> +    u64 newjiffs;
>> +
>> +    /* Don't worry about jiffies wrap-around, a rare additional 
>> sample won't have any impact */
>> +    newjiffs = get_jiffies_64();
>> +    if (stats->last_jiffies && (newjiffs - stats->last_jiffies <
>> +       (guc->timestamp.ping_delay << 1)))
> 
> You want to right shift by 1 for half the ping delay here.
> 
> Thanks,
> Umesh
> 
>> +        return;
>> +
>> +    stats->last_jiffies = newjiffs;
>>
>>     lockdep_assert_held(&guc->timestamp.lock);
>>
>> -- 
>> 2.25.1
>>


More information about the Intel-gfx mailing list