[Intel-gfx] [PATCH] drm/i915/pmu: Connect engine busyness stats from GuC to pmu
Umesh Nerlige Ramappa
umesh.nerlige.ramappa at intel.com
Thu Oct 7 15:42:33 UTC 2021
On Thu, Oct 07, 2021 at 09:17:34AM +0100, Tvrtko Ursulin wrote:
>
>On 06/10/2021 21:45, Umesh Nerlige Ramappa wrote:
>>On Wed, Oct 06, 2021 at 10:11:58AM +0100, Tvrtko Ursulin wrote:
>
>[snip]
>
>>>>@@ -762,12 +764,25 @@ submission_disabled(struct intel_guc *guc)
>>>> static void disable_submission(struct intel_guc *guc)
>>>> {
>>>> struct i915_sched_engine * const sched_engine = guc->sched_engine;
>>>>+ struct intel_gt *gt = guc_to_gt(guc);
>>>>+ struct intel_engine_cs *engine;
>>>>+ enum intel_engine_id id;
>>>>+ unsigned long flags;
>>>> if (__tasklet_is_enabled(&sched_engine->tasklet)) {
>>>> GEM_BUG_ON(!guc->ct.enabled);
>>>> __tasklet_disable_sync_once(&sched_engine->tasklet);
>>>> sched_engine->tasklet.callback = NULL;
>>>> }
>>>>+
>>>>+ cancel_delayed_work(&guc->timestamp.work);
>>>
>>>I am not sure when disable_submission gets called so a question -
>>>could it be important to call cancel_delayed_work_sync here to
>>>ensure if the worker was running it had exited before proceeding?
>>
>>disable_submission is called in the reset_prepare path for uc
>>resets. I see this happening only with busy-hang test which does a
>>global gt reset. The counterpart for this is the
>>guc_init_engine_stats which is called post reset in the path to
>>initialize GuC.
>>
>>I tried cancel_delayed_work_sync both here and in park. Seems to
>>work fine, so will change the calls to _sync versions.
>
>From park is not allowed to sleep so can't do sync from there. It
>might have been my question which put you on a wrong path, sorry. Now
>I think question remains what happens if the ping worker happens to be
>sampling GuC state as GuC is being reset? Do you need some sort of a
>lock to protect that, or make sure worker skips if reset in progress?
>
If ping ran after the actual gt reset, we should be okay. If it ran
after we reset prev_total and before gt reset, then we have bad
busyness. At the same time, skipping ping risks timestamp overflow. I am
thinking skip ping, but update all stats in the reset_prepare path.
reset_prepare is running with pm runtime.
On a different note, during reset, we need to store now-start into the
total_gt_clks also because we may lose that information in the next pmu
query or ping (post reset). Maybe I will store active_clks instead of
running in the stats to do that.
Thanks,
Umesh
More information about the Intel-gfx
mailing list