[Intel-gfx] [PATCH] i915/guc: Run busyness worker only if gt is awake
Daniele Ceraolo Spurio
daniele.ceraolospurio at intel.com
Thu Sep 14 17:23:37 UTC 2023
On 9/11/2023 5:52 PM, Umesh Nerlige Ramappa wrote:
> The worker is canceled in the __gt_park path, but we still see it
> running sometimes during suspend.
>
> Only update stats if gt is awake. If not, intel_guc_busyness_park would
> have already updated the stats. Note that we do not requeue the worker
> if gt is not awake since intel_guc_busyness_unpark would do that at some
> point.
>
> If the gt was parked longer than time taken for GT timestamp to roll
> over, we ignore those rollovers since we don't care about tracking the
> exact GT time. We only care about roll overs when the gt is active and
> running workloads.
>
> v2 (Daniele)
> - Edit commit message and code comment
> - Use runtime pm in the worker
> - Put runtime pm after enabling the worker
> - Use Link tag and add Fixes tag
>
> Link: https://gitlab.freedesktop.org/drm/intel/-/issues/7077
> Fixes: 77cdd054dd2c ("drm/i915/pmu: Connect engine busyness stats from GuC to pmu")
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa at intel.com>
> ---
> .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 26 ++++++++++++++++---
> 1 file changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index e250bedf90fb..d37b255559a0 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -1461,6 +1461,24 @@ static void guc_timestamp_ping(struct work_struct *wrk)
> unsigned long index;
> int srcu, ret;
>
> + /*
> + * The worker is canceled in the __gt_park path, but we still see it
> + * running sometimes during suspend.
This sounds very vague and more like there is a bug in __gt_park. We
actually do know that the issue on the park side is that the worker
re-queues itself if it's running while we cancel it; that's not really a
problem as long as we don't wake the device after it's gone to sleep
(which is what your change below does), so this sentence should be
reworded or dropped.
> + *
> + * Only update stats if gt is awake. If not, intel_guc_busyness_park
> + * would have already updated the stats. Note that we do not requeue the
> + * worker in this case since intel_guc_busyness_unpark would do that at
> + * some point.
> + *
> + * If the gt was parked longer than time taken for GT timestamp to roll
> + * over, we ignore those rollovers since we don't care about tracking
> + * the exact GT time. We only care about roll overs when the gt is
> + * active and running workloads.
> + */
> + wakeref = intel_runtime_pm_get_if_active(>->i915->runtime_pm);
The patch title and the comment refer to GT being awake, but here you're
taking a global rpm ref and not a gt_pm ref. I understand that taking a
gt_pm ref in here can be complicated because of the canceling of the
worker in the gt_park flow and that taking an rpm ref does work for what
we need, but the title/comments need to reflect and explain that.
Daniele
> + if (!wakeref)
> + return;
> +
> /*
> * Synchronize with gt reset to make sure the worker does not
> * corrupt the engine/guc stats. NB: can't actually block waiting
> @@ -1469,10 +1487,9 @@ static void guc_timestamp_ping(struct work_struct *wrk)
> */
> ret = intel_gt_reset_trylock(gt, &srcu);
> if (ret)
> - return;
> + goto err_trylock;
>
> - with_intel_runtime_pm(>->i915->runtime_pm, wakeref)
> - __update_guc_busyness_stats(guc);
> + __update_guc_busyness_stats(guc);
>
> /* adjust context stats for overflow */
> xa_for_each(&guc->context_lookup, index, ce)
> @@ -1481,6 +1498,9 @@ static void guc_timestamp_ping(struct work_struct *wrk)
> intel_gt_reset_unlock(gt, srcu);
>
> guc_enable_busyness_worker(guc);
> +
> +err_trylock:
> + intel_runtime_pm_put(>->i915->runtime_pm, wakeref);
> }
>
> static int guc_action_enable_usage_stats(struct intel_guc *guc)
More information about the Intel-gfx
mailing list