[Intel-gfx] [PATCH v4 1/1] drm/i915/guc: Don't update engine busyness stats too frequently
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Thu Jun 23 08:09:20 UTC 2022
On 23/06/2022 03:31, Alan Previn wrote:
> Using two different types of workloads, it was observed that
> guc_update_engine_gt_clks was being called too frequently, causing a
> CPU-to-lmem bandwidth hit over PCIe. Details on the workloads and
> numbers are in the notes below.
>
> Background: At the moment, guc_update_engine_gt_clks can be invoked
> in one of three ways. #1 and #2 are infrequent under normal operating
> conditions:
> 1. When a predefined "ping_delay" timer expires, so that GuC
> busyness can sample the GTPM clock counter to ensure it
> doesn't miss a wrap-around of the 32 bits of the HW counter.
> (The ping_delay is calculated as 1/8th of the time taken for the
> counter to go from 0x0 to 0xffffffff at the GT frequency. This
> comes to about once every 28 seconds at a GT frequency of
> 19.2 MHz; a quick calculation follows this list.)
> 2. In preparation for a gt reset.
> 3. In response to __gt_park events (as the gt power management
> puts the gt into a lower power state when there is no work
> being done).
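
(For reference, here is a quick back-of-the-envelope sketch, in plain
userspace C rather than driver code, of where the ~28 second figure in
item 1 comes from; the 19.2 MHz timestamp frequency is taken from the
commit message above and the names below are made up for illustration.)

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	const double gt_clk_hz = 19.2e6;	/* GT timestamp frequency from the note above */
	/* Time for a 32-bit counter to wrap at that rate: ~223.7 s */
	double wrap_s = ((double)UINT32_MAX + 1) / gt_clk_hz;
	/* ping_delay is 1/8th of the wrap time: ~28 s */
	double ping_delay_s = wrap_s / 8;

	printf("wrap every %.1f s, ping_delay %.1f s\n", wrap_s, ping_delay_s);
	return 0;
}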
>
> Root-cause: For both of the workloads described below, it was
> observed that when user space issues IOCTLs that unpark the gt
> momentarily, and repeats such calls many times in quick succession,
> guc_update_engine_gt_clks gets called just as many times. However,
> the primary purpose of guc_update_engine_gt_clks is to ensure we don't
> miss a wraparound while the counter is ticking. Thus, the solution
> is to skip that check when gt_park calls this function earlier than
> necessary.
>
> Solution: Snapshot jiffies when we do actually update the busyness
> stats. Then get the new jiffies every time intel_guc_busyness_park
> is called and bail if we are being called too soon. Use half of the
> ping_delay as a safe threshold.
>
> NOTE1: Workload1: IGT's gem_create was modified to create a file handle,
> allocate memory with sizes that range from a minimum of 4K to the max
> supported (in power-of-two step sizes). It maps, modifies and reads back
> the memory. Allocation and modification are repeated until the total
> memory allocation reaches the max. Then the file handle is closed. With
> this workload, guc_update_engine_gt_clks was called over 188 thousand
> times in the span of 15 seconds while this test ran three times. With
> this patch, the number of calls was reduced to 14.
>
> NOTE2: Workload2: 30 transcode sessions are created in quick succession.
> While these sessions are created, the pcm-iio tool was used to measure I/O
> read operation bandwidth consumption, sampled at 100 millisecond intervals
> over the course of 20 seconds. Without this patch, the measured bandwidth
> averaged 311 KBps per sample over the 20 seconds. With this patch, the
> number went down to about 175 KBps, which is about a 43% saving.
>
> Signed-off-by: Alan Previn <alan.previn.teres.alexis at intel.com>
> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa at intel.com>
Super detailed and clear - thank you!
Acked-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
One question below:
> ---
> drivers/gpu/drm/i915/gt/uc/intel_guc.h | 8 ++++++++
> drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 13 +++++++++++++
> 2 files changed, 21 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index 966e69a8b1c1..d0d99f178f2d 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -230,6 +230,14 @@ struct intel_guc {
> * @shift: Right shift value for the gpm timestamp
> */
> u32 shift;
> +
> + /**
> + * @last_stat_jiffies: jiffies at last actual stats collection time
> + * We use this timestamp to ensure we don't oversample the
> + * stats because runtime power management events can trigger
> + * stats collection at much higher rates than required.
> + */
> + unsigned long last_stat_jiffies;
> } timestamp;
>
> #ifdef CONFIG_DRM_I915_SELFTEST
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index e62ea35513ea..c9f167b80910 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -1314,6 +1314,8 @@ static void __update_guc_busyness_stats(struct intel_guc *guc)
> unsigned long flags;
> ktime_t unused;
>
> + guc->timestamp.last_stat_jiffies = jiffies;
> +
> spin_lock_irqsave(&guc->timestamp.lock, flags);
>
> guc_update_pm_timestamp(guc, &unused);
> @@ -1386,6 +1388,17 @@ void intel_guc_busyness_park(struct intel_gt *gt)
> return;
>
> cancel_delayed_work(&guc->timestamp.work);
> +
> + /*
> + * Before parking, we should sample engine busyness stats if we need to.
> + * We can skip it if we are less than half a ping from the last time we
> + * sampled the busyness stats.
> + */
> + if (guc->timestamp.last_stat_jiffies &&
Is there a case where we enter park with last_stat_jiffies being unset
and so skip updating, and if there is, is that a concern? If it is, then
see if it is safe to just not have the zero check. Worst that could
happen is one extra sampling if either unset or overflows, right?
Regards,
Tvrtko
> + !time_after(jiffies, guc->timestamp.last_stat_jiffies +
> + (guc->timestamp.ping_delay / 2)))
> + return;
> +
> __update_guc_busyness_stats(guc);
> }
>
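
To make the question about the zero check concrete, here is a rough
userspace sketch of the throttle added by the patch (not driver code;
time_after() is open-coded the way include/linux/jiffies.h defines it,
and the jiffies/ping_delay values below are made up for illustration).
With last_stat_jiffies unset (zero) the check falls through and the
stats are sampled, so the cost of the unset case is at most one extra
sampling, as noted above.

#include <stdbool.h>
#include <stdio.h>

/* Wrap-safe "is a after b", same idea as the kernel's time_after(a, b). */
static bool time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Mirrors the condition added in intel_guc_busyness_park(). */
static bool skip_busyness_sample(unsigned long now, unsigned long last,
				 unsigned long ping_delay)
{
	return last && !time_after(now, last + ping_delay / 2);
}

int main(void)
{
	unsigned long ping_delay = 28 * 250;	/* ~28 s at an assumed HZ of 250 */

	/* Never sampled yet (last == 0): do not skip, take the sample. */
	printf("unset:  skip=%d\n", skip_busyness_sample(100000, 0, ping_delay));
	/* Sampled 100 ticks ago, well under ping_delay/2: skip. */
	printf("recent: skip=%d\n", skip_busyness_sample(100000, 99900, ping_delay));
	/* Last sample more than half a ping_delay ago: do not skip. */
	printf("stale:  skip=%d\n", skip_busyness_sample(100000, 90000, ping_delay));
	return 0;
}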