[Intel-gfx] [Intel-gfx 2/2] drm/i915/guc: Add delay to disable scheduling after pin count goes to zero

John Harrison john.c.harrison at intel.com
Thu Jul 28 20:19:53 UTC 2022


On 6/27/2022 22:51, Alan Previn wrote:
> From: Matthew Brost <matthew.brost at intel.com>
>
> Add a delay, configurable via debugs (default 100ms), to disable
debugs -> debugfs

Default is now 34ms?

> scheduling of a context after the pin count goes to zero. Disable
> scheduling is somewhat costly operation so the idea is a delay allows
costly operation as it requires synchronising with the GuC. So the idea

> the resubmit something before doing this operation. This delay is only
the user to resubmit

> done if the context isn't close and less than 3/4 of the guc_ids are in
close -> closed

less than a given threshold (default is 3/4) of the guc_ids

> use.
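
So the unpin path becomes roughly the below? This is only a sketch of the
flow as I understand it from the commit message - the helper/field names
(do_sched_disable(), guc_id_in_use(), the submission_state fields, etc.)
are paraphrased from the earlier rev and may not match what finally lands:

static void sched_disable_or_delay(struct intel_guc *guc,
                                   struct intel_context *ce)
{
        u64 delay = guc->submission_state.sched_disable_delay_ms;

        /*
         * Disable immediately (the existing behaviour) if the context is
         * already closed, if guc_id usage is at/above the configured
         * threshold, or if the delay has been set to zero via debugfs.
         */
        if (intel_context_is_closed(ce) ||
            guc_id_in_use(guc) >=
            guc->submission_state.sched_disable_gucid_threshold ||
            !delay) {
                do_sched_disable(guc, ce);
                return;
        }

        /*
         * Otherwise defer the disable; a re-pin before the worker fires
         * cancels it and saves the SCHED_CONTEXT_MODE_SET/_DONE round trip.
         */
        mod_delayed_work(system_unbound_wq,
                         &ce->guc_state.sched_disable_delay_work,
                         msecs_to_jiffies(delay));
}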
>
> As temporary WA disable this feature for the selftests. Selftests are
> very timing sensitive and any change in timing can cause failure. A
> follow up patch will fixup the selftests to understand this delay.
>
> Alan Previn: Matt Brost first introduced this series back in Oct 2021.
> However no real world workload with measured performance impact was
> available to prove the intended results. Today, this series is being
> republished in response to a real world workload that benefited greatly
> from it along with measured performance improvement.
>
> Workload description: 36 containers were created on a DG2 device where
> each container was performing a combination of 720p 3d game rendering
> and 30fps video encoding. The workload density was configured in a way
> that guaranteed each container to ALWAYS be able to render and
> encode no less than 30fps with a predefined maximum render + encode
> latency time. That means that the totality of all 36 containers and their
> workloads were not saturating the utilized hw engines to their max
> (in order to maintain just enough headroom to meet the minimum fps and
> latencies of incoming container submissions).
>
> Problem statement: It was observed that the CPU utilization of the CPU
> core that was pinned to i915 soft IRQ work was experiencing severe load.
> Using tracelogs and an instrumentation patch to count specific i915 IRQ
> events, it was confirmed that the majority of the CPU cycles were caused
> by the gen11_other_irq_handler() -> guc_irq_handler() code path. The vast
> majority of the cycles was determined to be processing a specific G2H IRQ
> which was INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE. This IRQ is send by
send -> sent

> the GuC in response to the i915 KMD sending the H2G requests
> INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET to the GuC. That request is sent
> when the context is idle to unpin the context from any GuC access. The
> high CPU utilization % symptom was limiting the density scaling.
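
(For reference, the H2G in question is just a three dword action. The
below is paraphrased from the current submission code rather than quoted,
so the exact marshalling may differ; the GuC acks each one asynchronously
with a SCHED_CONTEXT_MODE_DONE G2H, which is the interrupt traffic being
counted here.)

static void send_sched_disable_h2g(struct intel_guc *guc,
                                   struct intel_context *ce)
{
        u32 action[] = {
                INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET,
                ce->guc_id.id,
                GUC_CONTEXT_DISABLE,
        };

        /* Reply arrives later as INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE */
        intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
                                 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
}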
>
> Root Cause Analysis: Because the incoming execution buffers were spread
> across 36 different containers (each with multiple contexts) but the
> system in totality was NOT saturated to the max, it was assumed that each
> context was constantly idling between submissions. This was causing thrashing
> of unpinning a context from GuC at one moment, followed by repinning it
> due to incoming workload the very next moment. Both of these event-pairs
> were being triggered across multiple contexts per container, across all
> containers at the rate of > 30 times per sec per context.
>
> Metrics: When running this workload without this patch, we measured an average
> of ~69K INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE events every 10 seconds or
> ~10 million times over ~25+ mins. With this patch, the count reduced to ~480
> every 10 seconds or about ~28K over ~10 mins. The improvement observed is
> ~99% for the average counts per 10 seconds.
>
> Signed-off-by: Matthew Brost <matthew.brost at intel.com>
> Acked-by: Alan Previn <alan.previn.teres.alexis at intel.com>
Needs your s-o-b as you are posting the patch.

The code below looks to be the old rev of the patch? This still needs 
updating with the cleanup work?

John.


