[Intel-gfx] [PATCH 26/28] drm/i915: Fair low-latency scheduling
Chris Wilson
chris at chris-wilson.co.uk
Tue Jun 16 10:54:58 UTC 2020
Quoting Thomas Hellström (Intel) (2020-06-16 10:07:28)
> Hi, Chris,
>
> Some comments and questions:
>
> On 6/8/20 12:21 AM, Chris Wilson wrote:
> > The first "scheduler" was a topographical sorting of requests into
> > priority order. The execution order was deterministic, the earliest
> > submitted, highest priority request would be executed first. Priority
> > inherited ensured that inversions were kept at bay, and allowed us to
> > dynamically boost priorities (e.g. for interactive pageflips).
> >
> > The minimalistic timeslicing scheme was an attempt to introduce
> > fairness between long-running requests, by evicting the active
> > request at the end of a timeslice and moving it to the back of its
> > priority queue (while ensuring that dependencies were kept in order).
> > For short-running requests from many clients of equal priority, the
> > scheme is still very much FIFO submission ordering, and as unfair as
> > before.
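
The demotion step can be sketched in a few lines of standalone C. This
is only a toy model; the request structure and queue helpers below are
invented for illustration and are not the i915 implementation:

/* Toy model: a single priority level's FIFO with timeslice demotion. */
#include <stdio.h>

struct request {
        int id;
        int remaining;          /* work left, measured in timeslices */
        struct request *next;
};

static struct request *head, **tail = &head;

static void enqueue(struct request *rq)
{
        rq->next = NULL;
        *tail = rq;
        tail = &rq->next;
}

static struct request *dequeue(void)
{
        struct request *rq = head;

        if (rq) {
                head = rq->next;
                if (!head)
                        tail = &head;
        }
        return rq;
}

int main(void)
{
        struct request rqs[] = {
                { .id = 0, .remaining = 3 },
                { .id = 1, .remaining = 1 },
                { .id = 2, .remaining = 2 },
        };
        struct request *rq;
        int i;

        for (i = 0; i < 3; i++)
                enqueue(&rqs[i]);

        /* Each tick, run the head for one timeslice; if it still has
         * work left, demote it to the back of its queue. */
        while ((rq = dequeue())) {
                printf("run request %d\n", rq->id);
                if (--rq->remaining)
                        enqueue(rq);
        }
        return 0;
}

Run it and the three "clients" interleave round-robin (0 1 2 0 2 0)
rather than request 0 draining completely before the others start.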
> >
> > To impose fairness, we need an external metric that ensures that
> > clients are interspersed, so that we do not execute one long chain
> > from client A before executing any of client B. This could be imposed
> > by the clients using fences based on an external clock; that is, they
> > only submit work for a "frame" at the frame interval, instead of
> > submitting as much work as they are able to. The standard SwapBuffers
> > approach is akin to double buffering, where, as one frame is being
> > executed, the next is being submitted, such that there is always a
> > maximum of two frames per client in the pipeline. Even this scheme
> > exhibits unfairness under load, as a single client will execute two
> > frames back to back before the next, and with enough clients,
> > deadlines will be missed.
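
As an illustration of that double-buffer throttle, a client limiting
itself to two frames in flight might look like the sketch below.
submit_frame() and wait_fence() are hypothetical stand-ins for the real
fence APIs, not functions from this patch:

/* Toy model: never keep more than two frames in the pipeline. */
#include <stdio.h>

#define MAX_IN_FLIGHT 2

static int submit_frame(int frame)
{
        printf("submit frame %d\n", frame);
        return frame;   /* pretend the fence is just the frame number */
}

static void wait_fence(int fence)
{
        printf("wait for frame %d to complete\n", fence);
}

int main(void)
{
        int fence[MAX_IN_FLIGHT];
        int frame;

        for (frame = 0; frame < 6; frame++) {
                /* Before submitting frame N, wait for frame N-2, so at
                 * most two frames per client are ever queued. */
                if (frame >= MAX_IN_FLIGHT)
                        wait_fence(fence[frame % MAX_IN_FLIGHT]);
                fence[frame % MAX_IN_FLIGHT] = submit_frame(frame);
        }
        return 0;
}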
> >
> > The idea introduced by BFS/MuQSS is that fairness is introduced by
> > metering with an external clock. Every request, when it becomes ready
> > to execute, is assigned a virtual deadline, and execution order is
> > then determined by the earliest deadline. Priority is used as a hint,
> > rather than strict ordering: high priority requests have earlier
> > deadlines, but not necessarily earlier than outstanding work. Thus
> > work is executed in order of 'readiness', with timeslicing to demote
> > long-running work.
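
A toy model of the deadline idea, assuming a made-up priority-to-budget
mapping (prio_to_budget() is invented for the example and is not the
mapping used by the patch):

/* Toy model: earliest virtual deadline first, priority as a hint. */
#include <stdio.h>

struct request {
        int id;
        int prio;
        long deadline;
        int done;
};

/* Higher priority means a smaller latency budget, and so an earlier
 * deadline, but not necessarily earlier than outstanding work. */
static long prio_to_budget(int prio)
{
        return 1000 >> prio;    /* hypothetical mapping */
}

static void mark_ready(struct request *rq, long now)
{
        rq->deadline = now + prio_to_budget(rq->prio);
}

static struct request *pick_edf(struct request *rqs, int n)
{
        struct request *best = NULL;
        int i;

        for (i = 0; i < n; i++) {
                if (rqs[i].done)
                        continue;
                if (!best || rqs[i].deadline < best->deadline)
                        best = &rqs[i];
        }
        return best;
}

int main(void)
{
        struct request rqs[] = {
                { .id = 0, .prio = 0 }, /* old, low priority */
                { .id = 1, .prio = 2 }, /* new, high priority */
        };
        struct request *rq;

        mark_ready(&rqs[0], 0);         /* deadline 0 + 1000 = 1000 */
        mark_ready(&rqs[1], 900);       /* deadline 900 + 250 = 1150 */

        while ((rq = pick_edf(rqs, 2))) {
                printf("execute request %d (deadline %ld)\n",
                       rq->id, rq->deadline);
                rq->done = 1;
        }
        return 0;
}

Here the old low-priority request still runs first: its earlier
deadline beats the newer high-priority one, which is the "hint, not
strict ordering" behaviour described above.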
> >
> > The Achilles' heel of this scheduler is its strong preference for
> > low-latency and favouring of new queues. Whereas it was easy to dominate
> > the old scheduler by flooding it with many requests over a short period
> > of time, the new scheduler can be dominated by a 'synchronous' client
> > that waits for each of its requests to complete before submitting the
> > next. As such a client has no history, it is always considered
> > ready-to-run and receives an earlier deadline than the long running
> > requests.
> >
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > ---
> > drivers/gpu/drm/i915/gt/intel_engine_cs.c | 12 +-
> > .../gpu/drm/i915/gt/intel_engine_heartbeat.c | 1 +
> > drivers/gpu/drm/i915/gt/intel_engine_pm.c | 4 +-
> > drivers/gpu/drm/i915/gt/intel_engine_types.h | 24 --
> > drivers/gpu/drm/i915/gt/intel_lrc.c | 328 +++++++-----------
> > drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 5 +-
> > drivers/gpu/drm/i915/gt/selftest_lrc.c | 43 ++-
> > .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 6 +-
> > drivers/gpu/drm/i915/i915_priolist_types.h | 7 +-
> > drivers/gpu/drm/i915/i915_request.h | 4 +-
> > drivers/gpu/drm/i915/i915_scheduler.c | 322 ++++++++++++-----
> > drivers/gpu/drm/i915/i915_scheduler.h | 22 +-
> > drivers/gpu/drm/i915/i915_scheduler_types.h | 17 +
> > .../drm/i915/selftests/i915_mock_selftests.h | 1 +
> > drivers/gpu/drm/i915/selftests/i915_request.c | 1 +
> > .../gpu/drm/i915/selftests/i915_scheduler.c | 49 +++
> > 16 files changed, 484 insertions(+), 362 deletions(-)
> > create mode 100644 drivers/gpu/drm/i915/selftests/i915_scheduler.c
>
> Do we have timings to back this change up? Would it make sense to have a
> configurable scheduler choice?
gem_wsim workloads with different load balancers, varying the number of
clients; the figures are % variation from the previous patch.
+mB--------------------------------------------------------------------+
| a |
| cda |
| c.a |
| ..aa |
| ..---. |
| -.--+-. |
| .c.-.-+++. b |
| b bb.d-c-+--+++.aab aa b b |
|b b b b b. b ..---+++-+++++....a. b. b b b b b b|
| A| |
| |___AM____| |
| |A__| |
| |MA_| |
+----------------------------------------------------------------------+
Clients   N      Min      Max   Median          Avg       Stddev
      1  63    -8.2      5.4   -0.045     -0.02375   0.094722134
      2  63   -15.96    19.28  -0.64      -1.05        2.2428076
      4  63    -5.11     2.95  -1.15      -1.0683333   0.72382651
      8  63    -5.63     1.85  -0.905     -0.87122449  0.73390971
The wildest swings there appear to be a result of interrupt latency;
the roughly -1% impact comes from the changed execution order and the
extra context switching.
-Chris