[PATCH 0/4] Delay disabling scheduling on a context

Alan Previn alan.previn.teres.alexis at intel.com
Fri Sep 9 04:52:16 UTC 2022


This is a revival of the same series posted by Matthew Brost
back in October 2021 (https://patchwork.freedesktop.org/series/96167/).
This time around it also includes real-world measured metrics that
demonstrate the effectiveness of this series.

This series adds a delay before disabling scheduling on the guc-context
when a context becomes idle. The 2nd patch explains this in more detail.
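To make the idea concrete, here is a minimal userspace C model of a
delayed schedule-disable. It is a sketch under stated assumptions, not
the i915 implementation: the names (struct ctx_model, ctx_pin(),
ctx_unpin(), ctx_worker()) and the simulated millisecond clock are
hypothetical, and the real driver uses a delayed worker plus GuC
H2G/G2H messages rather than simple counters.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not i915 code: times are simulated milliseconds. */
enum sched_state {
	SCHED_ENABLED,         /* scheduling is on for this context */
	SCHED_DISABLE_PENDING, /* idle; disable deferred until the deadline */
	SCHED_DISABLED,        /* scheduling disabled (GuC notified) */
};

struct ctx_model {
	enum sched_state state;
	unsigned int pin_count;
	uint64_t disable_deadline_ms;
	unsigned int disable_msgs; /* "schedule disable" messages sent */
	unsigned int enable_msgs;  /* "schedule enable" messages sent */
};

/* Last pin dropped: defer the disable instead of sending it right away. */
static void ctx_unpin(struct ctx_model *c, uint64_t now_ms, uint64_t delay_ms)
{
	assert(c->pin_count > 0);
	if (--c->pin_count == 0) {
		c->state = SCHED_DISABLE_PENDING;
		c->disable_deadline_ms = now_ms + delay_ms;
	}
}

/* A new request pins the context; a pending disable is just cancelled. */
static void ctx_pin(struct ctx_model *c)
{
	c->pin_count++;
	if (c->state == SCHED_DISABLE_PENDING) {
		c->state = SCHED_ENABLED; /* no message needed */
	} else if (c->state == SCHED_DISABLED) {
		c->state = SCHED_ENABLED;
		c->enable_msgs++;         /* must re-enable scheduling */
	}
}

/* Delayed worker: disable only if still idle once the deadline passed. */
static void ctx_worker(struct ctx_model *c, uint64_t now_ms)
{
	if (c->state == SCHED_DISABLE_PENDING && c->pin_count == 0 &&
	    now_ms >= c->disable_deadline_ms) {
		c->state = SCHED_DISABLED;
		c->disable_msgs++;
	}
}
```

In this model, re-pinning the context before the deadline cancels the
pending disable without any disable/enable round trip; only a context
that stays idle past the delay actually gets disabled.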

This is the 5th rev of this series (counting from the first
version by Matt). Changes from prior revs:

  v7: - This series was merged and then reverted after previously
        invalid CI runs were unblocked and uncovered a deadlock.
        Fixed that deadlock.
      - Added a fix for a race condition between a new incoming
        request and the delay-disable-schedule worker.
      - Added a fix for GT reset: after cancelling the worker, we
        move all contexts that are pending a delayed disable-sched
        directly into the pending-disable state, despite not having
        sent the G2H. This is acceptable because we are preparing
        for a reset, and any outstanding expected G2Hs would be
        flushed and dropped anyway.
  v6: - More cosmetic fixes to the comments for the threshold and
        delay knobs (John Harrison).
  v5: - Fixed cosmetic issues with the commit message and comments.
      - Moved "SCHED_DISABLE_DELAY_MS" to the only place it is used.
      - Removed the tracing of intel_context_closed.
      - Added the check to intel_guc_submission_is_used in the
        debugfs that gets the current guc-id-threshold to match
        the other debugfs functions added in this series.
      - Changed __guc_get_sched_disable_gucid_threshold_default
        to a macro.
      - Added s-o-b to the first patch as well.
      - (All above from John Harrison)

  v4: Fix build error.

  v3: Differentiate and appropriately name helper functions for getting
      the 'default threshold of num-guc-ids' vs the 'max threshold of
      num-guc-ids' for bypassing sched-disable and use the correct one
      for the debugfs validation (John Harrison).

  v2: Changed the default of the schedule-disable delay to 34 milliseconds
      and added a debugfs to control this timing knob. Also added a debugfs
      to control the bypass that skips delaying the schedule-disable when
      we are under pressure with a very low balance of remaining
      guc-ids (John Harrison).
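The two debugfs knobs from v2/v3 (the delay and the guc-id threshold
for bypassing it) can be sketched in userspace C as below. This is an
illustration under assumptions, not the series' code: the struct and
function names are hypothetical, and whether the bypass compares with
"<" or "<=" against the threshold is a guess. Only the 34 ms default
delay comes from the changelog.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical knobs mirroring the two debugfs controls described above. */
struct sched_disable_knobs {
	uint32_t delay_ms;         /* idle time before scheduling is disabled */
	uint32_t id_threshold;     /* bypass the delay below this many free ids */
	uint32_t max_id_threshold; /* largest threshold accepted from debugfs */
};

/* Validate a new threshold the way a debugfs write handler might. */
static int knobs_set_threshold(struct sched_disable_knobs *k, uint32_t val)
{
	if (val > k->max_id_threshold)
		return -EINVAL; /* reject values above the max threshold */
	k->id_threshold = val;
	return 0;
}

/* Delay to apply to an idle context; 0 means disable immediately. */
static uint32_t pick_disable_delay_ms(const struct sched_disable_knobs *k,
				      uint32_t free_guc_ids)
{
	assert(k->id_threshold <= k->max_id_threshold);
	if (free_guc_ids < k->id_threshold)
		return 0; /* guc-id pressure: release the id quickly */
	return k->delay_ms;
}
```

Separating the default threshold from the maximum (as v3 describes)
lets the write handler reject out-of-range values while the runtime
path stays a single comparison.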

Alan Previn (2):
  drm/i915/guc: Before a reset, cancel any delayed-disable-scheds
  HAX wip debugging + messages for igt analysis

Daniele Ceraolo Spurio (1):
  drm/i915/guc: Fix race between guc_request_alloc and guc_context_close

Matthew Brost (1):
  drm/i915/guc: Add delay to disable scheduling after pin count goes to
    zero

 drivers/gpu/drm/i915/gem/i915_gem_context.c   |   2 +-
 drivers/gpu/drm/i915/gt/intel_context.h       |   8 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |   7 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  16 ++
 .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c    |  60 +++++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 237 ++++++++++++++++--
 drivers/gpu/drm/i915/i915_selftest.h          |   2 +
 7 files changed, 305 insertions(+), 27 deletions(-)


base-commit: f2c3a05d33693ad51996fa7d12d3b2d4b0f372eb
-- 
2.25.1


