[PATCH v3 00/15] CCS static load balance
Daniel Vetter
daniel.vetter at ffwll.ch
Tue Aug 27 17:31:21 UTC 2024
On Fri, Aug 23, 2024 at 03:08:40PM +0200, Andi Shyti wrote:
> Hi,
>
> This patch series introduces static load balancing for GPUs with
> multiple compute engines. It's a lengthy series, and some
> challenging aspects still need to be resolved.
Do we have an actual user for this, where just reloading the entire driver
(or well-rebinding, if you only want to change the value for a specific
device) with a new module option isn't enough?
There's some really gnarly locking and lifetime fun in there, and it needs
a corresponding justification. Which needs to be enormous for this case,
meaning actual customers willing to shout on dri-devel that they really,
absolutely need this, or their machines will go up in flames.
Otherwise this is a nack from me.
Thanks, Sima
>
> I have tried to split the work as much as possible to facilitate
> the review process.
>
> To summarize, in patches 1 to 14, no functional changes occur
> except for the addition of the num_cslices interface. The
> significant changes happen in patch 15, which is the core part of
> the CCS mode setting, utilizing the groundwork laid in the
> earlier patches.
>
> In this updated approach, the focus is now on managing the UABI
> engine list, which controls the engines exposed to userspace.
> Instead of manipulating phuscal engines and their memory, we now
> handle engine exposure through this list.
>
> I would greatly appreciate further input from all reviewers who
> have already assisted with the previous work.
>
> IGT tests have also been developed, but I haven't sent them yet.
>
> Thank you Chris for the offline reviews.
>
> Thanks,
> Andi
>
> Changelog:
> ==========
> PATCHv2 -> PATCHv3
> ------------------
> - Fix a NULL pointer dereference during module unload.
> In i915_gem_driver_remove() I was accessing the gt after the
> gt was removed. Use the dev_priv, instead (obviously!).
> - Fix a lockdep issue: Some of the uabi_engines_mutex unlocks
> were not correctly placed in the exit paths.
> - Fix a checkpatch error for spaces after and before parenthesis
> in the for_each_enabled_engine() definition.
>
> PATCHv1 -> PATCHv2
> ------------------
> - Use uabi_mutex to protect the uabi_engines, not the engine
> itself. Rename it to uabi_engines_mutex.
> - Use kobject_add/kobject_del for adding and removing
> interfaces, this way we don't need to destroy and recreate the
> engines, anymore. Refactor intel_engine_add_single_sysfs() to
> reflect this scenario.
> - After adding engines to the rb_tree check that they have been
> added correctly.
> - Fix rb_find_add() compare function to take into accoung also
> the class, not just the instance.
>
> RFCv2 -> PATCHv1
> ----------------
> - Removed gt->ccs.mutex
> - Rename m -> width, ccs_id -> engine in
> intel_gt_apply_ccs_mode().
> - In the CCS register value calculation
> (intel_gt_apply_ccs_mode()) the engine (ccs_id) needs to move
> along the ccs_mask (set by the user) instead of the
> cslice_mask.
> - Add GEM_BUG_ON after calculating the new ccs_mask
> (update_ccs_mask()) to make sure all angines have been
> evaluated (i.e. ccs_mask must be '0' at the end of the
> algorithm).
> - move wakeref lock before evaluating intel_gt_pm_is_awake() and
> fix exit path accordingly.
> - Use a more compact form in intel_gt_sysfs_ccs_init() and
> add_uabi_ccs_engines() when evaluating sysfs_create_file(): no
> need to store the return value to the err variable which is
> unused. Get rid of err.
> - Print a warnging instead of a debug message if we fail to
> create the sysfs files.
> - If engine files creation fails in
> intel_engine_add_single_sysfs(), print a warning, not an
> error.
> - Rename gt->ccs.ccs_mask to gt->ccs.id_mask and add a comment
> to explain its purpose.
> - During uabi engine creation, in
> intel_engines_driver_register(), the uabi_ccs_instance is
> redundant because the ccs_instances is already tracked in
> engine->uabi_instance.
> - Mark add_uabi_ccs_engines() and remove_uabi_ccs_engines() as
> __maybe_unused not to break bisectability. They wouldn't
> compile in their own commit. They will be used in the next
> patch and the __maybe_unused is removed.
> - Update engine's workaround every time a new mode is set in
> update_ccs_mask().
> - Mark engines as valid or invalid using their status as
> rb_node. Invalid engines are marked as invalid using
> RB_CLEAR_NODE(). Execbufs will check for their validity when
> selecting the engine to be combined to a context.
> - Create for_each_enabled_engine() which skips the non valid
> engines and use it in selftests.
>
> RFCv1 -> RFCv2
> --------------
> Compared to the first version I've taken a completely different
> approach to adding and removing engines. in v1 physical engines
> were directly added and removed, along with the memory allocated
> to them, each time the user changed the CCS mode (from the
> previous cover letter).
>
> Andi Shyti (15):
> drm/i915/gt: Avoid using masked workaround for CCS_MODE setting
> drm/i915/gt: Move the CCS mode variable to a global position
> drm/i915/gt: Allow the creation of multi-mode CCS masks
> drm/i915/gt: Refactor uabi engine class/instance list creation
> drm/i915/gem: Mark and verify UABI engine validity
> drm/i915/gt: Introduce for_each_enabled_engine() and apply it in
> selftests
> drm/i915/gt: Manage CCS engine creation within UABI exposure
> drm/i915/gt: Remove cslices mask value from the CCS structure
> drm/i915/gt: Expose the number of total CCS slices
> drm/i915/gt: Store engine-related sysfs kobjects
> drm/i915/gt: Store active CCS mask
> drm/i915: Protect access to the UABI engines list with a mutex
> drm/i915/gt: Isolate single sysfs engine file creation
> drm/i915/gt: Implement creation and removal routines for CCS engines
> drm/i915/gt: Allow the user to change the CCS mode through sysfs
>
> drivers/gpu/drm/i915/gem/i915_gem_context.c | 3 +
> .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 28 +-
> drivers/gpu/drm/i915/gt/intel_engine_cs.c | 23 --
> drivers/gpu/drm/i915/gt/intel_engine_types.h | 2 +
> drivers/gpu/drm/i915/gt/intel_engine_user.c | 62 ++-
> drivers/gpu/drm/i915/gt/intel_gt.c | 3 +
> drivers/gpu/drm/i915/gt/intel_gt.h | 12 +
> drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 353 +++++++++++++++++-
> drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h | 5 +-
> drivers/gpu/drm/i915/gt/intel_gt_sysfs.c | 2 +
> drivers/gpu/drm/i915/gt/intel_gt_types.h | 19 +-
> drivers/gpu/drm/i915/gt/intel_workarounds.c | 8 +-
> drivers/gpu/drm/i915/gt/selftest_context.c | 6 +-
> drivers/gpu/drm/i915/gt/selftest_engine_cs.c | 4 +-
> .../drm/i915/gt/selftest_engine_heartbeat.c | 6 +-
> drivers/gpu/drm/i915/gt/selftest_engine_pm.c | 6 +-
> drivers/gpu/drm/i915/gt/selftest_execlists.c | 52 +--
> drivers/gpu/drm/i915/gt/selftest_gt_pm.c | 2 +-
> drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 22 +-
> drivers/gpu/drm/i915/gt/selftest_lrc.c | 18 +-
> drivers/gpu/drm/i915/gt/selftest_mocs.c | 6 +-
> drivers/gpu/drm/i915/gt/selftest_rc6.c | 4 +-
> drivers/gpu/drm/i915/gt/selftest_reset.c | 8 +-
> .../drm/i915/gt/selftest_ring_submission.c | 2 +-
> drivers/gpu/drm/i915/gt/selftest_rps.c | 14 +-
> drivers/gpu/drm/i915/gt/selftest_timeline.c | 14 +-
> drivers/gpu/drm/i915/gt/selftest_tlb.c | 2 +-
> .../gpu/drm/i915/gt/selftest_workarounds.c | 14 +-
> drivers/gpu/drm/i915/gt/sysfs_engines.c | 79 ++--
> drivers/gpu/drm/i915/gt/sysfs_engines.h | 2 +
> drivers/gpu/drm/i915/i915_cmd_parser.c | 2 +
> drivers/gpu/drm/i915/i915_debugfs.c | 4 +
> drivers/gpu/drm/i915/i915_drv.h | 5 +
> drivers/gpu/drm/i915/i915_gem.c | 4 +
> drivers/gpu/drm/i915/i915_perf.c | 8 +-
> drivers/gpu/drm/i915/i915_pmu.c | 11 +-
> drivers/gpu/drm/i915/i915_query.c | 21 +-
> 37 files changed, 643 insertions(+), 193 deletions(-)
>
> --
> 2.45.2
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
More information about the Intel-gfx
mailing list