[PATCH v6 2/4] drm/xe/sriov: Shifting GGTT area post migration

Thu Apr 3 17:26:36 UTC 2025

On 01.04.2025 19:24, Michal Wajdeczko wrote:
> please use "drm/xe/vf:" in the subject as this patch is still more VF
> oriented, then general SRIOV
ok.
> On 31.03.2025 15:21, Tomasz Lis wrote:
>> We have only one GGTT for all IOV functions, with each VF having assigned
>> a range of addresses for its use. After migration, a VF can receive a
>> different range of addresses than it had initially.
>>
>> This implements shifting GGTT addresses within drm_mm nodes, so that
>> VMAs stay valid after migration. This will make the driver use new
>> addresses when accessing GGTT from the moment the shifting ends.
>>
>> By taking the ggtt->lock for the period of VMA fixups, this change
>> also adds constraint on that mutex. Any locks used during the recovery
>> cannot ever wait for hardware response - because after migration,
>> the hardware will not do anything until fixups are finished.
>>
>> v2: Moved some functs to xe_ggtt.c; moved shift computation to just
>>    after querying; improved documentation; switched some warns to asserts;
>>    skipping fixups when GGTT shift eq 0; iterating through tiles (Michal)
>> v3: Updated kerneldocs, removed unused funct, properly allocate
>>    balloning nodes if non existent
>> v4: Re-used ballooning functions from VF init, used bool in place of
>>    standard error codes
>> v5: Renamed one function
>>
>> Signed-off-by: Tomasz Lis<tomasz.lis at intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_ggtt.c              | 33 +++++++++
>>   drivers/gpu/drm/xe/xe_ggtt.h              |  1 +
>>   drivers/gpu/drm/xe/xe_gt_sriov_vf.c       | 83 +++++++++++++++++++++++
>>   drivers/gpu/drm/xe/xe_gt_sriov_vf.h       |  1 +
>>   drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h |  2 +
>>   drivers/gpu/drm/xe/xe_sriov_vf.c          | 17 +++++
>>   6 files changed, 137 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
>> index 2d7456e37ef4..b13c4a12393e 100644
>> --- a/drivers/gpu/drm/xe/xe_ggtt.c
>> +++ b/drivers/gpu/drm/xe/xe_ggtt.c
>> @@ -482,6 +482,39 @@ void xe_ggtt_node_remove_balloon(struct xe_ggtt_node *node)
>>   	drm_mm_remove_node(&node->base);
>>   }
>>   
>> +/**
>> + * xe_ggtt_shift_mm_nodes - Shift GGTT nodes to adjust for a change in usable address range.
> drop "mm" from the function name as it is implementation detail
>
> 	xe_ggtt_shift_nodes()

The fact that nodes are used for Memory Management is an implementation 
detail?

I will remove `mm` from the name, because no other function uses that 
shortcut. But let's stick

to real arguments, ie. that no other function dealing with nodes has the 
_mm_ in name.

>> + * @ggtt: the &xe_ggtt struct instance
>> + * @shift: change to the location of area provisioned for current VF
> this function is quite generic and while it is used by the VF only, the
> parameter doesn't have to be described as such
I just changed the subject line because (from comment above) the patch 
is VF oriented.
> you may add here, in the 'longer description' part of the kernel-doc,
> after explaining how it works, that function is mostly used by the VF,
> and why we believe it will succeed

Not "mostly" but "only". There is no other scenario where we could 
anticipate this function to be used.

While universalization is generally a good direction, pretending that 
this code will ever be used for anything else than VF is not a direction 
with any chance for future benefits.

For the comment about errors - will add.

>> + */
>> +void xe_ggtt_shift_mm_nodes(struct xe_ggtt *ggtt, s64 shift)
>> +{
>> +	struct drm_mm_node *node, *tmpn;
>> +	LIST_HEAD(temp_list_head);
>> +	int err;
>> +
>> +	lockdep_assert_held(&ggtt->lock);
>> +
>> +	/*
>> +	 * Move nodes, from range previously assigned to this VF, to a temp list.
> nit: no need for multi-line comment style
>
> also, maybe it could be moved to the 'longer description' part of the
> function kernel-doc, after sanitize it from the 'VF' specific wording
>
>> +	 */
>> +	drm_mm_for_each_node_safe(node, tmpn, &ggtt->mm) {
>> +		drm_mm_remove_node(node);
>> +		list_add(&node->node_list, &temp_list_head);
>> +	}
>> +
>> +	/*
>> +	 * Now the GGTT VM contains no nodes. We can re-add all VF nodes with
>> +	 * shifted offsets.
>> +	 */
> also consider to move this comment to function kernel-doc

Not sure why we want a detail on implementation in the function description.

Also not sure why we want detailed description block in this function, 
when we already have that in the caller.

But I don't mind, will move.

>> +	list_for_each_entry_safe(node, tmpn, &temp_list_head, node_list) {
>> +		list_del(&node->node_list);
>> +		node->start += shift;
>> +		err = drm_mm_reserve_node(&ggtt->mm, node);
>> +		xe_tile_assert(ggtt->tile, !err);
> while we believe it should be always possible to 'shift' all nodes, as
> we just released our balloons, I'm not sure that this assert here alone
> is sufficient
>
> maybe before starting any movements, check that 'shift' is valid, and
> add asserts for each node that shifted location is within GGTT space?

Before starting? How exactly do we check the shift before we have any 
nodes? Without nodes, we have nothing to compare to.

I assume that what you're going for, is just to add more asserts on each 
node. Will do.

>> +	}
>> +}
>> +
>>   /**
>>    * xe_ggtt_node_insert_locked - Locked version to insert a &xe_ggtt_node into the GGTT
>>    * @node: the &xe_ggtt_node to be inserted
>> diff --git a/drivers/gpu/drm/xe/xe_ggtt.h b/drivers/gpu/drm/xe/xe_ggtt.h
>> index 27e7d67de004..a07194cd3724 100644
>> --- a/drivers/gpu/drm/xe/xe_ggtt.h
>> +++ b/drivers/gpu/drm/xe/xe_ggtt.h
>> @@ -18,6 +18,7 @@ void xe_ggtt_node_fini(struct xe_ggtt_node *node);
>>   int xe_ggtt_node_insert_balloon(struct xe_ggtt_node *node,
>>   				u64 start, u64 size);
>>   void xe_ggtt_node_remove_balloon(struct xe_ggtt_node *node);
>> +void xe_ggtt_shift_mm_nodes(struct xe_ggtt *ggtt, s64 shift);
>>   
>>   int xe_ggtt_node_insert(struct xe_ggtt_node *node, u32 size, u32 align);
>>   int xe_ggtt_node_insert_locked(struct xe_ggtt_node *node,
>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
>> index 9edbe34f45f4..e9e7ddeb4254 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
>> @@ -415,6 +415,7 @@ static int vf_get_ggtt_info(struct xe_gt *gt)
>>   	xe_gt_sriov_dbg_verbose(gt, "GGTT %#llx-%#llx = %lluK\n",
>>   				start, start + size - 1, size / SZ_1K);
>>   
>> +	config->ggtt_shift = start - (s64)config->ggtt_base;
> on the first probe, shouldn't we store it as '0' ?

`ggtt_shift` is a part of`xe_gt`, allocated in `xe_gt_alloc()` using `kzalloc` variant. It is 
always zero on probe, regardless if first or not. But even regardless of 
that, the value of this parameter is valid only when within the 
post-migration recovery worker. It is not accessed outside, and noone 
cares about its value outside.

>>   	config->ggtt_base = start;
>>   	config->ggtt_size = size;
>>   
>> @@ -972,6 +973,88 @@ int xe_gt_sriov_vf_query_runtime(struct xe_gt *gt)
>>   	return err;
>>   }
> hmm, are you sure that place between
>
> 	xe_gt_sriov_vf_query_runtime
> and
> 	vf_runtime_reg_cmp
>
> is the best place for this function?
>
> maybe at least place it closer to
>
> 	xe_gt_sriov_vf_migrated_event_handler
>
> that is at least partially related,
> but not sure if also not miss-located
no problem, will place it there.
>>   
>> +/**
>> + * xe_gt_sriov_vf_fixup_ggtt_nodes - Shift GGTT allocations to match assigned range.
>> + * @gt: the &xe_gt struct instance
>> + * Return: True if fixups are necessary
>> + *
>> + * Since Global GTT is not virtualized, each VF has an assigned range
>> + * within the global space. This range might have changed during migration,
>> + * which requires all memory addresses pointing to GGTT to be shifted.
> This 'longer description' shall be _after_ 'Return' description
>
> seehttps://docs.kernel.org/doc-guide/kernel-doc.html#function-documentation
I see from the link than you meant `Return:` at the end. But I agreed to 
your comments below so the return vanishes completely.
>> + */
>> +bool xe_gt_sriov_vf_fixup_ggtt_nodes(struct xe_gt *gt)
>> +{
>> +	struct xe_gt_sriov_vf_selfconfig *config = &gt->sriov.vf.self_config;
> you should introduce xe_gt_sriov_vf_ggtt_shift() earlier than patch 4/4
> and use it here (or better let the caller use it and pass ggtt_shift as
> parameter)
will move that function earlier.
>> +	struct xe_tile *tile = gt_to_tile(gt);
>> +	struct xe_ggtt *ggtt = tile->mem.ggtt;
>> +	s64 ggtt_shift;
>> +
> assert that gt is not a media one
ok
>> +	mutex_lock(&ggtt->lock);
>> +	ggtt_shift = config->ggtt_shift;
>> +	/*
>> +	 * Move nodes, including balloons, from range previously assigned to this VF,
>> +	 * into newly provisioned area.
>> +	 *
>> +	 * The balloon nodes are there to eliminate unavailable ranges from use: one
>> +	 * reserves the GGTT area below the range for current VF, and another one
>> +	 * reserves area above.
>> +	 *
>> +	 * Below is a GGTT layout of example VF, with a certain address range assigned to
>> +	 * said VF, and inaccessible areas above and below:
>> +	 *
>> +	 *  0                                                                        4GiB
>> +	 *  |<--------------------------- Total GGTT size ----------------------------->|
>> +	 *      WOPCM                                                         GUC_TOP
>> +	 *      |<-------------- Area mappable by xe_ggtt instance ---------------->|
>> +	 *
>> +	 *  +---+---------------------------------+----------+----------------------+---+
>> +	 *  |\\\|/////////////////////////////////|  VF mem  |//////////////////////|\\\|
>> +	 *  +---+---------------------------------+----------+----------------------+---+
>> +	 *
>> +	 * Hardware enforced access rules before migration:
>> +	 *
>> +	 *  |<------- inaccessible for VF ------->|<VF owned>|<-- inaccessible for VF ->|
>> +	 *
>> +	 * drm_mm nodes used for tracking allocations:
> use of drm_mm is implementation detail of the xe_ggtt and it is not
> relevant here, just say 'GGTT nodes'

We are trying to hide implementation details. Not because these details 
can vary/change - this will never happen.

We're just calling some things implementation details and hiding them 
because it is doable.

We've talked about this before, so will disagree but commit.

>> +	 *
>> +	 *     |<----------- balloon ------------>|<- nodes->|<----- balloon ------>|
>> +	 *
>> +	 * After the migration, GGTT area assigned to the VF might have shifted, either
>> +	 * to lower or to higher address. But we expect the total size and extra areas to
>> +	 * be identical, as migration can only happen between matching platforms.
>> +	 * Below is an example of GGTT layout of the VF after migration. Content of the
>> +	 * GGTT for VF has been moved to a new area, and we receive its address from GuC:
>> +	 *
>> +	 *  +---+----------------------+----------+---------------------------------+---+
>> +	 *  |\\\|//////////////////////|  VF mem  |/////////////////////////////////|\\\|
>> +	 *  +---+----------------------+----------+---------------------------------+---+
>> +	 *
>> +	 * Hardware enforced access rules after migration:
>> +	 *
>> +	 *  |<- inaccessible for VF -->|<VF owned>|<------- inaccessible for VF ------->|
>> +	 *
>> +	 * So the VF has a new slice of GGTT assigned, and during migration process, the
>> +	 * memory content was copied to that new area. But the drm_mm nodes within xe kmd
>> +	 * are still tracking allocations using the old addresses. The nodes within VF
>> +	 * owned area have to be shifted, and balloon nodes need to be resized to
>> +	 * properly mask out areas not owned by the VF.
>> +	 *
>> +	 * Fixed drm_mm nodes used for tracking allocations:
>> +	 *
>> +	 *     |<------ balloon ------>|<- nodes->|<----------- balloon ----------->|
>> +	 *
>> +	 * Due to use of GPU profiles, we do not expect the old and new GGTT ares to
>> +	 * overlap; but our node shifting will fix addresses properly regardless.
>> +	 */
> this inline comment is now much bigger than actual implementation
>
> why not promote it to the full DOC: as then it could be included in the
> master SRIOV documentation

will do.

..but for this case only. I will oppose to any tries of having the 
architecture docs rewritten as code comments.

>> +	if (ggtt_shift) {
>> +		xe_gt_sriov_vf_deballoon_ggtt(gt);
>> +		xe_ggtt_shift_mm_nodes(ggtt, ggtt_shift);
>> +		xe_gt_sriov_vf_balloon_ggtt(gt);
>> +	}
>> +	mutex_unlock(&ggtt->lock);
>> +	return ggtt_shift != 0;
>> +}
>> +
>>   static int vf_runtime_reg_cmp(const void *a, const void *b)
>>   {
>>   	const struct vf_runtime_reg *ra = a;
>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
>> index c87b0e9c7ebc..13c04e313aa6 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
>> @@ -20,6 +20,7 @@ int xe_gt_sriov_vf_query_runtime(struct xe_gt *gt);
>>   int xe_gt_sriov_vf_prepare_ggtt(struct xe_gt *gt);
>>   int xe_gt_sriov_vf_balloon_ggtt(struct xe_gt *gt);
>>   void xe_gt_sriov_vf_deballoon_ggtt(struct xe_gt *gt);
>> +bool xe_gt_sriov_vf_fixup_ggtt_nodes(struct xe_gt *gt);
>>   int xe_gt_sriov_vf_notify_resfix_done(struct xe_gt *gt);
>>   void xe_gt_sriov_vf_migrated_event_handler(struct xe_gt *gt);
>>   
>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h b/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
>> index a57f13b5afcd..5ccbdf8d08b6 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
>> @@ -40,6 +40,8 @@ struct xe_gt_sriov_vf_selfconfig {
>>   	u64 ggtt_base;
>>   	/** @ggtt_size: assigned size of the GGTT region. */
>>   	u64 ggtt_size;
>> +	/** @ggtt_shift: difference in ggtt_base on last migration */
>> +	s64 ggtt_shift;
>>   	/** @lmem_size: assigned size of the LMEM. */
>>   	u64 lmem_size;
>>   	/** @num_ctxs: assigned number of GuC submission context IDs. */
>> diff --git a/drivers/gpu/drm/xe/xe_sriov_vf.c b/drivers/gpu/drm/xe/xe_sriov_vf.c
>> index c1275e64aa9c..2eb6b8d8a217 100644
>> --- a/drivers/gpu/drm/xe/xe_sriov_vf.c
>> +++ b/drivers/gpu/drm/xe/xe_sriov_vf.c
>> @@ -7,6 +7,7 @@
>>   
>>   #include "xe_assert.h"
>>   #include "xe_device.h"
>> +#include "xe_gt.h"
>>   #include "xe_gt_sriov_printk.h"
>>   #include "xe_gt_sriov_vf.h"
>>   #include "xe_pm.h"
>> @@ -170,6 +171,20 @@ static bool vf_post_migration_imminent(struct xe_device *xe)
>>   	work_pending(&xe->sriov.vf.migration.worker);
>>   }
>>   
>> +static bool vf_post_migration_fixup_ggtt_nodes(struct xe_device *xe)
>> +{
>> +	struct xe_tile *tile;
>> +	unsigned int id;
>> +	bool need_fixups = false;
> try to define vars in reverse-xmas-tree order
>
> 	bool need_fixups = false;
> 	struct xe_tile *tile;
> 	unsigned int id;
ok
>> +
>> +	for_each_tile(tile, xe, id) {
>> +		struct xe_gt *gt = tile->primary_gt;
> 		shift = xe_gt_sriov_vf_ggtt_shift(gt);
> 		if (shift) {
> 			need_fixups = true;
> 			xe_gt_sriov_vf_fixup_ggtt_nodes(gt, shift);
> 		}

So in this specific place we want to make the function look longer and 
reveal a detail.

Makes no sense to me, but since you don't seem convinced by previous 
review round - will do this.

-Tomasz

>> +
>> +		need_fixups |= xe_gt_sriov_vf_fixup_ggtt_nodes(gt);
>> +	}
>> +	return need_fixups;
>> +}
>> +
>>   /*
>>    * Notify all GuCs about resource fixups apply finished.
>>    */
>> @@ -191,6 +206,7 @@ static void vf_post_migration_notify_resfix_done(struct xe_device *xe)
>>   
>>   static void vf_post_migration_recovery(struct xe_device *xe)
>>   {
>> +	bool need_fixups;
>>   	int err;
>>   
>>   	drm_dbg(&xe->drm, "migration recovery in progress\n");
>> @@ -201,6 +217,7 @@ static void vf_post_migration_recovery(struct xe_device *xe)
>>   	if (unlikely(err))
>>   		goto fail;
>>   
>> +	need_fixups = vf_post_migration_fixup_ggtt_nodes(xe);
>>   	/* FIXME: add the recovery steps */
>>   	vf_post_migration_notify_resfix_done(xe);
>>   	xe_pm_runtime_put(xe);
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/dri-devel/attachments/20250403/359e08f5/attachment-0001.htm>