<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<div class="moz-cite-prefix">On 17.04.2025 20:50, Michal Wajdeczko
wrote:<br>
</div>
<blockquote type="cite" cite="mid:21c3c6ab-701b-427b-a705-975bcb68b5d3@intel.com">
<pre wrap="" class="moz-quote-pre">
On 16.04.2025 00:08, Lis, Tomasz wrote:
</pre>
<blockquote type="cite">
<pre wrap="" class="moz-quote-pre">
On 14.04.2025 11:23, Michal Wajdeczko wrote:
</pre>
<blockquote type="cite">
<pre wrap="" class="moz-quote-pre">On 11.04.2025 22:36, Tomasz Lis wrote:
</pre>
<blockquote type="cite">
<pre wrap="" class="moz-quote-pre">We have only one GGTT for all IOV functions, with each VF having
assigned
a range of addresses for its use. After migration, a VF can receive a
different range of addresses than it had initially.
This implements shifting GGTT addresses within drm_mm nodes, so that
VMAs stay valid after migration. This will make the driver use new
addresses when accessing GGTT from the moment the shifting ends.
By taking the ggtt->lock for the period of VMA fixups, this change
also adds constraint on that mutex. Any locks used during the recovery
cannot ever wait for hardware response - because after migration,
the hardware will not do anything until fixups are finished.
v2: Moved some functs to xe_ggtt.c; moved shift computation to just
after querying; improved documentation; switched some warns to
asserts;
skipping fixups when GGTT shift eq 0; iterating through tiles
(Michal)
v3: Updated kerneldocs, removed unused funct, properly allocate
balloning nodes if non existent
v4: Re-used ballooning functions from VF init, used bool in place of
standard error codes
v5: Renamed one function
v6: Subject tag change, several kerneldocs updated, some functions
renamed, some moved, added several asserts, shuffled declarations
of variables, revealed more detail in high level functions
v7: Fixed typos, added `_locked` suffix to some functs, improved
readability of asserts, removed unneeded conditional
v8: Moved one function, removed implementation detail from kerneldoc,
added asserts
Signed-off-by: Tomasz Lis<a class="moz-txt-link-rfc2396E" href="mailto:tomasz.lis@intel.com"><tomasz.lis@intel.com></a>
Cc: Michal Wajdeczko<a class="moz-txt-link-rfc2396E" href="mailto:michal.wajdeczko@intel.com"><michal.wajdeczko@intel.com></a>
---
 drivers/gpu/drm/xe/xe_ggtt.c              |  48 ++++++++++
 drivers/gpu/drm/xe/xe_ggtt.h              |   1 +
 drivers/gpu/drm/xe/xe_gt_sriov_vf.c       | 102 ++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_gt_sriov_vf.h       |   2 +
 drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h |   2 +
 drivers/gpu/drm/xe/xe_sriov_vf.c          |  22 +++++
 6 files changed, 177 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
index 5a37233f2420..9fc0d57be00a 100644
--- a/drivers/gpu/drm/xe/xe_ggtt.c
+++ b/drivers/gpu/drm/xe/xe_ggtt.c
@@ -484,6 +484,54 @@ void xe_ggtt_node_remove_balloon_locked(struct xe_ggtt_node *node)
 	drm_mm_remove_node(&node->base);
 }
 
+static void xe_ggtt_assert_fit(struct xe_ggtt *ggtt, u64 start, u64 size)
+{
+	struct xe_tile *tile = ggtt->tile;
+	struct xe_device *xe = tile_to_xe(tile);
+	u64 __maybe_unused wopcm = xe_wopcm_size(xe);
+
+	xe_tile_assert(tile, start >= wopcm);
+	xe_tile_assert(tile, start + size < ggtt->size - wopcm);
+}
+
+/**
+ * xe_ggtt_shift_nodes_locked - Shift GGTT nodes to adjust for a change in usable address range.
+ * @ggtt: the &xe_ggtt struct instance
+ * @shift: change to the location of area provisioned for current VF
+ *
+ * This function moves all nodes from the GGTT VM to a temp list. These nodes are expected
+ * to represent allocations in the range formerly assigned to current VF, before the range changed.
+ * When the GGTT VM is completely clear of any nodes, they are re-added with shifted offsets.
+ *
+ * The function has no way of failing - because it shifts existing nodes, without
+ * any additional processing. If the nodes could exist at the old address,
+ * they will do the same at the new one. A failure inside this function would indicate that
+ * the list of nodes was either already damaged, or that the shift brings the address range
+ * outside of valid bounds. Both cases justify an assert rather than an error code.
+ */
+void xe_ggtt_shift_nodes_locked(struct xe_ggtt *ggtt, s64 shift)
+{
+	struct xe_tile *tile __maybe_unused = ggtt->tile;
+	struct drm_mm_node *node, *tmpn;
+	LIST_HEAD(temp_list_head);
+	int err;
+
+	lockdep_assert_held(&ggtt->lock);
+
</pre>
</blockquote>
<pre wrap="" class="moz-quote-pre">nit: my idea was to assert that we have room in the GGTT for that shift
ahead of doing any movements and if there is no easy way to get that
from the drm_mm is one call then do it like:
drm_mm_for_each_node_safe(node, tmpn, &ggtt->mm)
xe_ggtt_assert_fit(ggtt, node->start + shift, node->size);
and then follow with pure code that we know it must be ok
</pre>
</blockquote>
<pre wrap="" class="moz-quote-pre">ok, can do that. Though this does not look better in any way to me. In
fact it looks worse, as it introduces another iteration through the
nodes which optimizations most likely won't be able to cut out. I
thought we should avoid debug-only iterating.
</pre>
</blockquote>
<pre wrap="" class="moz-quote-pre">
if the compiler can't fully optimize out the loop, then we can do it with:

	if (IS_ENABLED(CONFIG_DRM_XE_DEBUG))
		drm_mm_for_each_node_safe(...)
			xe_ggtt_assert_fit(...);</pre>
</blockquote>
will do that.<br>
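<p>Something along these lines, I presume (just a sketch of how v9 could look,
reusing the names already present in this function):</p>
<pre>	if (IS_ENABLED(CONFIG_DRM_XE_DEBUG))
		drm_mm_for_each_node_safe(node, tmpn, &ggtt->mm)
			xe_ggtt_assert_fit(ggtt, node->start + shift, node->size);
</pre>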
<blockquote type="cite" cite="mid:21c3c6ab-701b-427b-a705-975bcb68b5d3@intel.com">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre wrap="" class="moz-quote-pre">+ drm_mm_for_each_node_safe(node, tmpn, &ggtt->mm) {
+ drm_mm_remove_node(node);
+ list_add(&node->node_list, &temp_list_head);
+ }
+
+ list_for_each_entry_safe(node, tmpn, &temp_list_head, node_list) {
+ xe_ggtt_assert_fit(ggtt, node->start + shift, node->size);
</pre>
</blockquote>
<pre wrap="" class="moz-quote-pre">so without this assert here (as we did it in a separate loop above)
</pre>
</blockquote>
<pre wrap="" class="moz-quote-pre">ok
</pre>
<blockquote type="cite">
<blockquote type="cite">
<pre wrap="" class="moz-quote-pre">+ list_del(&node->node_list);
+ node->start += shift;
+ err = drm_mm_reserve_node(&ggtt->mm, node);
+ xe_tile_assert(tile, !err);
</pre>
</blockquote>
<pre wrap="" class="moz-quote-pre">maybe drop the 'err' assignment and just confirm that:
xe_tile_assert(tile, xe_ggtt_node_allocated(node));
</pre>
</blockquote>
<pre wrap="" class="moz-quote-pre">
The proposed solution is again worse than the original. Ignoring an error
return should be avoided, and there is no reason to ignore it here.
</pre>
</blockquote>
<pre wrap="" class="moz-quote-pre">
but we don't ignore the error, we just rely on node_allocated() instead
of looking directly at err, which in a non-debug build we would just assign
and then completely ignore, risking a static code analyzer complaint</pre>
</blockquote>
right, I forgot about __maybe_unused.<br>
<blockquote type="cite" cite="mid:21c3c6ab-701b-427b-a705-975bcb68b5d3@intel.com">
<blockquote type="cite">
<pre wrap="" class="moz-quote-pre">
Also, here we are not iterating through xe wrappers of drm_mm nodes. I
will stick to the current implementation.
</pre>
</blockquote>
<pre wrap="" class="moz-quote-pre">
you can still use drm_mm_node_allocated(node)
</pre>
<blockquote type="cite">
<pre wrap="" class="moz-quote-pre">
Like the one before, all this request would do is generate changes. If
solutions have almost equal weights, maybe it's worth just sticking with
what the author proposed.
</pre>
</blockquote>
<pre wrap="" class="moz-quote-pre">
'almost' makes a difference
usually review comments help the author widen his/her view, but it looks
like some authors are just too resistant to different views, and NAK
them without even trying to understand them</pre>
</blockquote>
<p>Presenting my 'almost' as 'slightly better' is at odds with the
arguments I provided. The proposed solution is actually slightly
worse.</p>
<p>But even if it was on the other side of the 'almost', this doesn't
change my point, which is the opposite of 'makes a difference'.</p>
<p>But keeping true to my words - the difference is so irrelevant that it
is not worth discussing, and the change isn't very complex - I will
just do it.<br>
</p>
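<p>For the record, the reserve loop I intend to end up with (a sketch only;
the final form will be in the next revision):</p>
<pre>	list_for_each_entry_safe(node, tmpn, &temp_list_head, node_list) {
		list_del(&node->node_list);
		node->start += shift;
		drm_mm_reserve_node(&ggtt->mm, node);
		xe_tile_assert(tile, drm_mm_node_allocated(node));
	}
</pre>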
<blockquote type="cite" cite="mid:21c3c6ab-701b-427b-a705-975bcb68b5d3@intel.com">
<blockquote type="cite">
<pre wrap="" class="moz-quote-pre">
</pre>
<blockquote type="cite">
<blockquote type="cite">
<pre wrap="" class="moz-quote-pre">+ }
+}
+
/**
* xe_ggtt_node_insert_locked - Locked version to insert a
&xe_ggtt_node into the GGTT
* @node: the &xe_ggtt_node to be inserted
diff --git a/drivers/gpu/drm/xe/xe_ggtt.h b/drivers/gpu/drm/xe/xe_ggtt.h
index d468af96b465..4337a279ff3b 100644
--- a/drivers/gpu/drm/xe/xe_ggtt.h
+++ b/drivers/gpu/drm/xe/xe_ggtt.h
@@ -18,6 +18,7 @@ void xe_ggtt_node_fini(struct xe_ggtt_node *node);
int xe_ggtt_node_insert_balloon_locked(struct xe_ggtt_node *node,
u64 start, u64 size);
void xe_ggtt_node_remove_balloon_locked(struct xe_ggtt_node *node);
+void xe_ggtt_shift_nodes_locked(struct xe_ggtt *ggtt, s64 shift);
int xe_ggtt_node_insert(struct xe_ggtt_node *node, u32 size, u32
align);
int xe_ggtt_node_insert_locked(struct xe_ggtt_node *node,
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
index fa299da08684..a822549169d8 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
@@ -415,6 +415,7 @@ static int vf_get_ggtt_info(struct xe_gt *gt)
 	xe_gt_sriov_dbg_verbose(gt, "GGTT %#llx-%#llx = %lluK\n",
 				start, start + size - 1, size / SZ_1K);
+	config->ggtt_shift = start - (s64)config->ggtt_base;
 	config->ggtt_base = start;
 	config->ggtt_size = size;
@@ -560,6 +561,24 @@ u64 xe_gt_sriov_vf_lmem(struct xe_gt *gt)
 	return gt->sriov.vf.self_config.lmem_size;
 }
 
+/**
+ * xe_gt_sriov_vf_ggtt_shift - Return shift in GGTT range due to VF migration
+ * @gt: the &xe_gt struct instance
+ *
+ * This function is for VF use only.
+ *
+ * Return: The shift value; could be negative
+ */
+s64 xe_gt_sriov_vf_ggtt_shift(struct xe_gt *gt)
+{
+	struct xe_gt_sriov_vf_selfconfig *config = &amp;gt->sriov.vf.self_config;
+
+	xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt)));
+	xe_gt_assert(gt, !xe_gt_is_media_type(gt));
+
+	return config->ggtt_shift;
+}
+
 static int vf_init_ggtt_balloons(struct xe_gt *gt)
 {
 	struct xe_tile *tile = gt_to_tile(gt);
@@ -817,6 +836,89 @@ int xe_gt_sriov_vf_connect(struct xe_gt *gt)
 	return err;
 }
+/**
+ * DOC: GGTT nodes shifting during VF post-migration recovery
+ *
+ * The first fixup applied to the VF KMD structures as part of post-migration
+ * recovery is shifting nodes within &xe_ggtt instance. The nodes are moved
+ * from range previously assigned to this VF, into newly provisioned area.
+ * The changes include balloons, which are resized accordingly.
+ *
+ * The balloon nodes are there to eliminate unavailable ranges from use: one
+ * reserves the GGTT area below the range for current VF, and another one
+ * reserves area above.
+ *
+ * Below is a GGTT layout of example VF, with a certain address range assigned to
+ * said VF, and inaccessible areas above and below:
+ *
+ *  0                                                                        4GiB
+ *  |<--------------------------- Total GGTT size ----------------------------->|
+ *      WOPCM                                                         GUC_TOP
+ *      |<-------------- Area mappable by xe_ggtt instance ---------------->|
+ *
+ *  +---+---------------------------------+----------+----------------------+---+
+ *  |\\\|/////////////////////////////////|  VF mem  |//////////////////////|\\\|
+ *  +---+---------------------------------+----------+----------------------+---+
+ *
+ * Hardware enforced access rules before migration:
+ *
+ *  |<------- inaccessible for VF ------->|<VF owned>|<-- inaccessible for VF ->|
+ *
+ * GGTT nodes used for tracking allocations:
+ *
+ *     |<----------- balloon ------------>|<- nodes->|<----- balloon ------>|
</pre>
</blockquote>
<pre wrap="" class="moz-quote-pre">left side is little misaligned to above WOPCM (it that was the idea)
</pre>
</blockquote>
<pre wrap="" class="moz-quote-pre">true, will fix.
</pre>
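<p>i.e. shifting the balloon line so its left bracket lines up under the WOPCM
boundary (assuming that was the intent):</p>
<pre> *      |<----------- balloon ------------>|<- nodes->|<----- balloon ------>|
</pre>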
<blockquote type="cite">
<blockquote type="cite">
<pre wrap="" class="moz-quote-pre">+ *
+ * After the migration, GGTT area assigned to the VF might have
shifted, either
+ * to lower or to higher address. But we expect the total size and
extra areas to
+ * be identical, as migration can only happen between matching
platforms.
+ * Below is an example of GGTT layout of the VF after migration.
Content of the
+ * GGTT for VF has been moved to a new area, and we receive its
address from GuC:
+ *
+ * +---+----------------------+----------
+---------------------------------+---+
+ * |\\\|//////////////////////| VF mem
|/////////////////////////////////|\\\|
+ * +---+----------------------+----------
+---------------------------------+---+
+ *
+ * Hardware enforced access rules after migration:
+ *
+ * |<- inaccessible for VF -->|<VF owned>|<------- inaccessible for
VF ------->|
+ *
+ * So the VF has a new slice of GGTT assigned, and during migration
process, the
+ * memory content was copied to that new area. But the &xe_ggtt
nodes are still
+ * tracking allocations using the old addresses. The nodes within VF
owned area
+ * have to be shifted, and balloon nodes need to be resized to
properly mask out
+ * areas not owned by the VF.
+ *
+ * Fixed &xe_ggtt nodes used for tracking allocations:
+ *
+ * |<------ balloon ------>|<- nodes->|<----------- balloon
----------->|
+ *
+ * Due to use of GPU profiles, we do not expect the old and new GGTT
ares to
+ * overlap; but our node shifting will fix addresses properly
regardless.
+ */
+
+/**
+ * xe_gt_sriov_vf_fixup_ggtt_nodes - Shift GGTT allocations to match
assigned range.
+ * @gt: the &xe_gt struct instance
+ * @ggtt_shift: the shift value
</pre>
</blockquote>
<pre wrap="" class="moz-quote-pre">nit: function is about _GGTT_ so maybe we can drop ggtt_ prefix here?
</pre>
</blockquote>
<pre wrap="" class="moz-quote-pre">True, since we already did that everywhere else.
</pre>
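<p>So the prototype becomes simply (sketch):</p>
<pre>void xe_gt_sriov_vf_fixup_ggtt_nodes(struct xe_gt *gt, s64 shift);
</pre>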
<blockquote type="cite">
<blockquote type="cite">
<pre wrap="" class="moz-quote-pre">+ *
+ * Since Global GTT is not virtualized, each VF has an assigned range
+ * within the global space. This range might have changed during
migration,
+ * which requires all memory addresses pointing to GGTT to be shifted.
+ */
+void xe_gt_sriov_vf_fixup_ggtt_nodes(struct xe_gt *gt, s64 ggtt_shift)
+{
+ struct xe_tile *tile = gt_to_tile(gt);
+ struct xe_ggtt *ggtt = tile->mem.ggtt;
+
+ xe_gt_assert(gt, !xe_gt_is_media_type(gt));
+
+ mutex_lock(&ggtt->lock);
+ xe_gt_sriov_vf_deballoon_ggtt_locked(gt);
+ xe_ggtt_shift_nodes_locked(ggtt, ggtt_shift);
+ xe_gt_sriov_vf_balloon_ggtt_locked(gt);
+ mutex_unlock(&ggtt->lock);
+}
+
/**
* xe_gt_sriov_vf_migrated_event_handler - Start a VF migration
recovery,
* or just mark that a GuC is ready for it.
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h b/drivers/gpu/drm/
xe/xe_gt_sriov_vf.h
index d717deb8af91..6e0da81d0359 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
@@ -20,6 +20,8 @@ int xe_gt_sriov_vf_query_runtime(struct xe_gt *gt);
int xe_gt_sriov_vf_prepare_ggtt(struct xe_gt *gt);
int xe_gt_sriov_vf_balloon_ggtt_locked(struct xe_gt *gt);
void xe_gt_sriov_vf_deballoon_ggtt_locked(struct xe_gt *gt);
+s64 xe_gt_sriov_vf_ggtt_shift(struct xe_gt *gt);
+void xe_gt_sriov_vf_fixup_ggtt_nodes(struct xe_gt *gt, s64 ggtt_shift);
int xe_gt_sriov_vf_notify_resfix_done(struct xe_gt *gt);
void xe_gt_sriov_vf_migrated_event_handler(struct xe_gt *gt);
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h b/drivers/
gpu/drm/xe/xe_gt_sriov_vf_types.h
index a57f13b5afcd..5ccbdf8d08b6 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
@@ -40,6 +40,8 @@ struct xe_gt_sriov_vf_selfconfig {
u64 ggtt_base;
/** @ggtt_size: assigned size of the GGTT region. */
u64 ggtt_size;
+ /** @ggtt_shift: difference in ggtt_base on last migration */
+ s64 ggtt_shift;
</pre>
</blockquote>
<pre wrap="" class="moz-quote-pre">btw, do we want to expose this in self_config where we already show GGTT
base and size? if so then update xe_gt_sriov_vf_print_config()
</pre>
</blockquote>
<pre wrap="" class="moz-quote-pre">
No, I don't think that's a good idea. It's just a derivative value, valid
only during the recovery.
The GGTT base and size from log lines before and after can be compared
to get the shift.
</pre>
</blockquote>
<pre wrap="" class="moz-quote-pre">
it's not about the math
relying on the dmesg with those lines always being available might be
just too optimistic; usually we want to be able to collect data directly
from the driver/debugfs in case we detect a failure</pre>
</blockquote>
<p>hard to argue with that. will add the printing.</p>
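<p>Something like this, next to the existing GGTT entries in
xe_gt_sriov_vf_print_config() (a sketch; exact label and format may differ):</p>
<pre>	drm_printf(p, "GGTT shift on last restore:\t%lld\n", config->ggtt_shift);
</pre>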
<p>-Tomasz<br>
</p>
<blockquote type="cite" cite="mid:21c3c6ab-701b-427b-a705-975bcb68b5d3@intel.com">
<blockquote type="cite">
<pre wrap="" class="moz-quote-pre">
-Tomasz
</pre>
<blockquote type="cite">
<blockquote type="cite">
<pre wrap="" class="moz-quote-pre"> /** @lmem_size: assigned size of the LMEM. */
u64 lmem_size;
/** @num_ctxs: assigned number of GuC submission context IDs. */
diff --git a/drivers/gpu/drm/xe/xe_sriov_vf.c b/drivers/gpu/drm/xe/
xe_sriov_vf.c
index c1275e64aa9c..e70f1ceabbb3 100644
--- a/drivers/gpu/drm/xe/xe_sriov_vf.c
+++ b/drivers/gpu/drm/xe/xe_sriov_vf.c
@@ -7,6 +7,7 @@
#include "xe_assert.h"
#include "xe_device.h"
+#include "xe_gt.h"
#include "xe_gt_sriov_printk.h"
#include "xe_gt_sriov_vf.h"
#include "xe_pm.h"
@@ -170,6 +171,25 @@ static bool vf_post_migration_imminent(struct
xe_device *xe)
work_pending(&xe->sriov.vf.migration.worker);
}
+static bool vf_post_migration_fixup_ggtt_nodes(struct xe_device *xe)
+{
+ bool need_fixups = false;
+ struct xe_tile *tile;
+ unsigned int id;
+
+ for_each_tile(tile, xe, id) {
+ struct xe_gt *gt = tile->primary_gt;
+ s64 shift;
+
+ shift = xe_gt_sriov_vf_ggtt_shift(gt);
+ if (shift) {
+ need_fixups = true;
+ xe_gt_sriov_vf_fixup_ggtt_nodes(gt, shift);
+ }
+ }
+ return need_fixups;
+}
+
/*
* Notify all GuCs about resource fixups apply finished.
*/
@@ -191,6 +211,7 @@ static void
vf_post_migration_notify_resfix_done(struct xe_device *xe)
static void vf_post_migration_recovery(struct xe_device *xe)
{
+ bool need_fixups;
int err;
drm_dbg(&xe->drm, "migration recovery in progress\n");
@@ -201,6 +222,7 @@ static void vf_post_migration_recovery(struct
xe_device *xe)
if (unlikely(err))
goto fail;
+ need_fixups = vf_post_migration_fixup_ggtt_nodes(xe);
/* FIXME: add the recovery steps */
vf_post_migration_notify_resfix_done(xe);
xe_pm_runtime_put(xe);
</pre>
</blockquote>
<pre wrap="" class="moz-quote-pre">otherwise, LGTM
</pre>
</blockquote>
</blockquote>
<pre wrap="" class="moz-quote-pre">
</pre>
</blockquote>
</body>
</html>