[PATCH v2] drm/xe: Improve VF provision stability with fault injection

Satyanarayana K V P satyanarayana.k.v.p at intel.com
Wed Jul 23 11:13:25 UTC 2025


In the unlikely event of a PF malfunction or misconfiguration, a VF may
receive an incomplete or invalid configuration, and it must be prepared
to handle such cases without crashing.

When simulating errors with the kernel's fault injection framework, crashes
were observed during device unbind. These crashes were primarily due to the
use of the XE_BO_FLAG_GGTT_INVALIDATE flag when creating buffer objects.

GGTT invalidation requests are sent to the GuC over the CTB, whose buffer
object is itself allocated with the XE_BO_FLAG_GGTT_INVALIDATE flag.
However, the CTB buffer object is freed before GGTT invalidation
completes, leading to crashes.

Similarly, for buffer objects allocated in memirq_alloc_pages() and
__xe_sa_bo_manager_init(), the CTB is already freed by the time GGTT
invalidation occurs, resulting in system crashes.

To prevent these issues, add a check for GGTT validity when invalidating
the GGTT, which avoids sending a CTB request to the GuC.

Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p at intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko at intel.com>
Cc: Matthew Brost <matthew.brost at intel.com>
Cc: Matthew Auld <matthew.auld at intel.com>
Cc: Summers Stuart <stuart.summers at intel.com>

---
V1 -> V2:
- Added a GGTT validity check in xe_gt_tlb_invalidation_ggtt() instead of
  removing XE_BO_FLAG_GGTT_INVALIDATE from memirq_alloc_pages(),
  __xe_sa_bo_manager_init() and xe_guc_ct_init(). (Summers Stuart)
---
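Note: the guard described in the commit message can be illustrated
outside the driver with a minimal, standalone sketch. The mock_* types,
the inval_path enum and pick_invalidation_path() below are hypothetical
stand-ins, not xe code; the sketch only assumes, as the diff does, that a
non-NULL ggtt->scratch is the sign that the GGTT is still valid.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures; not the real xe types. */
struct mock_ggtt {
	void *scratch;		/* NULL once the GGTT is no longer valid */
};

struct mock_gt {
	struct mock_ggtt *ggtt;
	bool ct_enabled;		/* models xe_guc_ct_enabled() */
	bool submission_enabled;	/* models submission_state.enabled */
};

enum inval_path {
	INVAL_VIA_GUC,		/* send the invalidation request over the CTB */
	INVAL_SKIP_GUC,		/* do not touch the CTB at all */
};

/*
 * Mirrors the condition in the patch: the GuC path is taken only when
 * the GGTT is still valid and the CT and submission are both enabled.
 */
static enum inval_path pick_invalidation_path(const struct mock_gt *gt)
{
	if (gt->ggtt->scratch && gt->ct_enabled && gt->submission_enabled)
		return INVAL_VIA_GUC;

	return INVAL_SKIP_GUC;
}
```

With scratch set to NULL the helper returns INVAL_SKIP_GUC even while the
CT and submission flags are still set, which is the situation the new
check is meant to cover.
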
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
index 086c12ee3d9d..145b5f90e347 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
@@ -294,10 +294,12 @@ static int xe_gt_tlb_invalidation_guc(struct xe_gt *gt,
  */
 int xe_gt_tlb_invalidation_ggtt(struct xe_gt *gt)
 {
+	struct xe_tile *tile = gt_to_tile(gt);
 	struct xe_device *xe = gt_to_xe(gt);
 	unsigned int fw_ref;
 
-	if (xe_guc_ct_enabled(&gt->uc.guc.ct) &&
+	if (tile->mem.ggtt->scratch &&
+	    xe_guc_ct_enabled(&gt->uc.guc.ct) &&
 	    gt->uc.guc.submission_state.enabled) {
 		struct xe_gt_tlb_invalidation_fence fence;
 		int ret;
-- 
2.43.0


