Re: ✗ Xe.CI.Full: failure for SR-IOV promotions

Wed Jul 23 14:38:28 UTC 2025


On 7/23/2025 3:25 AM, Patchwork wrote:
> *Patch Details*
> *Series:*	SR-IOV promotions
> *URL:*	https://patchwork.freedesktop.org/series/151961/
> *State:*	failure
> *Details:*	https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-151961v1/index.html
> 
> 
>   CI Bug Log - changes from xe-3458-1bb52fd2f94e5c605adf88a086ab0eadcdd708e3_FULL -> xe-pw-151961v1_FULL
> 
> 
>     Summary
> 
> *FAILURE*
> 
> Serious unknown changes coming with xe-pw-151961v1_FULL absolutely need to be
> verified manually.
> 
> If you think the reported changes have nothing to do with the changes
> introduced in xe-pw-151961v1_FULL, please notify your bug team (I915-ci-infra at lists.freedesktop.org) to allow them
> to document this new failure mode, which will reduce false positives in CI.
> 
> 
>     Participating hosts (4 -> 4)
> 
> No changes in participating hosts
> 
> 
>     Possible new issues
> 
> Here are the unknown changes that may have been introduced in xe-pw-151961v1_FULL:
> 
> 
>       IGT changes
> 
> 
>         Possible regressions
> 
>   *
> 
>     igt@xe_exec_threads@threads-cm-fd-userptr-invalidate:
> 
>       o shard-adlp: PASS <https://intel-gfx-ci.01.org/tree/intel-xe/xe-3458-1bb52fd2f94e5c605adf88a086ab0eadcdd708e3/shard-adlp-3/igt@xe_exec_threads@threads-cm-fd-userptr-invalidate.html> -> FAIL <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-151961v1/shard-adlp-1/igt@xe_exec_threads@threads-cm-fd-userptr-invalidate.html> +1 other test fail

unrelated, looks like a sporadic failure

<7> [301.708094] xe 0000:00:02.0: [drm:xe_guc_exec_queue_memory_cat_error_handler [xe]] GT0: Engine memory CAT error: class=vcs, logical_mask: 0x1, guc_id=6
<6> [301.708849] xe 0000:00:02.0: [drm] GT0: Engine reset: engine_class=vcs, logical_mask: 0x1, guc_id=6
<7> [301.708957] xe 0000:00:02.0: [drm:xe_devcoredump [xe]] Multiple hangs are occurring, but only the first snapshot was taken

>   *
> 
>     igt@xe_fault_injection@inject-fault-probe-function-xe_guc_log_init:
> 
>       o shard-bmg: PASS <https://intel-gfx-ci.01.org/tree/intel-xe/xe-3458-1bb52fd2f94e5c605adf88a086ab0eadcdd708e3/shard-bmg-2/igt@xe_fault_injection@inject-fault-probe-function-xe_guc_log_init.html> -> ABORT <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-151961v1/shard-bmg-8/igt@xe_fault_injection@inject-fault-probe-function-xe_guc_log_init.html>
> 

unrelated, looks like a sporadic failure, previously tracked as Xe Bug# 3084

<4> [265.225514] pci 0000:03:00.0: [drm] Assertion `ret` failed!
platform: BATTLEMAGE subplatform: 1
graphics: Xe2_HPG 20.01 step A0
media: Xe2_HPM 13.01 step A1
tile: 0 VRAM 12.0 GiB
GT: 0 type 1
<4> [265.225582] WARNING: CPU: 0 PID: 8409 at drivers/gpu/drm/xe/xe_guc_submit.c:242 guc_submit_fini+0x2b6/0x2d0 [xe]

...

<3> [265.227376] pci 0000:03:00.0: [drm] *ERROR* GT0: GUC ID manager unclean (1/65535)
<6> [265.228262] pci 0000:03:00.0: [drm] GT0: 	total 65535
<6> [265.228272] pci 0000:03:00.0: [drm] GT0: 	used 1
<6> [265.228278] pci 0000:03:00.0: [drm] GT0: 	range 5..5 (1)

...

<3> [265.233717] [drm:drm_mm_takedown] *ERROR* node [0070d000 + 00007000]: inserted at
 drm_mm_insert_node_in_range+0x2b4/0x520
 __xe_ggtt_insert_bo_at+0x159/0x4b0 [xe]
 xe_ggtt_insert_bo+0x17/0x30 [xe]
 __xe_bo_create_locked+0x2a5/0x620 [xe]
 xe_bo_create_pin_map_at_aligned+0x49/0x1e0 [xe]
 xe_bo_create_pin_map+0x1c/0x30 [xe]
 xe_lrc_create+0x178/0x1990 [xe]
 xe_exec_queue_create+0x339/0x520 [xe]
 xe_exec_queue_create_ioctl+0xb2f/0xc90 [xe]

> 
>         Warnings
> 
>   * igt@xe_pm@s2idle-d3cold-basic-exec:
>       o shard-adlp: SKIP <https://intel-gfx-ci.01.org/tree/intel-xe/xe-3458-1bb52fd2f94e5c605adf88a086ab0eadcdd708e3/shard-adlp-3/igt@xe_pm@s2idle-d3cold-basic-exec.html> (Intel XE#2284 <https://gitlab.freedesktop.org/drm/xe/kernel/issues/2284> / Intel XE#366 <https://gitlab.freedesktop.org/drm/xe/kernel/issues/366>) -> ABORT <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-151961v1/shard-adlp-1/igt@xe_pm@s2idle-d3cold-basic-exec.html>

unrelated, looks like we lost MMIO

<3> [321.997510] xe 0000:00:02.0: [drm] *ERROR* GT0: GuC mmio request 0x5507: no reply 0xffff
<3> [321.998641] xe 0000:00:02.0: [drm] *ERROR* GT0: GuC suspend failed: -ETIMEDOUT
<3> [321.998721] xe 0000:00:02.0: [drm] *ERROR* GT0: suspend failed (-ETIMEDOUT)
<3> [321.999159] xe 0000:00:02.0: can't suspend (xe_pci_runtime_suspend [xe] returned -110)
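
For triaging excerpts like the ones above, here is a minimal Python sketch that splits dmesg lines of the form "<level> [timestamp] message" and keeps only warnings and errors. It is not part of any CI tooling; the name parse_dmesg and the regular expression are illustrative assumptions based solely on the line layout quoted in this mail.

```python
import re

# Matches lines such as "<3> [321.997510] xe 0000:00:02.0: ..."
# Assumption: the "<N> [seconds.micros] text" layout seen in the excerpts above.
DMESG_RE = re.compile(r"^<(\d)>\s*\[\s*(\d+\.\d+)\]\s*(.*)$")


def parse_dmesg(lines, max_level=4):
    """Return (level, timestamp, message) tuples for sufficiently severe lines.

    Lower numbers are more severe (3 = error, 4 = warning, 6 = info, 7 = debug).
    Lines that do not match the pattern (e.g. stack frames) are skipped.
    """
    out = []
    for line in lines:
        m = DMESG_RE.match(line.strip())
        if not m:
            continue
        level = int(m.group(1))
        if level <= max_level:
            out.append((level, float(m.group(2)), m.group(3)))
    return out


sample = [
    "<3> [321.997510] xe 0000:00:02.0: [drm] *ERROR* GT0: GuC mmio request 0x5507: no reply 0xffff",
    "<6> [301.708849] xe 0000:00:02.0: [drm] GT0: Engine reset: engine_class=vcs, logical_mask: 0x1, guc_id=6",
    " drm_mm_insert_node_in_range+0x2b4/0x520",
]
hits = parse_dmesg(sample)
```

With the default max_level=4, only the "<3>" line from the sample is kept; the "<6>" info line and the stack frame are dropped.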