[RFC PATCH 0/1] Add driver load error injection
Matthew Brost
matthew.brost at intel.com
Fri Aug 9 22:44:23 UTC 2024
Start porting over driver load error injection from the i915. Eventually
the idea would be to make this error injection a bit more generic (drm
level, or kernel level), but to ensure a stable driver we are starting
with the i915 implementation.
Not complete as many more injection points need to be added.
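For readers who have not seen the i915 mechanism, the counting scheme can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the code in the patch: every name here (inject_param, inject_load_error, fake_mmio_init, fake_guc_log_init, fake_probe, fake_modprobe) is hypothetical, and the -EIO return is an arbitrary choice. Only the idea is taken from the text and the log below: each injection point bumps a counter, and the point whose number matches the module parameter returns an error.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for the inject_driver_load_error module parameter:
 * 0 disables injection, N makes the Nth checkpoint reached during load fail. */
static int inject_param;

/* Number of checkpoints passed so far in the current load attempt. */
static int checkpoint_count;

/* Returns err at the selected checkpoint and 0 everywhere else. */
static int inject_load_error_at(int err, const char *func, int line)
{
	/* Injection disabled, or the selected checkpoint already fired. */
	if (checkpoint_count >= inject_param)
		return 0;

	/* Count this checkpoint; bail out if it is not the selected one. */
	if (++checkpoint_count < inject_param)
		return 0;

	fprintf(stderr, "Injecting failure %d at checkpoint %d [%s:%d]\n",
		err, checkpoint_count, func, line);
	return err;
}

/* Records the call site so the log line names the failing init step. */
#define inject_load_error(err) \
	inject_load_error_at((err), __func__, __LINE__)

/* Two made-up init steps, each guarded by one injection point. */
static int fake_mmio_init(void)
{
	int err = inject_load_error(-EIO);

	if (err)
		return err;
	return 0; /* real setup work would go here */
}

static int fake_guc_log_init(void)
{
	int err = inject_load_error(-ENODEV);

	if (err)
		return err;
	return 0; /* real setup work would go here */
}

/* Made-up probe path: the first failing step aborts the load. */
static int fake_probe(void)
{
	int err;

	err = fake_mmio_init();
	if (err)
		return err;

	return fake_guc_log_init();
}

/* Models one "modprobe xe inject_driver_load_error=n" attempt. */
static int fake_modprobe(int n)
{
	inject_param = n;
	checkpoint_count = 0;
	return fake_probe();
}
```

In this sketch, calling fake_modprobe() with increasing values walks the failure through each init step in turn, and a value larger than the number of checkpoints lets the load succeed. The interesting part in the real driver is not the injected return value but whether the unwind path after it is safe.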
Can be tested with:
for i in {1..200}; do echo "Run $i"; modprobe xe inject_driver_load_error=$i; rmmod xe; done
Will need a version of this series [1] to avoid lockdep turning off
after 30ish module loads.
Kernel is currently blowing up on injection point #11 on TGL w/o
display, will need to start debugging there. Stack trace below.
[ 196.326118] Setting dangerous option inject_driver_load_error - tainting kernel
[ 196.328408] xe 0000:00:02.0: vgaarb: deactivate vga console
[ 196.328975] xe 0000:00:02.0: [drm:xe_pci_probe [xe]] TIGERLAKE 9a49:0001 dgfx:0 gfx:Xe_LP (12.00) media:Xe_M (12.00) display:no dma_m_s:39 tc:1 gscfi:0 cscfi:0
[ 196.329016] xe 0000:00:02.0: [drm:xe_pci_probe [xe]] Stepping = (G:B0, M:B0, D:D0, B:**)
[ 196.329039] xe 0000:00:02.0: [drm:xe_pci_probe [xe]] SR-IOV support: no (mode: none)
[ 196.330746] xe 0000:00:02.0: [drm] Using GuC firmware from i915/tgl_guc_70.bin version 70.30.0
[ 196.331047] xe 0000:00:02.0: [drm] Injecting failure -19 at checkpoint 11 [xe_guc_log_init:98]
[ 196.331050] xe 0000:00:02.0: [drm] *ERROR* GT0: GuC init failed with -ENODEV
[ 196.338208] xe 0000:00:02.0: [drm] *ERROR* GT0: Failed to initialize uC (-ENODEV)
[ 196.347009] BUG: unable to handle page fault for address: 000000000000a188
[ 196.353903] #PF: supervisor write access in kernel mode
[ 196.359138] #PF: error_code(0x0002) - not-present page
[ 196.364289] PGD 0 P4D 0
[ 196.366842] Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
[ 196.371735] CPU: 6 UID: 0 PID: 1233 Comm: modprobe Tainted: G U 6.11.0-rc2-xe+ #3796
[ 196.380875] Tainted: [U]=USER
[ 196.383857] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake U DDR4 SODIMM RVP, BIOS TGLSFWI1.R00.3243.A01.2006102133 06/10/2020
[ 196.397237] RIP: 0010:xe_mmio_write32+0x67/0x290 [xe]
[ 196.402332] Code: 48 0f a3 05 c3 c9 5b e2 0f 82 c6 00 00 00 45 89 e6 41 c1 ee 18 41 f7 c4 00 00 00 40 74 7f 45 84 f6 78 74 49 8b 47 28 48 01 c3 <44> 89 2b 48 83 c4 58 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc
[ 196.421085] RSP: 0018:ffffc9000152b820 EFLAGS: 00010006
[ 196.426322] RAX: 0000000000000000 RBX: 000000000000a188 RCX: 0000000000000000
[ 196.433466] RDX: 0000000000010001 RSI: ffffffff82426f19 RDI: ffffffff824343c6
[ 196.440608] RBP: ffff888152678028 R08: 00000000000d6398 R09: 0000000000000001
[ 196.447748] R10: 00000000ffffffff R11: ffff888152628000 R12: 000000000000a188
[ 196.454893] R13: 0000000000010001 R14: 0000000000000000 R15: ffff88815262a308
[ 196.462037] FS: 00007ff3ae103000(0000) GS:ffff88849fb80000(0000) knlGS:0000000000000000
[ 196.470137] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 196.475893] CR2: 000000000000a188 CR3: 0000000156d2a004 CR4: 0000000000f70ef0
[ 196.483036] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 196.490177] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 196.497323] PKRU: 55555554
[ 196.500051] Call Trace:
[ 196.502517] <TASK>
[ 196.504640] ? __die+0x1f/0x70
[ 196.507719] ? page_fault_oops+0x155/0x470
[ 196.511831] ? stack_trace_save+0x49/0x70
[ 196.515861] ? do_user_addr_fault+0x63/0x720
[ 196.520151] ? exc_page_fault+0x63/0x1d0
[ 196.524091] ? asm_exc_page_fault+0x26/0x30
[ 196.528293] ? xe_mmio_write32+0x67/0x290 [xe]
[ 196.532777] xe_force_wake_get+0xc8/0x2b0 [xe]
[ 196.537260] ? lock_acquire+0xcd/0x300
[ 196.541031] xe_gt_tlb_invalidation_ggtt+0xa8/0x310 [xe]
[ 196.546380] ? rcu_is_watching+0x11/0x50
[ 196.550322] ? __mutex_lock+0x12f/0xd70
[ 196.554179] ? find_held_lock+0x2b/0x80
[ 196.558031] ? xe_ggtt_remove_node+0xbf/0xf0 [xe]
[ 196.562772] xe_ggtt_invalidate+0x19/0x80 [xe]
[ 196.567251] xe_ggtt_remove_node+0xdf/0xf0 [xe]
[ 196.571818] xe_ttm_bo_destroy+0x11a/0x220 [xe]
[ 196.576388] drm_managed_release+0xb0/0x160
[ 196.580593] devm_drm_dev_init_release+0x54/0x70
[ 196.585232] release_nodes+0x2e/0xf0
[ 196.588827] devres_release_all+0x8a/0xc0
[ 196.592858] device_unbind_cleanup+0x9/0x70
[ 196.597058] really_probe+0x1a0/0x380
[ 196.600740] __driver_probe_device+0x73/0x150
[ 196.605108] driver_probe_device+0x19/0x90
[ 196.609222] __driver_attach+0xd5/0x1d0
[ 196.613073] ? __pfx___driver_attach+0x10/0x10
[ 196.617534] bus_for_each_dev+0x77/0xd0
[ 196.621389] bus_add_driver+0x110/0x240
[ 196.625238] driver_register+0x5b/0x110
[ 196.629086] xe_init+0x3b/0x80 [xe]
[ 196.632615] ? __pfx_xe_init+0x10/0x10 [xe]
[ 196.636829] do_one_initcall+0x5e/0x2b0
[ 196.640683] ? rcu_is_watching+0x11/0x50
[ 196.644622] ? __kmalloc_cache_noprof+0x24e/0x2f0
[ 196.649343] do_init_module+0x5f/0x210
[ 196.653113] init_module_from_file+0x86/0xd0
[ 196.657402] idempotent_init_module+0x17c/0x230
[ 196.661946] __x64_sys_finit_module+0x59/0xb0
[ 196.666323] do_syscall_64+0x68/0x140
[ 196.670006] entry_SYSCALL_64_after_hwframe+0x76/0x7e
Matt
[1] https://patchwork.freedesktop.org/series/136701/
Matthew Brost (1):
drm/xe: Add driver load error injection
drivers/gpu/drm/xe/xe_device.c | 31 ++++++++++++++++++++++++++++
drivers/gpu/drm/xe/xe_device.h | 15 ++++++++++++++
drivers/gpu/drm/xe/xe_device_types.h | 4 ++++
drivers/gpu/drm/xe/xe_gt.c | 5 +++++
drivers/gpu/drm/xe/xe_gt_sriov_pf.c | 4 ++++
drivers/gpu/drm/xe/xe_guc.c | 8 +++++++
drivers/gpu/drm/xe/xe_guc_ads.c | 5 +++++
drivers/gpu/drm/xe/xe_guc_ct.c | 4 ++++
drivers/gpu/drm/xe/xe_guc_log.c | 5 +++++
drivers/gpu/drm/xe/xe_mmio.c | 5 +++++
drivers/gpu/drm/xe/xe_module.c | 5 +++++
drivers/gpu/drm/xe/xe_module.h | 3 +++
drivers/gpu/drm/xe/xe_pci.c | 9 ++++++++
drivers/gpu/drm/xe/xe_pm.c | 8 +++++++
drivers/gpu/drm/xe/xe_sriov.c | 8 ++++++-
drivers/gpu/drm/xe/xe_tile.c | 4 ++++
drivers/gpu/drm/xe/xe_uc.c | 4 ++++
drivers/gpu/drm/xe/xe_wa.c | 5 +++++
drivers/gpu/drm/xe/xe_wopcm.c | 4 ++++
19 files changed, 135 insertions(+), 1 deletion(-)
--
2.34.1