[RFC PATCH 0/1] Add driver load error injection
Jani Nikula
jani.nikula at linux.intel.com
Tue Aug 13 10:47:26 UTC 2024
On Fri, 09 Aug 2024, Matthew Brost <matthew.brost at intel.com> wrote:
> Start porting over driver load error injection from the i915. Eventually
> the idea would be to make this error injection a bit more generic (drm
> level, or kernel level), but to ensure a stable driver we are starting
> with the i915 implementation.
>
> Not complete as many more injection points need to be added.
Please also bolt this into __i915_inject_probe_error() in
display/ext/i915_utils.c, exercising all the display error handling with
xe too.
BR,
Jani.
>
> Can be tested with:
> for i in {1..200}; do echo "Run $i"; modprobe xe inject_driver_load_error=$i; rmmod xe; done
>
> Will need a version of this series [1] to avoid lockdep turning off
> after 30ish module loads.
>
> Kernel is currently blowing up on injection point #11 on TGL w/o
> display, will need to start debugging there. Stack trace below.
>
> [ 196.326118] Setting dangerous option inject_driver_load_error - tainting kernel
> [ 196.328408] xe 0000:00:02.0: vgaarb: deactivate vga console
> [ 196.328975] xe 0000:00:02.0: [drm:xe_pci_probe [xe]] TIGERLAKE 9a49:0001 dgfx:0 gfx:Xe_LP (12.00) media:Xe_M (12.00) display:no dma_m_s:39 tc:1 gscfi:0 cscfi:0
> [ 196.329016] xe 0000:00:02.0: [drm:xe_pci_probe [xe]] Stepping = (G:B0, M:B0, D:D0, B:**)
> [ 196.329039] xe 0000:00:02.0: [drm:xe_pci_probe [xe]] SR-IOV support: no (mode: none)
> [ 196.330746] xe 0000:00:02.0: [drm] Using GuC firmware from i915/tgl_guc_70.bin version 70.30.0
> [ 196.331047] xe 0000:00:02.0: [drm] Injecting failure -19 at checkpoint 11 [xe_guc_log_init:98]
> [ 196.331050] xe 0000:00:02.0: [drm] *ERROR* GT0: GuC init failed with -ENODEV
> [ 196.338208] xe 0000:00:02.0: [drm] *ERROR* GT0: Failed to initialize uC (-ENODEV)
> [ 196.347009] BUG: unable to handle page fault for address: 000000000000a188
> [ 196.353903] #PF: supervisor write access in kernel mode
> [ 196.359138] #PF: error_code(0x0002) - not-present page
> [ 196.364289] PGD 0 P4D 0
> [ 196.366842] Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
> [ 196.371735] CPU: 6 UID: 0 PID: 1233 Comm: modprobe Tainted: G U 6.11.0-rc2-xe+ #3796
> [ 196.380875] Tainted: [U]=USER
> [ 196.383857] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake U DDR4 SODIMM RVP, BIOS TGLSFWI1.R00.3243.A01.2006102133 06/10/2020
> [ 196.397237] RIP: 0010:xe_mmio_write32+0x67/0x290 [xe]
> [ 196.402332] Code: 48 0f a3 05 c3 c9 5b e2 0f 82 c6 00 00 00 45 89 e6 41 c1 ee 18 41 f7 c4 00 00 00 40 74 7f 45 84 f6 78 74 49 8b 47 28 48 01 c3 <44> 89 2b 48 83 c4 58 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc
> [ 196.421085] RSP: 0018:ffffc9000152b820 EFLAGS: 00010006
> [ 196.426322] RAX: 0000000000000000 RBX: 000000000000a188 RCX: 0000000000000000
> [ 196.433466] RDX: 0000000000010001 RSI: ffffffff82426f19 RDI: ffffffff824343c6
> [ 196.440608] RBP: ffff888152678028 R08: 00000000000d6398 R09: 0000000000000001
> [ 196.447748] R10: 00000000ffffffff R11: ffff888152628000 R12: 000000000000a188
> [ 196.454893] R13: 0000000000010001 R14: 0000000000000000 R15: ffff88815262a308
> [ 196.462037] FS: 00007ff3ae103000(0000) GS:ffff88849fb80000(0000) knlGS:0000000000000000
> [ 196.470137] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 196.475893] CR2: 000000000000a188 CR3: 0000000156d2a004 CR4: 0000000000f70ef0
> [ 196.483036] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 196.490177] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 196.497323] PKRU: 55555554
> [ 196.500051] Call Trace:
> [ 196.502517] <TASK>
> [ 196.504640] ? __die+0x1f/0x70
> [ 196.507719] ? page_fault_oops+0x155/0x470
> [ 196.511831] ? stack_trace_save+0x49/0x70
> [ 196.515861] ? do_user_addr_fault+0x63/0x720
> [ 196.520151] ? exc_page_fault+0x63/0x1d0
> [ 196.524091] ? asm_exc_page_fault+0x26/0x30
> [ 196.528293] ? xe_mmio_write32+0x67/0x290 [xe]
> [ 196.532777] xe_force_wake_get+0xc8/0x2b0 [xe]
> [ 196.537260] ? lock_acquire+0xcd/0x300
> [ 196.541031] xe_gt_tlb_invalidation_ggtt+0xa8/0x310 [xe]
> [ 196.546380] ? rcu_is_watching+0x11/0x50
> [ 196.550322] ? __mutex_lock+0x12f/0xd70
> [ 196.554179] ? find_held_lock+0x2b/0x80
> [ 196.558031] ? xe_ggtt_remove_node+0xbf/0xf0 [xe]
> [ 196.562772] xe_ggtt_invalidate+0x19/0x80 [xe]
> [ 196.567251] xe_ggtt_remove_node+0xdf/0xf0 [xe]
> [ 196.571818] xe_ttm_bo_destroy+0x11a/0x220 [xe]
> [ 196.576388] drm_managed_release+0xb0/0x160
> [ 196.580593] devm_drm_dev_init_release+0x54/0x70
> [ 196.585232] release_nodes+0x2e/0xf0
> [ 196.588827] devres_release_all+0x8a/0xc0
> [ 196.592858] device_unbind_cleanup+0x9/0x70
> [ 196.597058] really_probe+0x1a0/0x380
> [ 196.600740] __driver_probe_device+0x73/0x150
> [ 196.605108] driver_probe_device+0x19/0x90
> [ 196.609222] __driver_attach+0xd5/0x1d0
> [ 196.613073] ? __pfx___driver_attach+0x10/0x10
> [ 196.617534] bus_for_each_dev+0x77/0xd0
> [ 196.621389] bus_add_driver+0x110/0x240
> [ 196.625238] driver_register+0x5b/0x110
> [ 196.629086] xe_init+0x3b/0x80 [xe]
> [ 196.632615] ? __pfx_xe_init+0x10/0x10 [xe]
> [ 196.636829] do_one_initcall+0x5e/0x2b0
> [ 196.640683] ? rcu_is_watching+0x11/0x50
> [ 196.644622] ? __kmalloc_cache_noprof+0x24e/0x2f0
> [ 196.649343] do_init_module+0x5f/0x210
> [ 196.653113] init_module_from_file+0x86/0xd0
> [ 196.657402] idempotent_init_module+0x17c/0x230
> [ 196.661946] __x64_sys_finit_module+0x59/0xb0
> [ 196.666323] do_syscall_64+0x68/0x140
> [ 196.670006] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> Matt
>
> [1] https://patchwork.freedesktop.org/series/136701/
>
>
> Matthew Brost (1):
> drm/xe: Add driver load error injection
>
> drivers/gpu/drm/xe/xe_device.c | 31 ++++++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_device.h | 15 ++++++++++++++
> drivers/gpu/drm/xe/xe_device_types.h | 4 ++++
> drivers/gpu/drm/xe/xe_gt.c | 5 +++++
> drivers/gpu/drm/xe/xe_gt_sriov_pf.c | 4 ++++
> drivers/gpu/drm/xe/xe_guc.c | 8 +++++++
> drivers/gpu/drm/xe/xe_guc_ads.c | 5 +++++
> drivers/gpu/drm/xe/xe_guc_ct.c | 4 ++++
> drivers/gpu/drm/xe/xe_guc_log.c | 5 +++++
> drivers/gpu/drm/xe/xe_mmio.c | 5 +++++
> drivers/gpu/drm/xe/xe_module.c | 5 +++++
> drivers/gpu/drm/xe/xe_module.h | 3 +++
> drivers/gpu/drm/xe/xe_pci.c | 9 ++++++++
> drivers/gpu/drm/xe/xe_pm.c | 8 +++++++
> drivers/gpu/drm/xe/xe_sriov.c | 8 ++++++-
> drivers/gpu/drm/xe/xe_tile.c | 4 ++++
> drivers/gpu/drm/xe/xe_uc.c | 4 ++++
> drivers/gpu/drm/xe/xe_wa.c | 5 +++++
> drivers/gpu/drm/xe/xe_wopcm.c | 4 ++++
> 19 files changed, 135 insertions(+), 1 deletion(-)
--
Jani Nikula, Intel
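
For readers unfamiliar with the checkpoint numbering in the log above
("Injecting failure -19 at checkpoint 11"), here is a minimal userspace
sketch of the counting pattern, loosely modeled on the i915 approach the
cover letter says is being ported. It is not the code from the patch. The
names inject_error(), INJECT_ERROR(), init_step_*(), probe() and
run_probe() are hypothetical, and the assumption is that each checkpoint
bumps a counter and fails once when the counter reaches the value given in
inject_driver_load_error.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for the module parameter; 0 disables injection. */
static int inject_driver_load_error;
/* Number of checkpoints reached so far during this load attempt. */
static int checkpoint_count;

/* Return err at the Nth checkpoint reached, 0 at every other checkpoint. */
static int inject_error(int err, const char *func, int line)
{
	if (inject_driver_load_error <= 0)
		return 0;
	if (++checkpoint_count < inject_driver_load_error)
		return 0;

	fprintf(stderr, "Injecting failure %d at checkpoint %d [%s:%d]\n",
		err, checkpoint_count, func, line);
	inject_driver_load_error = 0;	/* fire only once per load */
	return err;
}

#define INJECT_ERROR(err) inject_error((err), __func__, __LINE__)

/* Three made-up init steps, each with one injection point at its start. */
static int init_step_a(void)
{
	int err = INJECT_ERROR(-ENODEV);

	return err;
}

static int init_step_b(void)
{
	int err = INJECT_ERROR(-ENOMEM);

	return err;
}

static int init_step_c(void)
{
	int err = INJECT_ERROR(-ENODEV);

	return err;
}

/* Probe path: the first failing step aborts the load with its error. */
static int probe(void)
{
	int err;

	err = init_step_a();
	if (err)
		return err;
	err = init_step_b();
	if (err)
		return err;
	return init_step_c();
}

/* Simulate "modprobe ... inject_driver_load_error=n" for one load. */
int run_probe(int n)
{
	inject_driver_load_error = n;
	checkpoint_count = 0;
	return probe();
}
```

Calling run_probe(i) for increasing i mirrors the modprobe/rmmod loop in
the cover letter: each value fails the load at a different checkpoint, and
values past the last checkpoint let the load succeed. What the sketch does
not model is the unwind path after the failure, which is where the oops in
the quoted trace occurs.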