BUG in drm_kms_helper_poll_enable() fixed by reverting "drm/ast: report connection status on Display Port."

Kim Phillips kim.phillips at amd.com
Thu Nov 9 00:37:03 UTC 2023


Hi, current Linux kernel commit 90450a06162e
("Merge tag 'rcu-fixes-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks")
and the attached config cause the following BUG when booting on
a reference AMD Zen4 development server:

[   59.995717] input: OpenBMC virtual_input as /devices/pci0000:00/0000:00:07.1/0000:02:00.4/usb3/3-2/3-2.6/3-2.6:1.0/0003:1D6B:0104.0002/input/input4
[   60.033135] ast 0000:c2:00.0: vgaarb: deactivate vga console
[   60.066230] ast 0000:c2:00.0: [drm] Using default configuration
[   60.070342] hid-generic 0003:1D6B:0104.0002: input,hidraw0: USB HID v1.01 Keyboard [OpenBMC virtual_input] on usb-0000:02:00.4-2.6/input0
[   60.072843] ast 0000:c2:00.0: [drm] AST 2600 detected
[   60.072851] ast 0000:c2:00.0: [drm] Using ASPEED DisplayPort transmitter
[   60.099891] ast 0000:c2:00.0: [drm] dram MCLK=396 Mhz type=1 bus_width=16
[   60.115780] [drm] Initialized ast 0.1.0 20120228 for 0000:c2:00.0 on minor 0
[   60.135643] fbcon: astdrmfb (fb0) is primary device
[   60.135649] fbcon: Deferring console take-over
[   60.146162] ast 0000:c2:00.0: [drm] fb0: astdrmfb frame buffer device
[   60.331802] input: OpenBMC virtual_input as /devices/pci0000:00/0000:00:07.1/0000:02:00.4/usb3/3-2/3-2.6/3-2.6:1.0/0003:1D6B:0104.0002/input/input5
[   60.405807] hid-generic 0003:1D6B:0104.0002: input,hidraw0: USB HID v1.01 Keyboard [OpenBMC virtual_input] on usb-0000:02:00.4-2.6/input0
[   60.423774] input: OpenBMC virtual_input as /devices/pci0000:00/0000:00:07.1/0000:02:00.4/usb3/3-2/3-2.6/3-2.6:1.1/0003:1D6B:0104.0004/input/input6
[   60.443170] hid-generic 0003:1D6B:0104.0004: input,hidraw1: USB HID v1.01 Mouse [OpenBMC virtual_input] on usb-0000:02:00.4-2.6/input1
[   60.460675] ast 0000:c2:00.0: vgaarb: deactivate vga console
[   60.479996] ast 0000:c2:00.0: [drm] Using default configuration
[   60.486603] ast 0000:c2:00.0: [drm] AST 2600 detected
[   60.492249] ast 0000:c2:00.0: [drm] Using ASPEED DisplayPort transmitter
[   60.499732] ast 0000:c2:00.0: [drm] dram MCLK=396 Mhz type=1 bus_width=16
[   60.508955] BUG: unable to handle page fault for address: ffff8881e98109f0
[   60.516623] #PF: supervisor write access in kernel mode
[   60.522449] #PF: error_code(0x0002) - not-present page
[   60.528168] PGD 8dbc01067 P4D 8dbc01067 PUD 104c984067 PMD 104c837067 PTE 800ffffe167ef060
[   60.537394] Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC NOPTI
[   60.543805] CPU: 0 PID: 9 Comm: kworker/0:1 Tainted: G        W          6.6.0+ #3
[   60.552251] Hardware name: AMD Corporation ONYX/ONYX, BIOS ROX100AB 09/14/2023
[   60.560309] Workqueue: events work_for_cpu_fn
[   60.565173] RIP: 0010:enqueue_timer (/home/amd/git/linux/./include/linux/list.h:1034 /home/amd/git/linux/kernel/time/timer.c:605)
[ 60.570129] Code: 44 00 00 55 48 89 e5 41 55 49 89 cd 41 54 49 89 fc 53 48 89 f3 89 d6 48 8d 84 f7 b0 00 00 00 48 8b 08 48 89 0b 48 85 c9 74 04 <48> 89 59 08 48 89 18 48 89 43 08 49 8d 44 24 68 48 0f ab 30 8b 4b
All code
========
    0:   44 00 00                add    %r8b,(%rax)
    3:   55                      push   %rbp
    4:   48 89 e5                mov    %rsp,%rbp
    7:   41 55                   push   %r13
    9:   49 89 cd                mov    %rcx,%r13
    c:   41 54                   push   %r12
    e:   49 89 fc                mov    %rdi,%r12
   11:   53                      push   %rbx
   12:   48 89 f3                mov    %rsi,%rbx
   15:   89 d6                   mov    %edx,%esi
   17:   48 8d 84 f7 b0 00 00    lea    0xb0(%rdi,%rsi,8),%rax
   1e:   00
   1f:   48 8b 08                mov    (%rax),%rcx
   22:   48 89 0b                mov    %rcx,(%rbx)
   25:   48 85 c9                test   %rcx,%rcx
   28:   74 04                   je     0x2e
   2a:*  48 89 59 08             mov    %rbx,0x8(%rcx)           <-- trapping instruction
    2e:   48 89 18                mov    %rbx,(%rax)
   31:   48 89 43 08             mov    %rax,0x8(%rbx)
   35:   49 8d 44 24 68          lea    0x68(%r12),%rax
   3a:   48 0f ab 30             bts    %rsi,(%rax)
   3e:   8b                      .byte 0x8b
   3f:   4b                      rex.WXB

Code starting with the faulting instruction
===========================================
    0:   48 89 59 08             mov    %rbx,0x8(%rcx)
    4:   48 89 18                mov    %rbx,(%rax)
    7:   48 89 43 08             mov    %rax,0x8(%rbx)
    b:   49 8d 44 24 68          lea    0x68(%r12),%rax
   10:   48 0f ab 30             bts    %rsi,(%rax)
   14:   8b                      .byte 0x8b
   15:   4b                      rex.WXB
[   60.591081] RSP: 0018:ffffc900000dbbe0 EFLAGS: 00010086
[   60.596908] RAX: ffff888fd59e31b8 RBX: ffff8881ec87c9e8 RCX: ffff8881e98109e8
[   60.604866] RDX: 0000000000000099 RSI: 0000000000000099 RDI: ffff888fd59e2c40
[   60.612826] RBP: ffffc900000dbbf8 R08: 0000000000000001 R09: ffff888fd59e2c40
[   60.620787] R10: 000000000000550d R11: 0000000000000000 R12: ffff888fd59e2c40
[   60.628748] R13: 00000000ffff1640 R14: 00000000ffff163c R15: 0000000000000000
[   60.636706] FS:  0000000000000000(0000) GS:ffff888fd5800000(0000) knlGS:0000000000000000
[   60.645732] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   60.652141] CR2: ffff8881e98109f0 CR3: 00000008d5e3c003 CR4: 0000000000770ef0
[   60.660101] PKRU: 55555554
[   60.663114] Call Trace:
[   60.665838]  <TASK>
[   60.668174] ? show_regs (/home/amd/git/linux/arch/x86/kernel/dumpstack.c:479)
[   60.671971] ? __die (/home/amd/git/linux/arch/x86/kernel/dumpstack.c:421 /home/amd/git/linux/arch/x86/kernel/dumpstack.c:434)
[   60.675375] ? page_fault_oops (/home/amd/git/linux/arch/x86/mm/fault.c:707)
[   60.679942] ? search_bpf_extables (/home/amd/git/linux/kernel/bpf/core.c:765)
[   60.684800] ? enqueue_timer (/home/amd/git/linux/./include/linux/list.h:1034 /home/amd/git/linux/kernel/time/timer.c:605)
[   60.689077] ? srso_alias_return_thunk (/home/amd/git/linux/arch/x86/lib/retpoline.S:181)
[   60.694422] ? search_exception_tables (/home/amd/git/linux/kernel/extable.c:64)
[   60.699571] ? srso_alias_return_thunk (/home/amd/git/linux/arch/x86/lib/retpoline.S:181)
[   60.704917] ? kernelmode_fixup_or_oops (/home/amd/git/linux/arch/x86/mm/fault.c:762)
[   60.710256] ? __bad_area_nosemaphore (/home/amd/git/linux/arch/x86/mm/fault.c:860)
[   60.715505] ? bad_area_nosemaphore (/home/amd/git/linux/arch/x86/mm/fault.c:867)
[   60.720364] ? do_kern_addr_fault (/home/amd/git/linux/arch/x86/mm/fault.c:1227)
[   60.725030] ? exc_page_fault (/home/amd/git/linux/arch/x86/mm/fault.c:1503 /home/amd/git/linux/arch/x86/mm/fault.c:1561)
[   60.729503] ? asm_exc_page_fault (/home/amd/git/linux/./arch/x86/include/asm/idtentry.h:570)
[   60.734174] ? enqueue_timer (/home/amd/git/linux/./include/linux/list.h:1034 /home/amd/git/linux/kernel/time/timer.c:605)
[   60.738453] __mod_timer (/home/amd/git/linux/kernel/time/timer.c:635 /home/amd/git/linux/kernel/time/timer.c:1131)
[   60.742439] ? local_clock_noinstr (/home/amd/git/linux/kernel/sched/clock.c:301)
[   60.747202] add_timer (/home/amd/git/linux/kernel/time/timer.c:1245)
[   60.750798] __queue_delayed_work (/home/amd/git/linux/kernel/workqueue.c:1962)
[   60.755463] queue_delayed_work_on (/home/amd/git/linux/kernel/workqueue.c:1987)
[   60.760226] drm_kms_helper_poll_enable (/home/amd/git/linux/drivers/gpu/drm/drm_probe_helper.c:310) drm_kms_helper
[   60.767229] drm_kms_helper_poll_init (/home/amd/git/linux/drivers/gpu/drm/drm_probe_helper.c:914) drm_kms_helper
[   60.773936] ast_mode_config_init (/home/amd/git/linux/drivers/gpu/drm/ast/ast_mode.c:1931) ast
[   60.779382] ast_device_create (/home/amd/git/linux/drivers/gpu/drm/ast/ast_main.c:518) ast
[   60.784533] ast_pci_probe (/home/amd/git/linux/drivers/gpu/drm/ast/ast_drv.c:106) ast
[   60.789107] local_pci_probe (/home/amd/git/linux/drivers/pci/pci-driver.c:324)
[   60.793292] work_for_cpu_fn (/home/amd/git/linux/kernel/workqueue.c:5621)
[   60.797471] process_one_work (/home/amd/git/linux/kernel/workqueue.c:2630)
[   60.801941] ? process_one_work (/home/amd/git/linux/kernel/workqueue.c:2605)
[   60.806608] worker_thread (/home/amd/git/linux/kernel/workqueue.c:2697 /home/amd/git/linux/kernel/workqueue.c:2784)
[   60.810790] ? __pfx_worker_thread (/home/amd/git/linux/kernel/workqueue.c:2730)
[   60.815554] kthread (/home/amd/git/linux/kernel/kthread.c:388)
[   60.819151] ? __pfx_kthread (/home/amd/git/linux/kernel/kthread.c:341)
[   60.823331] ret_from_fork (/home/amd/git/linux/arch/x86/kernel/process.c:147)
[   60.827318] ? __pfx_kthread (/home/amd/git/linux/kernel/kthread.c:341)
[   60.831498] ret_from_fork_asm (/home/amd/git/linux/arch/x86/entry/entry_64.S:250)
[   60.835878]  </TASK>
[   60.838309] Modules linked in: crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 ast(+) i2c_algo_bit drm_shmem_helper hid_generic(+) drm_kms_helper uas dax_hmem nvme usbhid usb_storage drm hid ahci(+) libahci i2c_piix4 nvme_core wmi aesni_intel crypto_simd cryptd
[   60.867920] CR2: ffff8881e98109f0
[   60.871616] ---[ end trace 0000000000000000 ]---
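The faulting sequence in the disassembly loads h->first, stores it into n->next, tests it for NULL, and then writes first->pprev. That is the body of the kernel's hlist_add_head(), which enqueue_timer() uses to put a timer on a timer-wheel bucket, and it appears to be the list.h:1034 frame. Below is a minimal userspace C sketch of that function. It assumes the x86-64 (LP64) layout of struct hlist_node, with next at offset 0 and pprev at offset 8. The type and function names mirror the kernel's, but this is a re-creation for illustration, and trapping_store_addr() and demo_two_inserts() are hypothetical helpers. It shows that the trapping `mov %rbx,0x8(%rcx)` is the store `first->pprev = &n->next`, a write into the node already at the head of the bucket. RCX (ffff8881e98109e8) + 8 is exactly the CR2 address ffff8881e98109f0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the kernel's intrusive hlist types.
 * Assumption: LP64, so 'next' is at offset 0 and 'pprev' at offset 8. */
struct hlist_node {
	struct hlist_node *next;
	struct hlist_node **pprev;
};

struct hlist_head {
	struct hlist_node *first;
};

/* Same logic as the kernel's hlist_add_head(), without WRITE_ONCE().
 * Register roles from the oops: rax = h, rbx = n, rcx = first. */
static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first = h->first;  /* mov (%rax),%rcx */

	n->next = first;                      /* mov %rcx,(%rbx) */
	if (first)                            /* test %rcx,%rcx; je */
		first->pprev = &n->next;      /* mov %rbx,0x8(%rcx): the trapping store */
	h->first = n;                         /* mov %rbx,(%rax) */
	n->pprev = &h->first;                 /* mov %rax,0x8(%rbx) */
}

/* Hypothetical helper: the address written by the trapping store, given RCX. */
static uintptr_t trapping_store_addr(uintptr_t rcx)
{
	return rcx + offsetof(struct hlist_node, pprev);
}

/* Hypothetical helper: insert two nodes and check the resulting links.
 * The second insert is the one that writes into the older node. */
static int demo_two_inserts(void)
{
	struct hlist_head h = { NULL };
	struct hlist_node older, newer;

	hlist_add_head(&older, &h);
	hlist_add_head(&newer, &h);  /* performs older.pprev = &newer.next */

	assert(h.first == &newer && newer.next == &older);
	assert(older.pprev == &newer.next && newer.pprev == &h.first);
	return 1;
}
```

With DEBUG_PAGEALLOC enabled, freed pages are unmapped. A not-present write fault on that address is therefore consistent with (though not proof of) the previously queued timer at the head of that bucket living in memory that had already been freed.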

drivers/gpu/drm/drm_probe_helper.c:310 is the
dev->mode_config.poll_running assignment here:

void drm_kms_helper_poll_enable(struct drm_device *dev)
{
	if (!dev->mode_config.poll_enabled || !drm_kms_helper_poll ||
	    dev->mode_config.poll_running)
		return;

	if (drm_kms_helper_enable_hpd(dev) ||
	    dev->mode_config.delayed_event)
		reschedule_output_poll_work(dev);

	dev->mode_config.poll_running = true;           <<<<< HERE
}
EXPORT_SYMBOL(drm_kms_helper_poll_enable);
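Two details of this function may help when reading the trace. First, the guard at the top returns early once poll_running is set. Calling drm_kms_helper_poll_enable() again on the same, already running device therefore does not queue the work a second time. Second, frames below the top of a stack trace are return addresses. drm_probe_helper.c:310 most likely points just past the (inlined) reschedule_output_poll_work() call, so the crash happens while that call queues the delayed work (queue_delayed_work_on() in the trace), not in the poll_running assignment itself. The userspace C sketch below models only the guard logic. struct mode_config_model, enable_hpd_result, reschedules and demo_double_enable() are hypothetical stand-ins, and the counter replaces the real delayed-work queueing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the drm_mode_config fields used above. */
struct mode_config_model {
	bool poll_enabled;   /* set once polling has been initialized */
	bool poll_running;
	bool delayed_event;
};

/* Hypothetical stand-ins for the drm_kms_helper_poll module parameter
 * and the return value of drm_kms_helper_enable_hpd(). */
static bool drm_kms_helper_poll = true;
static bool enable_hpd_result = true;

/* Counts how often the real code would call reschedule_output_poll_work(). */
static int reschedules;

/* Same control flow as the drm_kms_helper_poll_enable() quoted above. */
static void poll_enable_model(struct mode_config_model *mc)
{
	if (!mc->poll_enabled || !drm_kms_helper_poll || mc->poll_running)
		return;

	if (enable_hpd_result || mc->delayed_event)
		reschedules++;   /* stands in for queueing the delayed poll work */

	mc->poll_running = true;
}

/* Enabling twice on the same device schedules the poll work only once. */
static int demo_double_enable(void)
{
	struct mode_config_model mc = { .poll_enabled = true };

	poll_enable_model(&mc);
	poll_enable_model(&mc);   /* early return: poll_running is already true */
	assert(mc.poll_running);
	return reschedules;
}
```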

If I revert commit f81bb0ac7872893241319ea82504956676ef02fd
("drm/ast: report connection status on Display Port."), the splat
goes away:

[   60.603837] input: OpenBMC virtual_input as /devices/pci0000:00/0000:00:07.1/0000:02:00.4/usb3/3-2/3-2.6/3-2.6:1.0/0003:1D6B:0104.0002/input/input4
[   60.651733] ast 0000:c2:00.0: vgaarb: deactivate vga console
[   60.659978]  4k 16711104 large 0 gb 0 x 1303[ffff888000097000-ffff8880a7ffe000] miss 383488
[   60.669321] ok.
[   60.670497] ast 0000:c2:00.0: [drm] Using default configuration
[   60.677894] ast 0000:c2:00.0: [drm] AST 2600 detected
[   60.683545] ast 0000:c2:00.0: [drm] Using ASPEED DisplayPort transmitter
[   60.685381] hid-generic 0003:1D6B:0104.0002: input,hidraw0: USB HID v1.01 Keyboard [OpenBMC virtual_input] on usb-0000:02:00.4-2.6/input0
[   60.691032] ast 0000:c2:00.0: [drm] dram MCLK=396 Mhz type=1 bus_width=16
[   60.697172] [drm] Initialized ast 0.1.0 20120228 for 0000:c2:00.0 on minor 0
[   60.729565] fbcon: astdrmfb (fb0) is primary device
[   60.729570] fbcon: Deferring console take-over
[   60.741322] ast 0000:c2:00.0: [drm] fb0: astdrmfb frame buffer device
[   60.928226] ast 0000:c2:00.0: vgaarb: deactivate vga console
[   60.940376] input: OpenBMC virtual_input as /devices/pci0000:00/0000:00:07.1/0000:02:00.4/usb3/3-2/3-2.6/3-2.6:1.0/0003:1D6B:0104.0002/input/input5
[   60.965436] ast 0000:c2:00.0: [drm] Using default configuration
[   60.972051] ast 0000:c2:00.0: [drm] AST 2600 detected
[   60.977698] ast 0000:c2:00.0: [drm] Using ASPEED DisplayPort transmitter
[   60.985181] ast 0000:c2:00.0: [drm] dram MCLK=396 Mhz type=1 bus_width=16
[   61.000056] [drm] Initialized ast 0.1.0 20120228 for 0000:c2:00.0 on minor 0
[   61.013486] fbcon: Deferring console take-over
[   61.016918] hid-generic 0003:1D6B:0104.0002: input,hidraw0: USB HID v1.01 Keyboard [OpenBMC virtual_input] on usb-0000:02:00.4-2.6/input0
[   61.018454] ast 0000:c2:00.0: [drm] fb0: astdrmfb frame buffer device
[   61.040853] input: OpenBMC virtual_input as /devices/pci0000:00/0000:00:07.1/0000:02:00.4/usb3/3-2/3-2.6/3-2.6:1.1/0003:1D6B:0104.0004/input/input6
[   61.059112] hid-generic 0003:1D6B:0104.0004: input,hidraw1: USB HID v1.01 Mouse [OpenBMC virtual_input] on usb-0000:02:00.4-2.6/input1
[   61.358397] input: OpenBMC virtual_input as /devices/pci0000:00/0000:00:07.1/0000:02:00.4/usb3/3-2/3-2.6/3-2.6:1.1/0003:1D6B:0104.0004/input/input7
[   61.376885] hid-generic 0003:1D6B:0104.0004: input,hidraw1: USB HID v1.01 Mouse [OpenBMC virtual_input] on usb-0000:02:00.4-2.6/input1

This has happened before, when drm_kms_helper_poll_init() was added
to an ast connector_init() function; see:

commit 595cb5e0b832a3e100cbbdefef797b0c27bf725a
Author: Kim Phillips <kim.phillips at amd.com>
Date:   Thu Oct 21 10:30:06 2021 -0500

     Revert "drm/ast: Add detect function support"

I'm willing to test any proposed changes, especially if they avoid
reverting this commit too, because a revert would likely just
lead to yet another instance of this BUG if/when another poll_init()
gets added in the future.  Should the FIXME described in
reschedule_output_poll_work() be addressed?

Thanks,

Kim
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ast-splat.config.gz
Type: application/gzip
Size: 67767 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/dri-devel/attachments/20231108/d04ca9ac/attachment-0001.gz>


More information about the dri-devel mailing list