BUG in drm_kms_helper_poll_enable() fixed by reverting "drm/ast: report connection status on Display Port."

Jocelyn Falempe jfalempe at redhat.com
Thu Nov 9 13:47:47 UTC 2023


On 09/11/2023 01:37, Kim Phillips wrote:
> Hi, current linux kernel commit 90450a06162e
> ("Merge tag 'rcu-fixes-v6.7' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks")
> and the attached config cause the following BUG when booting on
> a reference AMD Zen4 development server:
> 
> [   59.995717] input: OpenBMC virtual_input as 
> /devices/pci0000:00/0000:00:07.1/0000:02:00.4/usb3/3-2/3-2.6/3-2.6:1.0/0003:1D6B:0104.0002/input/input4
> [   60.033135] ast 0000:c2:00.0: vgaarb: deactivate vga console
> [   60.066230] ast 0000:c2:00.0: [drm] Using default configuration
> [   60.070342] hid-generic 0003:1D6B:0104.0002: input,hidraw0: USB HID 
> v1.01 Keyboard [OpenBMC virtual_input] on usb-0000:02:00.4-2.6/input0
> [   60.072843] ast 0000:c2:00.0: [drm] AST 2600 detected
> [   60.072851] ast 0000:c2:00.0: [drm] Using ASPEED DisplayPort transmitter
> [   60.099891] ast 0000:c2:00.0: [drm] dram MCLK=396 Mhz type=1 
> bus_width=16
> [   60.115780] [drm] Initialized ast 0.1.0 20120228 for 0000:c2:00.0 on 
> minor 0
> [   60.135643] fbcon: astdrmfb (fb0) is primary device
> [   60.135649] fbcon: Deferring console take-over
> [   60.146162] ast 0000:c2:00.0: [drm] fb0: astdrmfb frame buffer device
> [   60.331802] input: OpenBMC virtual_input as 
> /devices/pci0000:00/0000:00:07.1/0000:02:00.4/usb3/3-2/3-2.6/3-2.6:1.0/0003:1D6B:0104.0002/input/input5
> [   60.405807] hid-generic 0003:1D6B:0104.0002: input,hidraw0: USB HID 
> v1.01 Keyboard [OpenBMC virtual_input] on usb-0000:02:00.4-2.6/input0
> [   60.423774] input: OpenBMC virtual_input as 
> /devices/pci0000:00/0000:00:07.1/0000:02:00.4/usb3/3-2/3-2.6/3-2.6:1.1/0003:1D6B:0104.0004/input/input6
> [   60.443170] hid-generic 0003:1D6B:0104.0004: input,hidraw1: USB HID 
> v1.01 Mouse [OpenBMC virtual_input] on usb-0000:02:00.4-2.6/input1
> [   60.460675] ast 0000:c2:00.0: vgaarb: deactivate vga console
> [   60.479996] ast 0000:c2:00.0: [drm] Using default configuration
> [   60.486603] ast 0000:c2:00.0: [drm] AST 2600 detected
> [   60.492249] ast 0000:c2:00.0: [drm] Using ASPEED DisplayPort transmitter
> [   60.499732] ast 0000:c2:00.0: [drm] dram MCLK=396 Mhz type=1 
> bus_width=16
> [   60.508955] BUG: unable to handle page fault for address: 
> ffff8881e98109f0
> [   60.516623] #PF: supervisor write access in kernel mode
> [   60.522449] #PF: error_code(0x0002) - not-present page
> [   60.528168] PGD 8dbc01067 P4D 8dbc01067 PUD 104c984067 PMD 104c837067 
> PTE 800ffffe167ef060
> [   60.537394] Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC NOPTI
> [   60.543805] CPU: 0 PID: 9 Comm: kworker/0:1 Tainted: G        
> W          6.6.0+ #3
> [   60.552251] Hardware name: AMD Corporation ONYX/ONYX, BIOS ROX100AB 
> 09/14/2023
> [   60.560309] Workqueue: events work_for_cpu_fn
> [   60.565173] RIP: 0010:enqueue_timer 
> (/home/amd/git/linux/./include/linux/list.h:1034 
> /home/amd/git/linux/kernel/time/timer.c:605)
> [ 60.570129] Code: 44 00 00 55 48 89 e5 41 55 49 89 cd 41 54 49 89 fc 53 
> 48 89 f3 89 d6 48 8d 84 f7 b0 00 00 00 48 8b 08 48 89 0b 48 85 c9 74 04 
> <48> 89 59 08 48 89 18 48 89 43 08 49 8d 44 24 68 48 0f ab 30 8b 4b
> All code
> ========
>     0:   44 00 00                add    %r8b,(%rax)
>     3:   55                      push   %rbp
>     4:   48 89 e5                mov    %rsp,%rbp
>     7:   41 55                   push   %r13
>     9:   49 89 cd                mov    %rcx,%r13
>     c:   41 54                   push   %r12
>     e:   49 89 fc                mov    %rdi,%r12
>    11:   53                      push   %rbx
>    12:   48 89 f3                mov    %rsi,%rbx
>    15:   89 d6                   mov    %edx,%esi
>    17:   48 8d 84 f7 b0 00 00    lea    0xb0(%rdi,%rsi,8),%rax
>    1e:   00
>    1f:   48 8b 08                mov    (%rax),%rcx
>    22:   48 89 0b                mov    %rcx,(%rbx)
>    25:   48 85 c9                test   %rcx,%rcx
>    28:   74 04                   je     0x2e
>    2a:*  48 89 59 08             mov    %rbx,0x8(%rcx)           <-- 
> trapping instruction
>    2e:   48 8
>    31:   48 89 43 08             mov    %rax,0x8(%rbx)
>    35:   49 8d 44 24 68          lea    0x68(%r12),%rax
>    3a:   48 0f ab 30             bts    %rsi,(%rax)
>    3e:   8b                      .byte 0x8b
>    3f:   4b                      rex.WXB
> 
> Code starting with the faulting instruction
> ===========================================
>     0:   48 89 59 08             mov    %rbx,0x8(%rcx)
>     4:   48 89 18                mov    %rbx,(%rax)
>     7:   48 89 43 08             mov    %rax,0x8(%rbx)
>     b:   49 8d 44 24 68          lea    0x68(%r12),%rax
>    10:   48 0f ab 30             bts    %rsi,(%rax)
>    14:   8b                      .byte 0x8b
>    15:   4b                      rex.WXB
> [   60.591081] RSP: 0018:ffffc900000dbbe0 EFLAGS: 00010086
> [   60.596908] RAX: ffff888fd59e31b8 RBX: ffff8881ec87c9e8 RCX: 
> ffff8881e98109e8
> [   60.604866] RDX: 0000000000000099 RSI: 0000000000000099 RDI: 
> ffff888fd59e2c40
> [   60.612826] RBP: ffffc900000dbbf8 R08: 0000000000000001 R09: 
> ffff888fd59e2c40
> [   60.620787] R10: 000000000000550d R11: 0000000000000000 R12: 
> ffff888fd59e2c40
> [   60.628748] R13: 00000000ffff1640 R14: 00000000ffff163c R15: 
> 0000000000000000
> [   60.636706] FS:  0000000000000000(0000) GS:ffff888fd5800000(0000) 
> knlGS:0000000000000000
> [   60.645732] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   60.652141] CR2: ffff8881e98109f0 CR3: 00000008d5e3c003 CR4: 
> 0000000000770ef0
> [   60.660101] PKRU: 55555554
> [   60.663114] Call Trace:
> [   60.665838]  <TASK>
> [   60.668174] ? show_regs 
> (/home/amd/git/linux/arch/x86/kernel/dumpstack.c:479)
> [   60.671971] ? __die 
> (/home/amd/git/linux/arch/x86/kernel/dumpstack.c:421 
> /home/amd/git/linux/arch/x86/kernel/dumpstack.c:434)
> [   60.675375] ? page_fault_oops 
> (/home/amd/git/linux/arch/x86/mm/fault.c:707)
> [   60.679942] ? search_bpf_extables 
> (/home/amd/git/linux/kernel/bpf/core.c:765)
> [   60.684800] ? enqueue_timer 
> (/home/amd/git/linux/./include/linux/list.h:1034 
> /home/amd/git/linux/kernel/time/timer.c:605)
> [   60.689077] ? srso_alias_return_thunk 
> (/home/amd/git/linux/arch/x86/lib/retpoline.S:181)
> [   60.694422] ? search_exception_tables 
> (/home/amd/git/linux/kernel/extable.c:64)
> [   60.699571] ? srso_alias_return_thunk 
> (/home/amd/git/linux/arch/x86/lib/retpoline.S:181)
> [   60.704917] ? kernelmode_fixup_or_oops 
> (/home/amd/git/linux/arch/x86/mm/fault.c:762)
> [   60.710256] ? __bad_area_nosemaphore 
> (/home/amd/git/linux/arch/x86/mm/fault.c:860)
> [   60.715505] ? bad_area_nosemaphore 
> (/home/amd/git/linux/arch/x86/mm/fault.c:867)
> [   60.720364] ? do_kern_addr_fault 
> (/home/amd/git/linux/arch/x86/mm/fault.c:1227)
> [   60.725030] ? exc_page_fault 
> (/home/amd/git/linux/arch/x86/mm/fault.c:1503 
> /home/amd/git/linux/arch/x86/mm/fault.c:1561)
> [   60.729503] ? asm_exc_page_fault 
> (/home/amd/git/linux/./arch/x86/include/asm/idtentry.h:570)
> [   60.734174] ? enqueue_timer 
> (/home/amd/git/linux/./include/linux/list.h:1034 
> /home/amd/git/linux/kernel/time/timer.c:605)
> [   60.738453] __mod_timer (/home/amd/git/linux/kernel/time/timer.c:635 
> /home/amd/git/linux/kernel/time/timer.c:1131)
> [   60.742439] ? local_clock_noinstr 
> (/home/amd/git/linux/kernel/sched/clock.c:301)
> [   60.747202] add_timer (/home/amd/git/linux/kernel/time/timer.c:1245)
> [   60.750798] __queue_delayed_work 
> (/home/amd/git/linux/kernel/workqueue.c:1962)
> [   60.755463] queue_delayed_work_on 
> (/home/amd/git/linux/kernel/workqueue.c:1987)
> [   60.760226] drm_kms_helper_poll_enable 
> (/home/amd/git/linux/drivers/gpu/drm/drm_probe_helper.c:310) drm_kms_helper
> [   60.767229] drm_kms_helper_poll_init 
> (/home/amd/git/linux/drivers/gpu/drm/drm_probe_helper.c:914) drm_kms_helper
> [   60.773936] ast_mode_config_init 
> (/home/amd/git/linux/drivers/gpu/drm/ast/ast_mode.c:1931) ast
> [   60.779382] ast_device_create 
> (/home/amd/git/linux/drivers/gpu/drm/ast/ast_main.c:518) ast
> [   60.784533] ast_pci_probe 
> (/home/amd/git/linux/drivers/gpu/drm/ast/ast_drv.c:106) ast
> [   60.789107] local_pci_probe 
> (/home/amd/git/linux/drivers/pci/pci-driver.c:324)
> [   60.793292] work_for_cpu_fn 
> (/home/amd/git/linux/kernel/workqueue.c:5621)
> [   60.797471] process_one_work 
> (/home/amd/git/linux/kernel/workqueue.c:2630)
> [   60.801941] ? process_one_work 
> (/home/amd/git/linux/kernel/workqueue.c:2605)
> [   60.806608] worker_thread 
> (/home/amd/git/linux/kernel/workqueue.c:2697 
> /home/amd/git/linux/kernel/workqueue.c:2784)
> [   60.810790] ? __pfx_worker_thread 
> (/home/amd/git/linux/kernel/workqueue.c:2730)
> [   60.815554] kthread (/home/amd/git/linux/kernel/kthread.c:388)
> [   60.819151] ? __pfx_kthread (/home/amd/git/linux/kernel/kthread.c:341)
> [   60.823331] ret_from_fork 
> (/home/amd/git/linux/arch/x86/kernel/process.c:147)
> [   60.827318] ? __pfx_kthread (/home/amd/git/linux/kernel/kthread.c:341)
> [   60.831498] ret_from_fork_asm 
> (/home/amd/git/linux/arch/x86/entry/entry_64.S:250)
> [   60.835878]  </TASK>
> [   60.838309] Modules linked in: crct10dif_pclmul crc32_pclmul 
> ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 ast(+) 
> i2c_algo_bit drm_shmem_helper hid_generic(+) drm_kms_helper uas dax_hmem 
> nvme usbhid usb_storage drm hid ahci(+) libahci i2c_piix4 nvme_core wmi 
> aesni_intel crypto_simd cryptd
> [   60.867920] CR2: ffff8881e98109f0
> [   60.871616] ---[ end trace 0000000000000000 ]---
> 
> drivers/gpu/drm/drm_probe_helper.c:310 is the
> dev->mode_config.poll_running assignment here:
> 
> void drm_kms_helper_poll_enable(struct drm_device *dev)
> {
>      if (!dev->mode_config.poll_enabled || !drm_kms_helper_poll ||
>          dev->mode_config.poll_running)
>          return;
> 
>      if (drm_kms_helper_enable_hpd(dev) ||
>          dev->mode_config.delayed_event)
>          reschedule_output_poll_work(dev);
> 
>      dev->mode_config.poll_running = true;           <<<<< HERE
> }
> EXPORT_SYMBOL(drm_kms_helper_poll_enable);
> 

Hi,

Thanks for the detailed bug report.

 From the call stack, I think the crash is more likely here:
https://elixir.bootlin.com/linux/v6.6/source/kernel/time/timer.c#L605

But the timer looks correctly initialized in
https://elixir.bootlin.com/linux/v6.6/source/drivers/gpu/drm/drm_probe_helper.c#L908

So I'm not sure why it fails in this case.


> If I revert commit f81bb0ac7872893241319ea82504956676ef02fd
> ("drm/ast: report connection status on Display Port."), the splat
> goes away:
> 
> [   60.603837] input: OpenBMC virtual_input as 
> /devices/pci0000:00/0000:00:07.1/0000:02:00.4/usb3/3-2/3-2.6/3-2.6:1.0/0003:1D6B:0104.0002/input/input4
> [   60.651733] ast 0000:c2:00.0: vgaarb: deactivate vga console
> [   60.659978]  4k 16711104 large 0 gb 0 x 
> 1303[ffff888000097000-ffff8880a7ffe000] miss 383488
> [   60.669321] ok.
> [   60.670497] ast 0000:c2:00.0: [drm] Using default configuration
> [   60.677894] ast 0000:c2:00.0: [drm] AST 2600 detected
> [   60.683545] ast 0000:c2:00.0: [drm] Using ASPEED DisplayPort transmitter
> [   60.685381] hid-generic 0003:1D6B:0104.0002: input,hidraw0: USB HID 
> v1.01 Keyboard [OpenBMC virtual_input] on usb-0000:02:00.4-2.6/input0
> [   60.691032] ast 0000:c2:00.0: [drm] dram MCLK=396 Mhz type=1 
> bus_width=16
> [   60.697172] [drm] Initialized ast 0.1.0 20120228 for 0000:c2:00.0 on 
> minor 0
> [   60.729565] fbcon: astdrmfb (fb0) is primary device
> [   60.729570] fbcon: Deferring console take-over
> [   60.741322] ast 0000:c2:00.0: [drm] fb0: astdrmfb frame buffer device
> [   60.928226] ast 0000:c2:00.0: vgaarb: deactivate vga console
> [   60.940376] input: OpenBMC virtual_input as 
> /devices/pci0000:00/0000:00:07.1/0000:02:00.4/usb3/3-2/3-2.6/3-2.6:1.0/0003:1D6B:0104.0002/input/input5
> [   60.965436] ast 0000:c2:00.0: [drm] Using default configuration
> [   60.972051] ast 0000:c2:00.0: [drm] AST 2600 detected
> [   60.977698] ast 0000:c2:00.0: [drm] Using ASPEED DisplayPort transmitter
> [   60.985181] ast 0000:c2:00.0: [drm] dram MCLK=396 Mhz type=1 
> bus_width=16
> [   61.000056] [drm] Initialized ast 0.1.0 20120228 for 0000:c2:00.0 on 
> minor 0
> [   61.013486] fbcon: Deferring console take-over
> [   61.016918] hid-generic 0003:1D6B:0104.0002: input,hidraw0: USB HID 
> v1.01 Keyboard [OpenBMC virtual_input] on usb-0000:02:00.4-2.6/input0
> [   61.018454] ast 0000:c2:00.0: [drm] fb0: astdrmfb frame buffer device
> [   61.040853] input: OpenBMC virtual_input as 
> /devices/pci0000:00/0000:00:07.1/0000:02:00.4/usb3/3-2/3-2.6/3-2.6:1.1/0003:1D6B:0104.0004/input/input6
> [   61.059112] hid-generic 0003:1D6B:0104.0004: input,hidraw1: USB HID 
> v1.01 Mouse [OpenBMC virtual_input] on usb-0000:02:00.4-2.6/input1
> [   61.358397] input: OpenBMC virtual_input as 
> /devices/pci0000:00/0000:00:07.1/0000:02:00.4/usb3/3-2/3-2.6/3-2.6:1.1/0003:1D6B:0104.0004/input/input7
> [   61.376885] hid-generic 0003:1D6B:0104.0004: input,hidraw1: USB HID 
> v1.01 Mouse [OpenBMC virtual_input] on usb-0000:02:00.4-2.6/input1
> 
> This has happened before when drm_kms_helper_poll_init() was added
> to an ast connector_init(), see:

The crash was in the detect() callback for that previous case.
This time it crashes when setting the timer, but they still look very 
similar, thanks for pointing this.
> 
> commit 595cb5e0b832a3e100cbbdefef797b0c27bf725a
> Author: Kim Phillips <kim.phillips at amd.com>
> Date:   Thu Oct 21 10:30:06 2021 -0500
> 
>      Revert "drm/ast: Add detect function support"
> 
> I'm willing to test any proposed changes, esp. if it means
> not reverting this commit, too, because that will only likely
> lead to yet another BUG instance if/when another poll_init() gets
> added in the future.  Should the FIXME described in
> reschedule_output_poll_work() be addressed?


This fixme just change the timer interval from 10s to 1s, so it 
shouldn't explain this crash.

Can you test with the attached patch ? I want to see if the detect 
callback is called, and also make sure the delayed_work struct is 
properly initialized.

> 
> Thanks,
> 
> Kim

Best regards,

-- 

Jocelyn
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-drm-probe-helper-Add-debug-for-AST-poll-bug.patch
Type: text/x-patch
Size: 1339 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/dri-devel/attachments/20231109/edeb8bd9/attachment-0001.bin>


More information about the dri-devel mailing list