Regression on linux-next (next-20240722)

Borah, Chaitanya Kumar chaitanya.kumar.borah at intel.com
Tue Jul 23 19:08:52 UTC 2024


Hello Anna-Maria,

Hope you are doing well. I am Chaitanya from the Linux graphics team at Intel.

This mail is regarding a regression we are seeing in our CI runs [1] on the linux-next repository.

Since version next-20240722 [2], we are seeing the following regression:

`````````````````````````````````````````````````````````````````````````````````
<6>[    0.787321] Timer migration: 2 hierarchy levels; 8 children per group; 2 crossnode level
<4>[    0.787330] ------------[ cut here ]------------
<4>[    0.787335] WARNING: CPU: 0 PID: 1 at kernel/time/timer_migration.c:1714 tmigr_cpu_prepare+0x5f2/0x680
<4>[    0.787340] Modules linked in:
<4>[    0.787341] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.10.0-next-20240722-next-20240722-gdee7f101b642+ #1
<4>[    0.787342] Hardware name: ASUS System Product Name/PRIME Z790-P WIFI, BIOS 0812 02/24/2023
<4>[    0.787343] RIP: 0010:tmigr_cpu_prepare+0x5f2/0x680
<4>[    0.787344] Code: fc ff ff 80 3d dc d5 6c 01 00 0f 85 56 fc ff ff 48 c7 c7 f8 ba 48 82 c6 05 c8 d5 6c 01 01 e8 95 1b f0 ff 0f 0b e9 3c fc ff ff <0f> 0b e9 41 fa ff ff 4c 89 e7 48 89 2c 24 e8 7b cd 11 00 48 c7 c7
<4>[    0.787345] RSP: 0000:ffffc90000067d18 EFLAGS: 00010246
<4>[    0.787346] RAX: 0000000000000000 RBX: ffff88885f0214e0 RCX: 0000000000000000
<4>[    0.787347] RDX: 0000000000000001 RSI: ffffffff8243cfef RDI: 0000000000000000
<4>[    0.787347] RBP: 000000000002e74c R08: 0000000000000000 R09: 0000000000000000
<4>[    0.787347] R10: ffffc90000067e08 R11: ffff888100ce8040 R12: 0000000000000000
<4>[    0.787348] R13: 0000000000000040 R14: ffffffff81198620 R15: ffffffff8264b880
<4>[    0.787348] FS:  0000000000000000(0000) GS:ffff88885f000000(0000) knlGS:0000000000000000
<4>[    0.787349] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[    0.787350] CR2: ffff88887f7ff000 CR3: 000000000663a000 CR4: 0000000000f50ef0
<4>[    0.787350] PKRU: 55555554
<4>[    0.787351] Call Trace:
<4>[    0.787351]  <TASK>
<4>[    0.787352]  ? __warn+0x91/0x1a0
<4>[    0.787354]  ? tmigr_cpu_prepare+0x5f2/0x680
<4>[    0.787355]  ? report_bug+0x1f8/0x200
<4>[    0.787359]  ? handle_bug+0x3c/0x70
<4>[    0.787361]  ? exc_invalid_op+0x18/0x70
<4>[    0.787362]  ? asm_exc_invalid_op+0x1a/0x20
<4>[    0.787364]  ? __pfx_tmigr_cpu_prepare+0x10/0x10
<4>[    0.787367]  ? tmigr_cpu_prepare+0x5f2/0x680
<4>[    0.787369]  ? __pfx_tmigr_cpu_prepare+0x10/0x10
<4>[    0.787370]  cpuhp_invoke_callback+0x17b/0x6b0
<4>[    0.787372]  cpuhp_issue_call+0x9a/0x1d0
<4>[    0.787374]  __cpuhp_setup_state_cpuslocked+0x1cc/0x2c0
<4>[    0.787376]  ? __pfx_tmigr_cpu_prepare+0x10/0x10
<4>[    0.787377]  __cpuhp_setup_state+0xb8/0x220
<4>[    0.787379]  ? __pfx_tmigr_init+0x10/0x10
<4>[    0.787381]  tmigr_init+0xd8/0x140
<4>[    0.787383]  do_one_initcall+0x5c/0x2b0
<4>[    0.787385]  ? call_rcu_tasks_generic.constprop.0+0x182/0x3c0
<4>[    0.787388]  kernel_init_freeable+0xae/0x340
<4>[    0.787390]  ? __pfx_kernel_init+0x10/0x10
<4>[    0.787392]  kernel_init+0x15/0x130
<4>[    0.787393]  ret_from_fork+0x2c/0x50
<4>[    0.787395]  ? __pfx_kernel_init+0x10/0x10
<4>[    0.787396]  ret_from_fork_asm+0x1a/0x30
<4>[    0.787399]  </TASK>
`````````````````````````````````````````````````````````````````````````````````
The detailed log can be found in [3].

After bisecting the tree, the following patch [4] seems to be the first "bad" commit:

`````````````````````````````````````````````````````````````````````````````````````````````````````````
commit 7a5ee4aa61afa9f1570c80ffba92987bc73ce3ab
Author: Anna-Maria Behnsen <anna-maria at linutronix.de>
Date:   Wed Jul 17 11:49:40 2024 +0200

    timers/migration: Move hierarchy setup into cpuhotplug prepare callback
`````````````````````````````````````````````````````````````````````````````````````````````````````````

We could not revert the patch because of merge conflicts, but resetting to the parent of the commit seems to fix the issue.

Could you please check why the patch causes this regression and provide a fix if necessary?

Thank you.

Regards

Chaitanya

[1] https://intel-gfx-ci.01.org/tree/linux-next/combined-alt.html?
[2] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20240722
[3] http://gfx-ci.igk.intel.com/tree/linux-next/next-20240722/bat-rpls-4/boot0.txt
[4] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20240722&id=7a5ee4aa61afa9f1570c80ffba92987bc73ce3ab

