Regression on linux-next (next-20240625)

Borah, Chaitanya Kumar chaitanya.kumar.borah at intel.com
Fri Jun 28 04:45:27 UTC 2024


+intel-gfx

Gentle Reminder.

From: Borah, Chaitanya Kumar 
Sent: Wednesday, June 26, 2024 8:52 PM
To: sidhartha.kumar at oracle.com
Cc: Liam.Howlett at oracle.com; akpm at linux-foundation.org; linux-mm at kvack.org; maple-tree at lists.infradead.org; Nikula, Jani <jani.nikula at intel.com>; Saarinen, Jani <jani.saarinen at intel.com>; Kurmi, Suresh Kumar <Suresh.Kumar.Kurmi at intel.com>
Subject: Regression on linux-next (next-20240625)

Hello Sidhartha,

I hope you are doing well. I am Chaitanya from the Linux graphics team at Intel.

This mail is regarding a regression we are seeing in our CI runs [1] on the linux-next repository.

Since version next-20240625 [2], we are seeing the following regression:

`````````````````````````````````````````````````````````````````````````````````
<3>[    2.336948] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:337
<3>[    2.336974] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 95, name: kdevtmpfs
<3>[    2.336989] preempt_count: 1, expected: 0
<3>[    2.336998] RCU nest depth: 0, expected: 0
<4>[    2.337006] 3 locks held by kdevtmpfs/95:
<4>[    2.337015]  #0: ffff888100d2c3f0 (sb_writers){.+.+}-{0:0}, at: filename_create+0x5d/0x160
<4>[    2.337041]  #1: ffff888100800840 (&type->i_mutex_dir_key/1){+.+.}-{3:3}, at: filename_create+0x9d/0x160
<4>[    2.337065]  #2: ffff888100800658 (&simple_offset_lock_class){+.+.}-{2:2}, at: mtree_alloc_cyclic+0x71/0xf0
<3>[    2.337089] Preemption disabled at:
<3>[    2.337091] [<0000000000000000>] 0x0
<4>[    2.337105] CPU: 13 UID: 0 PID: 95 Comm: kdevtmpfs Not tainted 6.10.0-rc5-next-20240625-next-20240625-g0fc4bfab2cd4+ #1
<4>[    2.337126] Hardware name: ASUS System Product Name/PRIME Z790-P WIFI, BIOS 0812 02/24/2023
<4>[    2.337141] Call Trace:
<4>[    2.337147]  <TASK>
<4>[    2.337152]  dump_stack_lvl+0xb0/0xd0
<4>[    2.337163]  __might_resched+0x194/0x2b0
<4>[    2.337175]  kmem_cache_alloc_noprof+0x20c/0x280
<4>[    2.337186]  ? mas_alloc_nodes+0x173/0x230
<4>[    2.337197]  mas_alloc_nodes+0x173/0x230
<4>[    2.337207]  mas_alloc_cyclic+0x27b/0x550
<4>[    2.337220]  mtree_alloc_cyclic+0x92/0xf0
`````````````````````````````````````````````````````````````````````````````````
The detailed log can be found in [3].

After bisecting the tree, the following patch [4] seems to be the first "bad" commit:

`````````````````````````````````````````````````````````````````````````````````````````````````````````
    maple_tree: remove mas_destroy() from mas_nomem()

    Separate call to mas_destroy() from mas_nomem() so we can check for no
    memory errors without destroying the current maple state in
    mas_store_gfp().  We then add calls to mas_destroy() to callers of
    mas_nomem().

    Link: https://lkml.kernel.org/r/20240618204750.79512-6-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar at oracle.com>

`````````````````````````````````````````````````````````````````````````````````````````````````````````

We could not revert the patch because of merge conflicts, but resetting to the parent of the commit seems to fix the issue.

Could you please check why the patch causes this regression and provide a fix if necessary?

Thank you.

Regards

Chaitanya

[1] https://intel-gfx-ci.01.org/tree/linux-next/combined-alt.html?
[2] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20240625
[3] https://intel-gfx-ci.01.org/tree/linux-next/next-20240625/bat-rpls-4/boot0.txt 
[4] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=187827d2dc3749d66546696b78584ee4c54687b0

