[Intel-gfx] Regression on linux-next (next-20231107)
Borah, Chaitanya Kumar
chaitanya.kumar.borah at intel.com
Thu Nov 9 17:00:09 UTC 2023
Hello Krister,
Hope you are doing well. I am Chaitanya from the Linux graphics team at Intel.
This mail is regarding a regression we are seeing in our CI runs [1] on some machines (dg2 and adl-p) with the linux-next repository.
Since version next-20231107 [2], we are seeing the following error:
```````````````````````````````````````````````````````````````````````````````
<4>[ 32.015910] stack segment: 0000 [#1] PREEMPT SMP NOPTI
<4>[ 32.021048] CPU: 15 PID: 766 Comm: fusermount Not tainted 6.6.0-next-20231107-next-20231107-g5cd631a52568+ #1
<4>[ 32.031135] Hardware name: Intel Corporation Raptor Lake Client Platform/RPL-S ADP-S DDR5 UDIMM CRB, BIOS RPLSFWI1.R00.4221.A00.2305271351 05/27/2023
<4>[ 32.044657] RIP: 0010:fuse_evict_inode+0x61/0x150 [fuse]
`````````````````````````````````````````````````````````````````````````````````
The detailed log can be found in [3].
After bisecting the tree, the following patch [4] appears to be the first "bad" commit:
`````````````````````````````````````````````````````````````````````````````````````````````````````````
513dfacefd712bcbfab64e1a9c9c3e0d51c2dca5 is the first bad commit
commit 513dfacefd712bcbfab64e1a9c9c3e0d51c2dca5
Author: Krister Johansen kjlx at templeofstupid.com
Date: Fri Nov 3 10:39:47 2023 -0700
fuse: share lookup state between submount and its parent
Fuse submounts do not perform a lookup for the nodeid that they inherit
from their parent. Instead, the code decrements the nlookup on the
submount's fuse_inode when it is instantiated, and no forget is
performed when a submount root is evicted.
Trouble arises when the submount's parent is evicted despite the
submount itself being in use. In this author's case, the submount was
in a container and detached from the initial mount namespace via a
MNT_DETACH operation. When memory pressure triggered the shrinker, the
inode from the parent was evicted, which triggered enough forgets to
render the submount's nodeid invalid.
Since submounts should still function, even if their parent goes away,
solve this problem by sharing refcounted state between the parent and
its submount. When all of the references on this shared state reach
zero, it's safe to forget the final lookup of the fuse nodeid.
`````````````````````````````````````````````````````````````````````````````````````````````````````````
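For readers less familiar with the approach, the idea in the last paragraph of the commit message (shared refcounted state, with a single forget once the last reference is dropped) can be sketched roughly as below. This is a userspace illustration only, not the code from the patch: `struct shared_lookup`, `shared_lookup_new/get/put` and `send_forget` are hypothetical names, the counter is a plain integer rather than the kernel's atomic refcounting, and the "forget" is merely recorded instead of being sent to a FUSE server.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for state shared by a parent inode and its submount root. */
struct shared_lookup {
    uint64_t nodeid;   /* nodeid that both inodes refer to */
    uint64_t nlookup;  /* lookup count to hand back in the final forget */
    unsigned refs;     /* number of inodes still holding this state */
};

/* Hypothetical: record the forget instead of sending it to a server. */
static uint64_t forgotten_nodeid;
static uint64_t forgotten_nlookup;
static unsigned forget_calls;

static void send_forget(uint64_t nodeid, uint64_t nlookup)
{
    forgotten_nodeid = nodeid;
    forgotten_nlookup = nlookup;
    forget_calls++;
}

/* Create the shared state; the creator (the parent) holds the first reference. */
static struct shared_lookup *shared_lookup_new(uint64_t nodeid, uint64_t nlookup)
{
    struct shared_lookup *s = malloc(sizeof(*s));
    if (!s)
        return NULL;
    s->nodeid = nodeid;
    s->nlookup = nlookup;
    s->refs = 1;
    return s;
}

/* Take an extra reference, e.g. when a submount root is instantiated. */
static struct shared_lookup *shared_lookup_get(struct shared_lookup *s)
{
    s->refs++;
    return s;
}

/*
 * Drop one reference, e.g. when either inode is evicted.
 * Only the final put forgets the nodeid. Returns 1 if it was the final put.
 */
static int shared_lookup_put(struct shared_lookup *s)
{
    assert(s->refs > 0);
    if (--s->refs > 0)
        return 0;
    send_forget(s->nodeid, s->nlookup);
    free(s);
    return 1;
}
```

With this shape, evicting the parent while the submount still holds a reference only drops the count and the nodeid stays valid; the forget happens when the submount root is evicted as well.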
We also verified that the issue is no longer seen when the patch is reverted.
Could you please check why the patch causes this regression and provide a fix if necessary?
Thank you.
Regards
Chaitanya
[1] https://intel-gfx-ci.01.org/tree/linux-next/combined-alt.html?
[2] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20231107
[3] http://gfx-ci.igk.intel.com/tree/linux-next/next-20231109/bat-dg2-14/boot0.txt
[4] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20231107&id=513dfacefd712bcbfab64e1a9c9c3e0d51c2dca5