[Intel-gfx] Regression on linux-next (next-20231107)
Krister Johansen
kjlx at templeofstupid.com
Thu Nov 9 20:40:22 UTC 2023
Hi Chaitanya,
On Thu, Nov 09, 2023 at 05:00:09PM +0000, Borah, Chaitanya Kumar wrote:
> Hello Krister,
>
> Hope you are doing well. I am Chaitanya from the Linux graphics team at Intel.
>
> This mail is regarding a regression we are seeing in our CI runs[1] on some machines (dg2 and adl-p) with the linux-next repository.
>
> Since version next-20231107 [2], we are seeing the following error:
> ```````````````````````````````````````````````````````````````````````````````
> <4>[ 32.015910] stack segment: 0000 [#1] PREEMPT SMP NOPTI
> <4>[ 32.021048] CPU: 15 PID: 766 Comm: fusermount Not tainted 6.6.0-next-20231107-next-20231107-g5cd631a52568+ #1
> <4>[ 32.031135] Hardware name: Intel Corporation Raptor Lake Client Platform/RPL-S ADP-S DDR5 UDIMM CRB, BIOS RPLSFWI1.R00.4221.A00.2305271351 05/27/2023
> <4>[ 32.044657] RIP: 0010:fuse_evict_inode+0x61/0x150 [fuse]
> `````````````````````````````````````````````````````````````````````````````````
>
> The detailed log can be found in [3].
>
> After bisecting the tree, the following patch [4] appears to be the first "bad" commit:
>
> `````````````````````````````````````````````````````````````````````````````````````````````````````````
> 513dfacefd712bcbfab64e1a9c9c3e0d51c2dca5 is the first bad commit
> commit 513dfacefd712bcbfab64e1a9c9c3e0d51c2dca5
> Author: Krister Johansen kjlx at templeofstupid.com
> Date: Fri Nov 3 10:39:47 2023 -0700
>
> fuse: share lookup state between submount and its parent
>
> Fuse submounts do not perform a lookup for the nodeid that they inherit
> from their parent. Instead, the code decrements the nlookup on the
> submount's fuse_inode when it is instantiated, and no forget is
> performed when a submount root is evicted.
>
> Trouble arises when the submount's parent is evicted despite the
> submount itself being in use. In this author's case, the submount was
> in a container and detached from the initial mount namespace via a
> MNT_DETACH operation. When memory pressure triggered the shrinker, the
> inode from the parent was evicted, which triggered enough forgets to
> render the submount's nodeid invalid.
>
> Since submounts should still function, even if their parent goes away,
> solve this problem by sharing refcounted state between the parent and
> its submount. When all of the references on this shared state reach
> zero, it's safe to forget the final lookup of the fuse nodeid.
>
> `````````````````````````````````````````````````````````````````````````````````````````````````````````
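The commit message describes the fix only in prose. As a rough illustration of the idea (not the actual patch), the sketch below models one refcounted object per nodeid that a parent and its submount share, with the final forget sent only when the last reference is dropped. All names here (`struct shared_lookup`, `shared_lookup_new`, `shared_lookup_get`, `shared_lookup_put`, `send_forget`) are hypothetical; a kernel implementation would more likely use `refcount_t` and real FUSE request plumbing, neither of which is shown.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical shared state: one object per FUSE nodeid, shared by a
 * parent inode and every submount root that inherits that nodeid. */
struct shared_lookup {
    unsigned long nodeid;
    int refcount;
};

/* Counts how many final forgets have been sent (for illustration). */
static int forgets_sent = 0;

/* Hypothetical stand-in for sending the final FORGET for a nodeid. */
static void send_forget(unsigned long nodeid)
{
    printf("forget nodeid %lu\n", nodeid);
    forgets_sent++;
}

/* Created when the parent performs the lookup; starts with one reference. */
static struct shared_lookup *shared_lookup_new(unsigned long nodeid)
{
    struct shared_lookup *s = malloc(sizeof(*s));
    if (s == NULL)
        abort();
    s->nodeid = nodeid;
    s->refcount = 1;
    return s;
}

/* A submount takes its own reference instead of performing a new lookup. */
static struct shared_lookup *shared_lookup_get(struct shared_lookup *s)
{
    s->refcount++;
    return s;
}

/* Called on eviction of either the parent or a submount root. Only the
 * last reference triggers the forget, so evicting the parent alone
 * cannot invalidate a nodeid that a submount still uses. */
static void shared_lookup_put(struct shared_lookup *s)
{
    assert(s->refcount > 0);
    if (--s->refcount == 0) {
        send_forget(s->nodeid);
        free(s);
    }
}
```

In use, the parent calls `shared_lookup_new()`, the submount calls `shared_lookup_get()` on the parent's object, and each side calls `shared_lookup_put()` when its inode is evicted. Dropping the parent's reference first sends no forget; dropping the submount's reference afterwards sends exactly one.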
>
> We also verified that the issue is not seen when the patch is reverted.
>
> Could you please check why the patch causes this regression and provide a fix if necessary?
Apologies for the inconvenience. I've reproduced the problem, tested a
fix, and am in the process of preparing patches to send to Miklos. I'll
cc the people on this e-mail in that thread.
> [3] http://gfx-ci.igk.intel.com/tree/linux-next/next-20231109/bat-dg2-14/boot0.txt
This link didn't resolve in DNS when I tried to access it. I needed to
use intel-gfx-ci.01.org as the hostname instead.
Thanks,
-K