[Intel-gfx] Regression on linux-next (next-20231107)
Borah, Chaitanya Kumar
chaitanya.kumar.borah at intel.com
Mon Nov 13 06:21:57 UTC 2023
Hello Krister,
Any luck with this?
> -----Original Message-----
> From: Borah, Chaitanya Kumar
> Sent: Friday, November 10, 2023 9:09 AM
> To: Krister Johansen <kjlx at templeofstupid.com>
> Cc: intel-gfx at lists.freedesktop.org; Kurmi, Suresh Kumar
> <Suresh.Kumar.Kurmi at intel.com>; Saarinen, Jani <jani.saarinen at intel.com>;
> Miklos Szeredi <mszeredi at redhat.com>
> Subject: RE: Regression on linux-next (next-20231107)
>
> Hello Krister,
>
> > -----Original Message-----
> > From: Krister Johansen <kjlx at templeofstupid.com>
> > Sent: Friday, November 10, 2023 2:10 AM
> > To: Borah, Chaitanya Kumar <chaitanya.kumar.borah at intel.com>
> > Cc: kjlx at templeofstupid.com; intel-gfx at lists.freedesktop.org; Kurmi,
> > Suresh Kumar <suresh.kumar.kurmi at intel.com>; Saarinen, Jani
> > <jani.saarinen at intel.com>; Miklos Szeredi <mszeredi at redhat.com>
> > Subject: Re: Regression on linux-next (next-20231107)
> >
> > Hi Chaitanya,
> >
> > On Thu, Nov 09, 2023 at 05:00:09PM +0000, Borah, Chaitanya Kumar wrote:
> > > Hello Krister,
> > >
> > > > Hope you are doing well. I am Chaitanya from the linux graphics team
> > > > in Intel.
> > >
> > > > This mail is regarding a regression we are seeing in our CI runs[1]
> > > > for some machines (dg2 and adl-p) on linux-next repository.
> > >
> > > > Since the version next-20231107 [2], we are seeing the following error
> > > > ````````````````````````````````````````````````````````````````````
> > > > <4>[ 32.015910] stack segment: 0000 [#1] PREEMPT SMP NOPTI
> > > > <4>[ 32.021048] CPU: 15 PID: 766 Comm: fusermount Not tainted 6.6.0-next-20231107-next-20231107-g5cd631a52568+ #1
> > > > <4>[ 32.031135] Hardware name: Intel Corporation Raptor Lake Client Platform/RPL-S ADP-S DDR5 UDIMM CRB, BIOS RPLSFWI1.R00.4221.A00.2305271351 05/27/2023
> > > > <4>[ 32.044657] RIP: 0010:fuse_evict_inode+0x61/0x150 [fuse]
> > > > ````````````````````````````````````````````````````````````````````
> > >
> > > > The detailed log can be found in [3].
> > >
> > > After bisecting the tree, the following patch [4] seems to be the
> > > first "bad" commit
> > >
> > >
> > > ````````````````````````````````````````````````````````````````````
> > > ``
> > > ```````````````````````````````````
> > > 513dfacefd712bcbfab64e1a9c9c3e0d51c2dca5 is the first bad commit
> > > commit 513dfacefd712bcbfab64e1a9c9c3e0d51c2dca5
> > > Author: Krister Johansen kjlx at templeofstupid.com
> > > Date: Fri Nov 3 10:39:47 2023 -0700
> > >
> > > fuse: share lookup state between submount and its parent
> > >
> > > > Fuse submounts do not perform a lookup for the nodeid that they inherit
> > > from their parent. Instead, the code decrements the nlookup on the
> > > submount's fuse_inode when it is instantiated, and no forget is
> > > performed when a submount root is evicted.
> > >
> > > Trouble arises when the submount's parent is evicted despite the
> > > submount itself being in use. In this author's case, the submount was
> > > > in a container and detached from the initial mount namespace via a
> > > > MNT_DETACH operation. When memory pressure triggered the shrinker, the
> > > inode from the parent was evicted, which triggered enough forgets to
> > > render the submount's nodeid invalid.
> > >
> > > Since submounts should still function, even if their parent goes away,
> > > solve this problem by sharing refcounted state between the parent and
> > > its submount. When all of the references on this shared state reach
> > > zero, it's safe to forget the final lookup of the fuse nodeid.
> > >
> > >
> > > ````````````````````````````````````````````````````````````````````
> > > ``
> > > ```````````````````````````````````
> > >
> > > We also verified that if we revert the patch the issue is not seen.
> > >
> > > > Could you please check why the patch causes this regression and
> > > > provide a fix if necessary?
> >
> > Apologies for the inconvenience. I've reproduced the problem, tested
> > a fix, and am in the process of preparing patches to send to Miklos.
> > I'll cc the people on this e-mail in that thread.
> >
> > > [3]
> > > http://gfx-ci.igk.intel.com/tree/linux-next/next-20231109/bat-dg2-14
> > > /b
> > > oot0.txt
> >
> > This link didn't resolve in DNS when I tried to access it. I needed
> > to use intel-gfx-ci.01.org as the hostname instead.
> >
>
> My bad. I realized it too late. Hope you found the logs. If not here they are.
>
> https://intel-gfx-ci.01.org/tree/linux-next/next-20231109/bat-dg2-14/boot0.txt
>
> Regards
>
> Chaitanya
> > Thanks,
> >
> > -K
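
The refcounting idea described in the quoted commit message (parent and submount share state, and the final forget is sent only when the last reference is dropped) can be illustrated with a small model. This is only a sketch under stated assumptions: it is not the kernel's code, and the names SharedLookupState, get, put, and the forget callback are hypothetical, chosen for illustration.

```python
class SharedLookupState:
    """Hypothetical model of refcounted lookup state shared by a parent
    inode and the root inode of its submount (not the real kernel type)."""

    def __init__(self, nodeid, forget):
        self.nodeid = nodeid
        self.refs = 1          # the creator (here: the parent inode) holds one reference
        self._forget = forget  # callback standing in for sending the final forget

    def get(self):
        # Another holder (here: the submount root) starts sharing the state.
        self.refs += 1
        return self

    def put(self):
        # Drop one reference; only the last drop forgets the nodeid.
        self.refs -= 1
        if self.refs == 0:
            self._forget(self.nodeid)


forgotten = []
state = SharedLookupState(nodeid=42, forget=forgotten.append)

submount_ref = state.get()             # submount root shares the parent's state
state.put()                            # parent inode is evicted first
after_parent_evict = list(forgotten)   # nothing forgotten yet, submount still holds a reference
submount_ref.put()                     # submount root is evicted last, forget is sent now
```

In this model, evicting the parent while the submount is still in use leaves the nodeid valid, which is the behaviour the commit message says the patch is aiming for.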