Regression on linux-next (next-20250120)
Borah, Chaitanya Kumar
chaitanya.kumar.borah at intel.com
Tue Jan 28 16:00:58 UTC 2025
Hello Al,
> -----Original Message-----
> From: Al Viro <viro at ftp.linux.org.uk> On Behalf Of Al Viro
> Sent: Monday, January 27, 2025 10:34 AM
> To: Greg Kroah-Hartman <gregkh at linuxfoundation.org>
> Cc: Borah, Chaitanya Kumar <chaitanya.kumar.borah at intel.com>; intel-
> gfx at lists.freedesktop.org; intel-xe at lists.freedesktop.org; Kurmi, Suresh
> Kumar <suresh.kumar.kurmi at intel.com>; Saarinen, Jani
> <jani.saarinen at intel.com>; linux-fsdevel at vger.kernel.org; Alexander Gordeev
> <agordeev at linux.ibm.com>
> Subject: Re: Regression on linux-next (next-20250120)
>
> On Sun, Jan 26, 2025 at 04:54:55PM +0100, Alexander Gordeev wrote:
>
> > > > Since the version next-20250120 [2], we are seeing the following
> > > > regression
> > >
> > > Ugh... To narrow the things down, could you see if replacing
> > > fsd = kmalloc(sizeof(*fsd), GFP_KERNEL); with
> > > fsd = kzalloc(sizeof(*fsd), GFP_KERNEL); in
> > > fs/debugfs/file.c:__debugfs_file_get() affects the test?
> >
> > This change fixes lots of the below failures in our CI. FWIW:
> >
> > Tested-by: Alexander Gordeev <agordeev at linux.ibm.com>
>
> The real fix follows:
>
> [PATCH] Fix the missing initializations in __debugfs_file_get()
>
> both method table pointers in debugfs_fsdata need to be initialized,
> obviously...
>
> Fixes: 41a0ecc0997c "debugfs: get rid of dynamically allocation proxy_ops"
> Fucked-up-by: Al Viro <viro at zeniv.linux.org.uk>
> Signed-off-by: Al Viro <viro at zeniv.linux.org.uk>
> ---
> diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c index
> e33cc77699cd..212cd8128e1f 100644
> --- a/fs/debugfs/file.c
> +++ b/fs/debugfs/file.c
> @@ -111,6 +111,7 @@ static int __debugfs_file_get(struct dentry *dentry,
> enum dbgfs_get_mode mode)
> fsd->methods |= HAS_READ;
> if (ops->write)
> fsd->methods |= HAS_WRITE;
> + fsd->real_fops = NULL;
> } else {
> const struct file_operations *ops;
> ops = fsd->real_fops = DEBUGFS_I(inode)->real_fops;
> @@ -124,6 +125,7 @@ static int __debugfs_file_get(struct dentry *dentry,
> enum dbgfs_get_mode mode)
> fsd->methods |= HAS_IOCTL;
> if (ops->poll)
> fsd->methods |= HAS_POLL;
> + fsd->short_fops = NULL;
> }
> refcount_set(&fsd->active_users, 1);
> init_completion(&fsd->active_users_drained);
Unfortunately this change does not help us. I think it is the methods member that causes the problem. So the following change solves the problem for us.
--- a/fs/debugfs/file.c
+++ b/fs/debugfs/file.c
@@ -102,6 +102,8 @@ static int __debugfs_file_get(struct dentry *dentry, enum dbgfs_get_mode mode)
if (!fsd)
return -ENOMEM;
+ fsd->methods = 0;
+
if (mode == DBGFS_GET_SHORT) {
const struct debugfs_short_fops *ops;
ops = fsd->short_fops = DEBUGFS_I(inode)->short_fops;
My guess is, since methods have some junk value in it, we are trying to call a read function for a debugfs entry for which it doesn't exist. That leads to the failure.
<4>[ 34.240163] Call Trace:
<4>[ 34.240164] <TASK>
<4>[ 34.240165] ? show_regs+0x6c/0x80
<4>[ 34.240169] ? __die+0x24/0x80
<4>[ 34.240171] ? page_fault_oops+0x175/0x5d0
<4>[ 34.240176] ? do_user_addr_fault+0x4c6/0x9d0
<4>[ 34.240179] ? exc_page_fault+0x8a/0x300
<4>[ 34.240182] ? asm_exc_page_fault+0x27/0x30
<4>[ 34.240187] *full_proxy_read*+0x6b/0xb0
<4>[ 34.240191] vfs_read+0xf9/0x390
My take would be to stick with the kzalloc like you suggested.
Regards
Chaitanya
More information about the Intel-xe
mailing list