[Intel-gfx] Regression on linux-next (next-20231130)

Borah, Chaitanya Kumar chaitanya.kumar.borah at intel.com
Mon Dec 4 17:17:25 UTC 2023


Hello Johannes,

Hope you are doing well. I am Chaitanya from the Linux graphics team at Intel.

This email is about a regression we are seeing in our CI runs [1] on the linux-next repository.

Since version next-20231130 [2], we have been seeing the following regression:

 `````````````````````````````````````````````````````````````````````````````````
<4> [198.663557] ======================================================
<4> [198.663559] WARNING: possible circular locking dependency detected
<4> [198.663562] 6.7.0-rc4-next-20231204-next-20231204-g629a3b49f3f9+ #1 Not tainted
<4> [198.663566] ------------------------------------------------------
<4> [198.663568] core_hotunplug/5433 is trying to acquire lock:
<4> [198.663571] ffff8881481b5068 (debugfs:i915_lpsp_capability#7){++++}-{0:0}, at: remove_one+0x56/0x160
<4> [198.663580] 
but task is already holding lock:
<4> [198.663583] ffff88810ef2e9d0 (&sb->s_type->i_mutex_key#2){++++}-{3:3}, at: simple_recursive_removal+0x1a1/0x2e0
<4> [198.663591] 
which lock already depends on the new lock.
<4> [198.663594] 
the existing dependency chain (in reverse order) is:
 `````````````````````````````````````````````````````````````````````````````````
The detailed log can be found in [3].

Locally, we have seen a slightly different variant of the issue:

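`````````````````````````````````````````````````````````````````````````````````
[  663.199573] core_hotunplug/1735 is trying to acquire lock:
[  663.199574] ffff888133406e68 (debugfs:i915_pipe){++++}-{0:0}, at: remove_one+0x56/0x160
`````````````````````````````````````````````````````````````````````````````````
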
After bisecting the tree, we found that the following patch [4] appears to be the first "bad" commit:

`````````````````````````````````````````````````````````````````````````````````````````````````````````
commit f4acfcd4deb158b96595250cc332901b282d15b0
Author: Johannes Berg johannes.berg at intel.com
Date:   Fri Nov 24 17:25:25 2023 +0100

    debugfs: annotate debugfs handlers vs. removal with lockdep

    When you take a lock in a debugfs handler but also try
    to remove the debugfs file under that lock, things can
    deadlock since the removal has to wait for all users
    to finish.

    Add lockdep annotations in debugfs_file_get()/_put()
    to catch such issues.

    Acked-by: Greg Kroah-Hartman gregkh at linuxfoundation.org
    Signed-off-by: Johannes Berg johannes.berg at intel.com

fs/debugfs/file.c     | 10 ++++++++++
fs/debugfs/inode.c    | 12 ++++++++++++
fs/debugfs/internal.h |  6 ++++++
3 files changed, 28 insertions(+)
`````````````````````````````````````````````````````````````````````````````````````````````````````````

We also verified that the issue is no longer seen when the patch is reverted.

Could you please check why the patch causes this regression and provide a fix
if necessary?

Thank you.

Regards

Chaitanya

[1] https://intel-gfx-ci.01.org/tree/linux-next/combined-alt.html?
[2] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20231130
[3] https://intel-gfx-ci.01.org/tree/linux-next/next-20231204/bat-dg2-9/igt@core_hotunplug@unbind-rebind.html
[4] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20231130&id=f4acfcd4deb158b96595250cc332901b282d15b0
