Regression on linux-next (next-20241106)

Rafael J. Wysocki rafael at kernel.org
Mon Nov 11 13:28:01 UTC 2024


Hi Chaitanya,

On Mon, Nov 11, 2024 at 6:41 AM Borah, Chaitanya Kumar
<chaitanya.kumar.borah at intel.com> wrote:
>
> Hello Rafael,
>
> Hope you are doing well. I am Chaitanya from the linux graphics team in Intel.
>
> This mail is regarding a regression we are seeing in our CI runs[1] on linux-next repository.
>
> Since the version next-20241106 [2], we are seeing the following regression
>
> `````````````````````````````````````````````````````````````````````````````````
> <4>[    7.246473] WARNING: possible circular locking dependency detected
> <4>[    7.246476] 6.12.0-rc6-next-20241106-next-20241106-g5b913f5d7d7f+ #1 Not tainted
> <4>[    7.246479] ------------------------------------------------------
> <4>[    7.246481] swapper/0/1 is trying to acquire lock:
> <4>[    7.246483] ffffffff8264aef0 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_enable+0xd/0x20
> <4>[    7.246493]
>                   but task is already holding lock:
> <4>[    7.246495] ffffffff82832068 (hybrid_capacity_lock){+.+.}-{4:4}, at: intel_pstate_register_driver+0xd3/0x1c0
> `````````````````````````````````````````````````````````````````````````````````
> The detailed log can be found in [3].

Thanks for the report!

> After bisecting the tree, the following patch [4] seems to be the first "bad"
> commit
>
> `````````````````````````````````````````````````````````````````````````````````````````````````````````
> commit 92447aa5f6e7fbad9427a3fd1bb9e0679c403206
> Author: Rafael J. Wysocki <rafael.j.wysocki at intel.com>
> Date:   Mon Nov 4 19:53:53 2024 +0100
>
>     cpufreq: intel_pstate: Update asym capacity for CPUs that were offline initially
> `````````````````````````````````````````````````````````````````````````````````````````````````````````
>
> We also verified that if we revert the patch the issue is not seen.
>
> Could you please check why the patch causes this regression and provide a fix if necessary?
>
> [1] https://intel-gfx-ci.01.org/tree/linux-next/combined-alt.html?
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20241106
> [3] https://intel-gfx-ci.01.org/tree/linux-next/next-20241106/bat-arls-1/boot0.txt
> [4] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20241106&id=92447aa5f6e7fbad9427a3fd1bb9e0679c403206

The problem is that cpus_read_lock() should not be called under
hybrid_capacity_lock, because the latter is acquired in CPU
online/offline paths. The above commit exposes this, but if I'm not
mistaken, the issue is there regardless of it.
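
To illustrate the inversion, here is a minimal userspace sketch. It is
not the kernel code and not how lockdep is implemented: the enum values
and record_order() are hypothetical names that only model the two
acquisition orders described above.

```c
#include <stdbool.h>

/* Hypothetical IDs standing in for the two kernel locks in the splat. */
enum lock_id { CPU_HOTPLUG_LOCK, HYBRID_CAPACITY_LOCK, NR_LOCKS };

/* dep[a][b] becomes true once lock b was acquired while lock a was held. */
static bool dep[NR_LOCKS][NR_LOCKS];

/*
 * Record that "inner" was acquired while "outer" was held.
 * Returns false if the opposite order was already recorded, which is
 * the two-lock (ABBA) case of a circular locking dependency.
 */
static bool record_order(enum lock_id outer, enum lock_id inner)
{
	dep[outer][inner] = true;
	return !dep[inner][outer];
}
```

Recording the CPU online/offline order first (CPU_HOTPLUG_LOCK, then
HYBRID_CAPACITY_LOCK) succeeds. Recording the driver registration order
afterwards (HYBRID_CAPACITY_LOCK, then CPU_HOTPLUG_LOCK) returns false,
which corresponds to the dependency cycle reported in the log.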

The good news is that it should be addressed by a patch that has
already been posted:

https://lore.kernel.org/linux-pm/12554508.O9o76ZdvQC@rjwysocki.net/

so please let me know if it makes the splat go away.

Even though its changelog says that it has no functional impact, that
is not really the case.

Thanks!

