[REGRESSION] on linux-next (next-20250509)
Kurt Borja
kuurtb at gmail.com
Wed May 28 15:40:52 UTC 2025
Hi Luke,
On Wed May 28, 2025 at 10:07 AM -03, Luke Jones wrote:
> On Wed, 28 May 2025, at 12:08 PM, Borah, Chaitanya Kumar wrote:
>> Hello Luke,
>>
>> Hope you are doing well. I am Chaitanya from the linux graphics team in Intel.
>>
>> This mail is regarding a regression we are seeing in our CI runs[1] on
>> linux-next repository.
>
> Can you tell me if the fix here was included?
> https://lkml.org/lkml/2025/5/24/152
>
> I could change to:
> static void asus_s2idle_check_register(void)
> {
> 	// Only register for Ally devices
> 	if (dmi_check_system(asus_rog_ally_device)) {
> 		if (acpi_register_lps0_dev(&asus_ally_s2idle_dev_ops))
> 			pr_warn("failed to register LPS0 sleep handler in asus-wmi\n");
> 	}
> }
>
> but I don't really understand what is happening here. The inner lps0 functions won't run unless use_ally_mcu_hack is set.
The RIP comes from a "list_add double add" warning.

After reading the log, I believe this happens because
asus_wmi_register_driver() is called a second time by eeepc_wmi after
asus_nb_wmi, which implies that

	asus_wmi_probe()
	    -> acpi_register_lps0_dev(&asus_ally_s2idle_dev_ops)

is called twice, which triggers the warning.

Line [1] makes me think this could be a race condition, as
asus_wmi_register_driver() may be called concurrently.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86.git/tree/drivers/platform/x86/asus-wmi.c?h=for-next#n5101
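
To illustrate the failure mode, here is a minimal userspace sketch in C. It
is not the kernel code. It assumes the handler is linked with a list_add()
at the head of a global list, which would match new == next in the log.
struct list_head is re-declared locally, and list_add_checked(),
struct dev_ops, handlers and register_dev() are hypothetical names used only
for this example.

```c
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/*
 * Simplified form of the list debug check: inserting a node that is
 * already adjacent to the insertion point is reported and refused.
 */
static bool list_add_valid(struct list_head *new,
			   struct list_head *prev,
			   struct list_head *next)
{
	if (new == prev || new == next) {
		fprintf(stderr, "list_add double add: new=%p, prev=%p, next=%p.\n",
			(void *)new, (void *)prev, (void *)next);
		return false;
	}
	return true;
}

/* Insert 'new' right after 'head', unless the check above fails. */
static bool list_add_checked(struct list_head *new, struct list_head *head)
{
	struct list_head *prev = head;
	struct list_head *next = head->next;

	if (!list_add_valid(new, prev, next))
		return false;

	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
	return true;
}

/* Hypothetical handler object with an embedded list node. */
struct dev_ops {
	struct list_head node;
};

/* Hypothetical global list of registered handlers, initially empty. */
static struct list_head handlers = { &handlers, &handlers };

/* Returns 0 on success, -1 if the same object is registered twice in a row. */
static int register_dev(struct dev_ops *ops)
{
	return list_add_checked(&ops->node, &handlers) ? 0 : -1;
}
```

Calling register_dev() twice with the same object succeeds the first time
and is refused the second time, with prev pointing at the list head and
next equal to new, as in the reported warning. The list still holds a
single entry afterwards.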
>
> I will do my best to fix but I need to understand what happened a bit better.
>
> regards,
> Luke.
>
>> Since the version next-20250509 [2], we are seeing the following regression
>>
>> `````````````````````````````````````````````````````````````````````````````````
>> <4>[ 5.400826] ------------[ cut here ]------------
>> <4>[ 5.400832] list_add double add: new=ffffffffa07c0ca0,
>> prev=ffffffff837e9a60, next=ffffffffa07c0ca0.
>> <4>[ 5.400845] WARNING: CPU: 0 PID: 379 at lib/list_debug.c:35
>> __list_add_valid_or_report+0xdc/0xf0
>> <4>[ 5.400850] Modules linked in: cmdlinepart(+) eeepc_wmi(+)
>> asus_nb_wmi(+) asus_wmi spi_nor(+) sparse_keymap mei_pxp mtd
>> platform_profile kvm_intel(+) mei_hdcp wmi_bmof kvm irqbypass
>> polyval_clmulni usbhid ghash_clmulni_intel snd_hda_intel hid sha1_ssse3
>> r8152(+) binfmt_misc aesni_intel snd_intel_dspcfg mii r8169
>> snd_hda_codec rapl video snd_hda_core intel_cstate snd_hwdep realtek
>> snd_pcm snd_timer mei_me snd i2c_i801 i2c_mux spi_intel_pci idma64
>> soundcore spi_intel i2c_smbus mei intel_pmc_core nls_iso8859_1
>> pmt_telemetry pmt_class intel_pmc_ssram_telemetry pinctrl_alderlake
>> intel_vsec acpi_tad wmi acpi_pad dm_multipath msr nvme_fabrics fuse
>> efi_pstore nfnetlink ip_tables x_tables autofs4
>> <4>[ 5.400904] CPU: 0 UID: 0 PID: 379 Comm: (udev-worker) Tainted: G
>> S
>> 6.15.0-rc7-next-20250526-next-20250526-g3be1a7a31fbd+ #1
>> PREEMPT(voluntary)
>> <4>[ 5.400907] Tainted: [S]=CPU_OUT_OF_SPEC
>> <4>[ 5.400908] Hardware name: ASUS System Product Name/PRIME Z790-P
>> WIFI, BIOS 0812 02/24/2023
>> <4>[ 5.400909] RIP: 0010:__list_add_valid_or_report+0xdc/0xf0
>> <4>[ 5.400912] Code: 16 48 89 f1 4c 89 e6 e8 a2 c5 5f ff 0f 0b 31 c0
>> e9 72 ff ff ff 48 89 f2 4c 89 e1 48 89 fe 48 c7 c7 68 ba 0f 83 e8 84 c5
>> 5f ff <0f> 0b 31 c0 e9 54 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 90
>> 90
>> <4>[ 5.400914] RSP: 0018:ffffc90002763588 EFLAGS: 00010246
>> <4>[ 5.400916] RAX: 0000000000000000 RBX: ffffffffa07c0ca0 RCX:
>> 0000000000000000
>> <4>[ 5.400918] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
>> 0000000000000000
>> <4>[ 5.400919] RBP: ffffc90002763598 R08: 0000000000000000 R09:
>> 0000000000000000
>> <4>[ 5.400920] R10: 0000000000000000 R11: 0000000000000000 R12:
>> ffffffffa07c0ca0
>> <4>[ 5.400921] R13: ffffffffa07c0ca0 R14: 0000000000000000 R15:
>> ffff8881212d6da0
>> <4>[ 5.400923] FS: 0000778637b418c0(0000) GS:ffff8888dad0c000(0000)
>> knlGS:0000000000000000
>> <4>[ 5.400926] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> <4>[ 5.400928] CR2: 00007786373b80b2 CR3: 0000000116faa000 CR4:
>> 0000000000f50ef0
>> <4>[ 5.400931] PKRU: 55555554
>> <4>[ 5.400933] Call Trace:
>> <4>[ 5.400935] <TASK>
>> <4>[ 5.400937] ? lock_system_sleep+0x2b/0x40
>> <4>[ 5.400942] acpi_register_lps0_dev+0x58/0xb0
>> <4>[ 5.400949] asus_wmi_probe+0x7f/0x1930 [asus_wmi]
>> <4>[ 5.400956] ? kernfs_create_link+0x69/0xe0
>> `````````````````````````````````````````````````````````````````````````````````
>> Detailed log can be found in [3].
>>
>> After bisecting the tree, the following patch [4] seems to be the first "bad"
>> commit
>>
>> `````````````````````````````````````````````````````````````````````````````````````````````````````````
>> commit feea7bd6b02d43a794e3f065650d89cf8d8e8e59
>> Author: Luke D. Jones <luke at ljones.dev>
>> Date: Sun Mar 23 15:34:21 2025 +1300
>>
>> platform/x86: asus-wmi: Refactor Ally suspend/resume
>> `````````````````````````````````````````````````````````````````````````````````````````````````````````
>>
>> We could not revert the patch because of merge conflict but resetting
>> to the parent of the commit seems to fix the issue
>>
>> Could you please check why the patch causes this regression and provide
>> a fix if necessary?
>>
>> Thank you.
>>
>> Regards
>>
>> Chaitanya
>>
>> [1] https://intel-gfx-ci.01.org/tree/linux-next/combined-alt.html?
>> [2]
>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20250509
>> [3]
>> https://intel-gfx-ci.01.org/tree/linux-next/next-20250526/bat-rpls-4/boot0.txt
>> [4]
>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20250509&id=feea7bd6b02d43a794e3f065650d89cf8d8e8e59
--
~ Kurt