[REGRESSION] on linux-next (next-20250509)

Luke Jones luke at ljones.dev
Tue Jun 10 00:30:18 UTC 2025


On Mon, 9 Jun 2025, at 11:06 PM, Borah, Chaitanya Kumar wrote:
> Hi Luke,
>
>
>> -----Original Message-----
>> From: Kurt Borja <kuurtb at gmail.com>
>> Sent: Wednesday, May 28, 2025 9:11 PM
>> To: Luke Jones <luke at ljones.dev>; Borah, Chaitanya Kumar
>> <chaitanya.kumar.borah at intel.com>
>> Cc: intel-xe at lists.freedesktop.org; intel-gfx at lists.freedesktop.org; Saarinen,
>> Jani <jani.saarinen at intel.com>; Kurmi, Suresh Kumar
>> <suresh.kumar.kurmi at intel.com>; De Marchi, Lucas
>> <lucas.demarchi at intel.com>; Nikula, Jani <jani.nikula at intel.com>;
>> linux-input at vger.kernel.org; platform-driver-x86 at vger.kernel.org
>> Subject: Re: [REGRESSSION] on linux-next (next-20250509)
>> 
>> Hi Luke,
>> 
>> On Wed May 28, 2025 at 10:07 AM -03, Luke Jones wrote:
>> > On Wed, 28 May 2025, at 12:08 PM, Borah, Chaitanya Kumar wrote:
>> >> Hello Luke,
>> >>
>> >> Hope you are doing well. I am Chaitanya from the linux graphics team in
>> >> Intel.
>> >>
>> >> This mail is regarding a regression we are seeing in our CI runs[1]
>> >> on linux-next repository.
>> >
>> > Can you tell me if the fix here was included?
>> > https://lkml.org/lkml/2025/5/24/152
>> >
>> > I could change to:
>> > static void asus_s2idle_check_register(void)
>> > {
>> >     /* Only register for Ally devices */
>> >     if (dmi_check_system(asus_rog_ally_device)) {
>> >         if (acpi_register_lps0_dev(&asus_ally_s2idle_dev_ops))
>> >             pr_warn("failed to register LPS0 sleep handler in asus-wmi\n");
>> >     }
>> > }
>> >
>> > but I don't really understand what is happening here. The inner lps0
>> > functions won't run unless use_ally_mcu_hack is set.
>> 
>> The RIP is caused by a "list_add double add" warning.
>> 
>> After reading the log, I believe this is happening because
>> asus_wmi_register_driver() is called a second time by eeepc_wmi after
>> asus_nb_wmi, which implies
>> 
>> 	asus_wmi_probe()
>> 	  -> acpi_register_lps0_dev(&asus_ally_s2idle_dev_ops)
>> 
>> is called twice and the warning is triggered.
>> 
>> Line [1] makes me think this could be a race condition, as
>> asus_wmi_register_driver() may be called concurrently.
>> 
>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86.git/tree/drivers/platform/x86/asus-wmi.c?h=for-next#n5101
>> 
>
> Any update on this? It has now hit 6.16-rc1.
>
> https://intel-gfx-ci.01.org/tree/drm-tip/igt@runner@aborted.html

I will send a patch ASAP. I haven't been able to do so yet because of work and three days of flights.

> Regards
>
> Chaitanya
>
>> >
>> > I will do my best to fix but I need to understand what happened a bit better.
>> >
>> > regards,
>> > Luke.
>> >
>> >> Since version next-20250509 [2], we are seeing the following
>> >> regression:
>> >>
>> >> `````````````````````````````````````````````````````````````````````````````````
>> >> <4>[    5.400826] ------------[ cut here ]------------
>> >> <4>[    5.400832] list_add double add: new=ffffffffa07c0ca0, prev=ffffffff837e9a60, next=ffffffffa07c0ca0.
>> >> <4>[    5.400845] WARNING: CPU: 0 PID: 379 at lib/list_debug.c:35 __list_add_valid_or_report+0xdc/0xf0
>> >> <4>[    5.400850] Modules linked in: cmdlinepart(+) eeepc_wmi(+) asus_nb_wmi(+) asus_wmi spi_nor(+) sparse_keymap mei_pxp mtd platform_profile kvm_intel(+) mei_hdcp wmi_bmof kvm irqbypass polyval_clmulni usbhid ghash_clmulni_intel snd_hda_intel hid sha1_ssse3 r8152(+) binfmt_misc aesni_intel snd_intel_dspcfg mii r8169 snd_hda_codec rapl video snd_hda_core intel_cstate snd_hwdep realtek snd_pcm snd_timer mei_me snd i2c_i801 i2c_mux spi_intel_pci idma64 soundcore spi_intel i2c_smbus mei intel_pmc_core nls_iso8859_1 pmt_telemetry pmt_class intel_pmc_ssram_telemetry pinctrl_alderlake intel_vsec acpi_tad wmi acpi_pad dm_multipath msr nvme_fabrics fuse efi_pstore nfnetlink ip_tables x_tables autofs4
>> >> <4>[    5.400904] CPU: 0 UID: 0 PID: 379 Comm: (udev-worker) Tainted: G S 6.15.0-rc7-next-20250526-next-20250526-g3be1a7a31fbd+ #1 PREEMPT(voluntary)
>> >> <4>[    5.400907] Tainted: [S]=CPU_OUT_OF_SPEC
>> >> <4>[    5.400908] Hardware name: ASUS System Product Name/PRIME Z790-P WIFI, BIOS 0812 02/24/2023
>> >> <4>[    5.400909] RIP: 0010:__list_add_valid_or_report+0xdc/0xf0
>> >> <4>[    5.400912] Code: 16 48 89 f1 4c 89 e6 e8 a2 c5 5f ff 0f 0b 31 c0 e9 72 ff ff ff 48 89 f2 4c 89 e1 48 89 fe 48 c7 c7 68 ba 0f 83 e8 84 c5 5f ff <0f> 0b 31 c0 e9 54 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 90 90
>> >> <4>[    5.400914] RSP: 0018:ffffc90002763588 EFLAGS: 00010246
>> >> <4>[    5.400916] RAX: 0000000000000000 RBX: ffffffffa07c0ca0 RCX: 0000000000000000
>> >> <4>[    5.400918] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
>> >> <4>[    5.400919] RBP: ffffc90002763598 R08: 0000000000000000 R09: 0000000000000000
>> >> <4>[    5.400920] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa07c0ca0
>> >> <4>[    5.400921] R13: ffffffffa07c0ca0 R14: 0000000000000000 R15: ffff8881212d6da0
>> >> <4>[    5.400923] FS:  0000778637b418c0(0000) GS:ffff8888dad0c000(0000) knlGS:0000000000000000
>> >> <4>[    5.400926] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> >> <4>[    5.400928] CR2: 00007786373b80b2 CR3: 0000000116faa000 CR4: 0000000000f50ef0
>> >> <4>[    5.400931] PKRU: 55555554
>> >> <4>[    5.400933] Call Trace:
>> >> <4>[    5.400935]  <TASK>
>> >> <4>[    5.400937]  ? lock_system_sleep+0x2b/0x40
>> >> <4>[    5.400942]  acpi_register_lps0_dev+0x58/0xb0
>> >> <4>[    5.400949]  asus_wmi_probe+0x7f/0x1930 [asus_wmi]
>> >> <4>[    5.400956]  ? kernfs_create_link+0x69/0xe0
>> >> `````````````````````````````````````````````````````````````````````````````````
>> >> Detailed log can be found in [3].
>> >>
>> >> After bisecting the tree, the following patch [4] seems to be the first "bad"
>> >> commit
>> >>
>> >> `````````````````````````````````````````````````````````````````````````````````
>> >> commit feea7bd6b02d43a794e3f065650d89cf8d8e8e59
>> >> Author: Luke D. Jones <luke at ljones.dev>
>> >> Date:   Sun Mar 23 15:34:21 2025 +1300
>> >>
>> >>     platform/x86: asus-wmi: Refactor Ally suspend/resume
>> >> `````````````````````````````````````````````````````````````````````````````````
>> >>
>> >> We could not revert the patch because of a merge conflict, but resetting
>> >> to the parent of the commit seems to fix the issue.
>> >>
>> >> Could you please check why the patch causes this regression and
>> >> provide a fix if necessary?
>> >>
>> >> Thank you.
>> >>
>> >> Regards
>> >>
>> >> Chaitanya
>> >>
>> >> [1] https://intel-gfx-ci.01.org/tree/linux-next/combined-alt.html?
>> >> [2] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20250509
>> >> [3] https://intel-gfx-ci.01.org/tree/linux-next/next-20250526/bat-rpls-4/boot0.txt
>> >> [4] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20250509&id=feea7bd6b02d43a794e3f065650d89cf8d8e8e59
>> 
>> 
>> --
>>  ~ Kurt
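
For reference, the double registration Kurt describes in the quoted analysis can be reproduced in a small userspace sketch. This is not kernel code and not the patch for this regression: struct dev_ops, list_add_checked(), register_lps0_dev(), probe_unguarded(), probe_guarded() and reset_state() are all hypothetical stand-ins. The check only imitates the condition that produced the "list_add double add" warning in the log (new == next), and the guard flag is just one possible way to make a second probe harmless.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical stand-in for the LPS0 ops structure: only the list link. */
struct dev_ops {
	struct list_head list_node;
};

/* Global list of registered handlers, initially empty (points at itself). */
static struct list_head lps0_head = { &lps0_head, &lps0_head };

/* One shared static object, in the role of asus_ally_s2idle_dev_ops. */
static struct dev_ops ally_ops;
static bool ally_ops_registered;

/* Insert 'entry' right after 'head', refusing an entry that is already
 * adjacent to the insertion point (the case reported in the log). */
static bool list_add_checked(struct list_head *entry, struct list_head *head)
{
	struct list_head *prev = head, *next = head->next;

	if (entry == prev || entry == next) {
		fprintf(stderr, "list_add double add: new=%p, prev=%p, next=%p.\n",
			(void *)entry, (void *)prev, (void *)next);
		return false;
	}
	next->prev = entry;
	entry->next = next;
	entry->prev = prev;
	prev->next = entry;
	return true;
}

/* Returns 0 on success, -1 if the entry was rejected. */
static int register_lps0_dev(struct dev_ops *ops)
{
	return list_add_checked(&ops->list_node, &lps0_head) ? 0 : -1;
}

/* Probe as described in the thread: registers unconditionally, so a second
 * caller (e.g. eeepc_wmi after asus_nb_wmi) adds the same node again. */
static int probe_unguarded(void)
{
	return register_lps0_dev(&ally_ops);
}

/* Assumed mitigation: remember that the shared object is already on the
 * list. A plain flag does not cover concurrent probes; that would also
 * need a lock around the check and the registration. */
static int probe_guarded(void)
{
	int ret;

	if (ally_ops_registered)
		return 0;
	ret = register_lps0_dev(&ally_ops);
	if (ret == 0)
		ally_ops_registered = true;
	return ret;
}

/* Test helper: empty the list and clear the guard flag. */
static void reset_state(void)
{
	lps0_head.next = lps0_head.prev = &lps0_head;
	ally_ops.list_node.next = ally_ops.list_node.prev = NULL;
	ally_ops_registered = false;
}
```

Calling probe_unguarded() twice makes the second call fail with the same kind of message as in the CI log, while calling probe_guarded() twice leaves the node on the list exactly once.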

