Regression on linux-next (next-20241120) and drm-tip
Luca Coelho
luca at coelho.fi
Tue Dec 3 09:18:55 UTC 2024
On Tue, 2024-12-03 at 09:25 +0100, Thomas Weißschuh wrote:
> On 2024-12-03 09:50:05+0200, Luca Coelho wrote:
> > On Tue, 2024-12-03 at 07:50 +0100, Thomas Weißschuh wrote:
> > > (+Cc Sebastian)
> > >
> > > Hi Chaitanya,
> > >
> > > On 2024-12-03 05:07:47+0000, Borah, Chaitanya Kumar wrote:
> > > > Hope you are doing well. I am Chaitanya from the Linux graphics team at Intel.
> > > >
> > > > This mail is regarding a regression we are seeing in our CI runs[1] on the linux-next repository.
> > >
> > > Thanks for the report.
> > >
> > > > Since the version next-20241120 [2], we are seeing the following regression
> > > >
> > > > `````````````````````````````````````````````````````````````````````````````````
> > > > <4>[ 19.990743] Oops: general protection fault, probably for non-canonical address 0xb11675ef8d1ccbce: 0000 [#1] PREEMPT SMP NOPTI
> > > > <4>[ 19.990760] CPU: 21 UID: 110 PID: 867 Comm: prometheus-node Not tainted 6.12.0-next-20241120-next-20241120-gac24e26aa08f+ #1
> > > > <4>[ 19.990771] Hardware name: Intel Corporation Arrow Lake Client Platform/MTL-S UDIMM 2DPC EVCRB, BIOS MTLSFWI1.R00.4400.D85.2410100007 10/10/2024
> > > > <4>[ 19.990782] RIP: 0010:power_supply_get_property+0x3e/0xe0
> > > > `````````````````````````````````````````````````````````````````````````````````
> > > > The detailed log can be found in [3].
> > > >
> > > > After bisecting the tree, the following patch [4] seems to be the first "bad"
> > > > commit
> > > >
> > > > `````````````````````````````````````````````````````````````````````````````````````````````````````````
> > > > Commit 49000fee9e639f62ba1f965ed2ae4c5ad18d19e2
> > > > Author: Thomas Weißschuh <mailto:linux at weissschuh.net>
> > > > AuthorDate: Sat Oct 5 12:05:03 2024 +0200
> > > > Commit: Sebastian Reichel <mailto:sebastian.reichel at collabora.com>
> > > > CommitDate: Tue Oct 15 22:22:20 2024 +0200
> > > > power: supply: core: add wakeup source inhibit by power_supply_config
> > > > `````````````````````````````````````````````````````````````````````````````````````````````````````````
> > > >
> > > > This is now seen in our drm-tip runs as well. [5]
> > > >
> > > > Could you please check why the patch causes this regression and provide a fix if necessary?
> > >
> > > I don't see how this patch can lead to this error.
> > > Could you double-check the bisect?
> >
> > FWIW I also bisected this and came to the same conclusion, this is the
> > first bad commit. My guess is that some component is not yet setting
> > things up properly for the new feature.
>
> The thing is that at this point nothing is using this feature.
> And the new code runs during registration while the error happens later.
>
> > This is very easily reproducible in our system, with vanilla 6.13-rc1,
> > so if there's anything you want to try, let us know.
>
> Can you try the following diffs, each alone on top of
> 49000fee9e639f62ba1f965ed2ae4c5ad18d19e2?
>
> diff --git a/drivers/power/supply/power_supply_core.c b/drivers/power/supply/power_supply_core.c
> index a2005e3c6f38..c6e7ca5b1283 100644
> --- a/drivers/power/supply/power_supply_core.c
> +++ b/drivers/power/supply/power_supply_core.c
> @@ -1411,7 +1411,7 @@ __power_supply_register(struct device *parent,
>  		goto device_add_failed;
>
>  	if (cfg && cfg->no_wakeup_source)
> -		ws = false;
> +		;
>
>  	rc = device_init_wakeup(dev, ws);
>  	if (rc)
>
> diff --git a/drivers/power/supply/power_supply_core.c b/drivers/power/supply/power_supply_core.c
> index a2005e3c6f38..5aefba2ddcda 100644
> --- a/drivers/power/supply/power_supply_core.c
> +++ b/drivers/power/supply/power_supply_core.c
> @@ -1410,9 +1410,6 @@ __power_supply_register(struct device *parent,
>  	if (rc)
>  		goto device_add_failed;
>
> -	if (cfg && cfg->no_wakeup_source)
> -		ws = false;
> -
>  	rc = device_init_wakeup(dev, ws);
>  	if (rc)
>  		goto wakeup_init_failed;
>

I'll try this out now.

> Could you also print the name of the device?

This is a new Panther Lake machine, but we have reports of this
happening on other platforms as well. Which device exactly do you want
the info on?

>
> diff --git a/drivers/power/supply/power_supply_core.c b/drivers/power/supply/power_supply_core.c
> index a2005e3c6f38..63e9e339cc01 100644
> --- a/drivers/power/supply/power_supply_core.c
> +++ b/drivers/power/supply/power_supply_core.c
> @@ -1356,6 +1356,8 @@ __power_supply_register(struct device *parent,
>  		pr_warn("%s: Expected proper parent device for '%s'\n",
>  			__func__, desc->name);
>
> +	pr_warn("PSY: name=%s\n", desc->name);
> +
>  	psy = kzalloc(sizeof(*psy), GFP_KERNEL);
>  	if (!psy)
>  		return ERR_PTR(-ENOMEM);
>
>
> Also line numbers would be useful.
> Is this configuration running KASAN?

There's no KASAN, but I can enable it if needed.

Here's the full crash report I got yesterday. It's from our so-called
drm-tip, which is basically v6.13-rc1 with DRM stuff on top:

[ 99.288768] display-ptlh-1 kernel: Oops: general protection fault, probably for non-canonical address 0xafafafafafafafaf: 0000 [#1] PREEMPT SMP NOPTI
[ 99.300294] display-ptlh-1 kernel: CPU: 3 UID: 0 PID: 10899 Comm: udevadm Not tainted 6.13.0-rc1-xe+ #13
[ 99.307849] display-ptlh-1 kernel: Hardware name: Intel Corporation Panther Lake Client Platform/PTL-UH LP5 T3 RVP1, BIOS PTLPFWI1.R00.2454.D00.2411071130 11/07/2024
[ 99.320731] display-ptlh-1 kernel: RIP: 0010:string+0x4d/0xe0
[ 99.324541] display-ptlh-1 kernel: Code: ff 77 3c 45 89 d1 31 f6 49 01 f9 66 45 85 d2 75 19 eb 1e 49 39 f8 76 02 88 07 48 83 c7 01 83 c6 01 48 83 c2 01 4c 39 cf 74 07 <0f> b6 02 84 c0 75 e2 4c 89 c2 e8 f4 eb ff ff 5d c3 cc cc cc cc 48
[ 99.343456] display-ptlh-1 kernel: RSP: 0018:ffffc90012fcf930 EFLAGS: 00010286
[ 99.348733] display-ptlh-1 kernel: RAX: afafafafafaf9faf RBX: ffffc90012fcf9a8 RCX: ffff0a00ffffff04
[ 99.355937] display-ptlh-1 kernel: RDX: afafafafafafafaf RSI: 0000000000000000 RDI: ffff888111829243
[ 99.363136] display-ptlh-1 kernel: RBP: ffffc90012fcf930 R08: ffff888111829a1c R09: ffff888211829242
[ 99.370339] display-ptlh-1 kernel: R10: ffffffffffffffff R11: 0000000000000000 R12: ffff888111829a1c
[ 99.377542] display-ptlh-1 kernel: R13: ffffffff82f68964 R14: ffffffff82f68964 R15: ffff888111829243
[ 99.384743] display-ptlh-1 kernel: FS: 00007f973d83b8c0(0000) GS:ffff88844b980000(0000) knlGS:0000000000000000
[ 99.392904] display-ptlh-1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 99.398708] display-ptlh-1 kernel: CR2: 00005620a7387a18 CR3: 000000012bec4006 CR4: 0000000000772ef0
[ 99.405913] display-ptlh-1 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 99.413113] display-ptlh-1 kernel: DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
[ 99.420318] display-ptlh-1 kernel: PKRU: 55555554
[ 99.423062] display-ptlh-1 kernel: Call Trace:
[ 99.425543] display-ptlh-1 kernel: <TASK>
[ 99.427675] display-ptlh-1 kernel: ? show_regs+0x69/0x80
[ 99.431135] display-ptlh-1 kernel: ? die_addr+0x38/0x90
[ 99.434488] display-ptlh-1 kernel: ? exc_general_protection+0x1d4/0x440
[ 99.439242] display-ptlh-1 kernel: ? asm_exc_general_protection+0x27/0x30
[ 99.444186] display-ptlh-1 kernel: ? string+0x4d/0xe0
[ 99.447367] display-ptlh-1 kernel: vsnprintf+0x23e/0x560
[ 99.450815] display-ptlh-1 kernel: add_uevent_var+0x96/0x190
[ 99.454610] display-ptlh-1 kernel: ? string+0x5c/0xe0
[ 99.457790] display-ptlh-1 kernel: power_supply_uevent+0x5a/0x200
[ 99.462025] display-ptlh-1 kernel: dev_uevent+0x106/0x2e0
[ 99.465555] display-ptlh-1 kernel: uevent_show+0xac/0x140
[ 99.469082] display-ptlh-1 kernel: dev_attr_show+0x1a/0x60
[ 99.472701] display-ptlh-1 kernel: sysfs_kf_seq_show+0xaa/0x140
[ 99.476758] display-ptlh-1 kernel: kernfs_seq_show+0x3f/0x50
[ 99.480548] display-ptlh-1 kernel: seq_read_iter+0x125/0x4e0
[ 99.484342] display-ptlh-1 kernel: kernfs_fop_read_iter+0x170/0x200
[ 99.488748] display-ptlh-1 kernel: vfs_read+0x260/0x350
[ 99.492106] display-ptlh-1 kernel: ksys_read+0x70/0xf0
[ 99.495372] display-ptlh-1 kernel: __x64_sys_read+0x19/0x20
[ 99.499076] display-ptlh-1 kernel: x64_sys_call+0x1b85/0x2140
[ 99.502954] display-ptlh-1 kernel: do_syscall_64+0x87/0x140
[ 99.506656] display-ptlh-1 kernel: ? trace_irq_disable+0x6d/0xa0
[ 99.510799] display-ptlh-1 kernel: ? trace_irq_enable+0x6d/0xa0
[ 99.514853] display-ptlh-1 kernel: ? syscall_exit_to_user_mode+0xcc/0x200
[ 99.519779] display-ptlh-1 kernel: ? do_syscall_64+0x93/0x140
[ 99.523659] display-ptlh-1 kernel: ? __fput+0x1c6/0x2f0
[ 99.527014] display-ptlh-1 kernel: ? trace_irq_disable+0x6d/0xa0
[ 99.531155] display-ptlh-1 kernel: ? trace_irq_enable+0x6d/0xa0
[ 99.535211] display-ptlh-1 kernel: ? syscall_exit_to_user_mode+0xcc/0x200
[ 99.540138] display-ptlh-1 kernel: ? do_syscall_64+0x93/0x140
[ 99.544019] display-ptlh-1 kernel: ? trace_irq_enable+0x6d/0xa0
[ 99.548078] display-ptlh-1 kernel: ? syscall_exit_to_user_mode+0xcc/0x200
[ 99.553004] display-ptlh-1 kernel: ? do_syscall_64+0x93/0x140
[ 99.556882] display-ptlh-1 kernel: ? syscall_exit_to_user_mode+0xcc/0x200
[ 99.561809] display-ptlh-1 kernel: ? do_syscall_64+0x93/0x140
[ 99.565691] display-ptlh-1 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 99.570795] display-ptlh-1 kernel: RIP: 0033:0x7f973d71ba61
[ 99.574413] display-ptlh-1 kernel: Code: 00 48 8b 15 b9 73 0e 00 f7 d8 64 89 02 b8 ff ff ff ff eb bd e8 40 c4 01 00 f3 0f 1e fa 80 3d e5 f5 0e 00 00 74 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 4f c3 66 0f 1f 44 00 00 55 48 89 e5 48 83 ec
[ 99.593325] display-ptlh-1 kernel: RSP: 002b:00007ffce625b508 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 99.600962] display-ptlh-1 kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f973d71ba61
[ 99.608165] display-ptlh-1 kernel: RDX: 0000000000001008 RSI: 00005620a7386a70 RDI: 0000000000000003
[ 99.615366] display-ptlh-1 kernel: RBP: 00007ffce625b610 R08: 00007f973d803b20 R09: 0000000000000000
[ 99.622571] display-ptlh-1 kernel: R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000001008
[ 99.629774] display-ptlh-1 kernel: R13: ffffffffffffffff R14: 0000000000001008 R15: 00005620a7386a70
[ 99.636982] display-ptlh-1 kernel: </TASK>
[ 99.639198] display-ptlh-1 kernel: Modules linked in: snd_sof_pci_intel_ptl snd_sof_pci_intel_lnl snd_sof_pci_intel_mtl snd_sof_intel_hda_generic snd_sof_pci snd_sof_xtensa_dsp snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_intel_hda snd_sof snd_sof_utils snd_soc_acpi_intel_match snd_soc_acpi snd_soc_acpi_intel_sdca_quirks snd_intel_dspcfg snd_hda_codec snd_hwdep snd_sof_intel_hda_mlink snd_hda_ext_core snd_hda_core snd_soc_sdca x86_pkg_temp_thermal intel_powerclamp coretemp snd_soc_core snd_compress kvm_intel snd_pcm kvm crct10dif_pclmul crc32_pclmul polyval_clmulni snd_seq polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 snd_seq_device snd_timer sha1_ssse3 cdc_ether snd aesni_intel usbnet wmi_bmof crypto_simd mii cryptd e1000e i2c_i801 soundcore i2c_smbus idma64 thunderbolt ucsi_acpi typec_ucsi igen6_edac typec binfmt_misc video ov13b10 v4l2_fwnode v4l2_async videodev mc intel_skl_int3472_tps68470 wmi tps68470_regulator intel_pmc_core clk_tps68470 acpi_tad nls_iso8859_1 intel_vsec intel_skl_int3472_discrete pmt_telemetry
[ 99.639250] display-ptlh-1 kernel: intel_skl_int3472_common acpi_pad pmt_class input_leds mac_hid sch_fq_codel msr parport_pc ppdev lp parport efi_pstore drm nfnetlink ip_tables x_tables autofs4
[ 99.747167] display-ptlh-1 kernel: ---[ end trace 0000000000000000 ]---
[ 99.974871] display-ptlh-1 kernel: RIP: 0010:string+0x4d/0xe0
[ 99.978693] display-ptlh-1 kernel: Code: ff 77 3c 45 89 d1 31 f6 49 01 f9 66 45 85 d2 75 19 eb 1e 49 39 f8 76 02 88 07 48 83 c7 01 83 c6 01 48 83 c2 01 4c 39 cf 74 07 <0f> b6 02 84 c0 75 e2 4c 89 c2 e8 f4 eb ff ff 5d c3 cc cc cc cc 48
[ 99.997606] display-ptlh-1 kernel: RSP: 0018:ffffc90012fcf930 EFLAGS: 00010286
[ 100.002891] display-ptlh-1 kernel: RAX: afafafafafaf9faf RBX: ffffc90012fcf9a8 RCX: ffff0a00ffffff04
[ 100.010099] display-ptlh-1 kernel: RDX: afafafafafafafaf RSI: 0000000000000000 RDI: ffff888111829243
[ 100.017302] display-ptlh-1 kernel: RBP: ffffc90012fcf930 R08: ffff888111829a1c R09: ffff888211829242
[ 100.024502] display-ptlh-1 kernel: R10: ffffffffffffffff R11: 0000000000000000 R12: ffff888111829a1c
[ 100.031701] display-ptlh-1 kernel: R13: ffffffff82f68964 R14: ffffffff82f68964 R15: ffff888111829243
[ 100.038900] display-ptlh-1 kernel: FS: 00007f973d83b8c0(0000) GS:ffff88844b980000(0000) knlGS:0000000000000000
[ 100.047062] display-ptlh-1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 100.052866] display-ptlh-1 kernel: CR2: 00005620a7387a18 CR3: 000000012bec4006 CR4: 0000000000772ef0
[ 100.060066] display-ptlh-1 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 100.067265] display-ptlh-1 kernel: DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
[ 100.074473] display-ptlh-1 kernel: PKRU: 55555554
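
A note for anyone decoding the report above: the faulting instruction
bytes marked in the Code line, `<0f> b6 02`, decode as a byte load through
RDX (movzbl (%rdx),%eax), and RDX holds 0xafafafafafafafaf. That is the
value the "non-canonical address" message refers to. As a minimal sketch
of what "non-canonical" means on x86-64, here is a small user-space C
check. It is not kernel code: the helper name is_canonical() and its
va_bits parameter are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel API: returns true if addr is a
 * canonical x86-64 virtual address for the given virtual-address width
 * (48 with 4-level paging, 57 with 5-level paging).
 *
 * An address is canonical when bits 63..(va_bits - 1) are all copies of
 * the same bit, i.e. all zeros or all ones.
 */
static bool is_canonical(uint64_t addr, unsigned int va_bits)
{
	assert(va_bits >= 2 && va_bits <= 64);

	/* Keep bit (va_bits - 1) and everything above it. */
	uint64_t upper = addr >> (va_bits - 1);
	/* Value of those bits when every one of them is set. */
	uint64_t all_ones = (UINT64_C(1) << (65 - va_bits)) - 1;

	return upper == 0 || upper == all_ones;
}
```

With va_bits set to 48, both 0xafafafafafafafaf and the
0xb11675ef8d1ccbce from the original linux-next report fail this check,
while the kernel pointers in the register dump (for example
0xffff888111829243 in RDI) pass it. So the string being formatted in
power_supply_uevent() was reached through a garbage pointer rather than
a NULL or merely stale-but-valid one.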
--
Cheers,
Luca.