Regression on linux-next (next-20241120) and drm-tip

Thomas Weißschuh linux at weissschuh.net
Tue Dec 3 10:49:54 UTC 2024


On 2024-12-03 11:18:55+0200, Luca Coelho wrote:
> On Tue, 2024-12-03 at 09:25 +0100, Thomas Weißschuh wrote:
> > On 2024-12-03 09:50:05+0200, Luca Coelho wrote:
> > > On Tue, 2024-12-03 at 07:50 +0100, Thomas Weißschuh wrote:
> > > > (+Cc Sebastian)
> > > > 
> > > > Hi Chaitanya,
> > > > 
> > > > On 2024-12-03 05:07:47+0000, Borah, Chaitanya Kumar wrote:
> > > > > Hope you are doing well. I am Chaitanya from the Linux graphics team at Intel.
> > > > > 
> > > > > This mail is regarding a regression we are seeing in our CI runs[1] on the linux-next repository.
> > > > 
> > > > Thanks for the report.
> > > > 
> > > > > Since the version next-20241120 [2], we are seeing the following regression
> > > > > 
> > > > > `````````````````````````````````````````````````````````````````````````````````
> > > > > <4>[   19.990743] Oops: general protection fault, probably for non-canonical address 0xb11675ef8d1ccbce: 0000 [#1] PREEMPT SMP NOPTI
> > > > > <4>[   19.990760] CPU: 21 UID: 110 PID: 867 Comm: prometheus-node Not tainted 6.12.0-next-20241120-next-20241120-gac24e26aa08f+ #1
> > > > > <4>[   19.990771] Hardware name: Intel Corporation Arrow Lake Client Platform/MTL-S UDIMM 2DPC EVCRB, BIOS MTLSFWI1.R00.4400.D85.2410100007 10/10/2024
> > > > > <4>[   19.990782] RIP: 0010:power_supply_get_property+0x3e/0xe0
> > > > > `````````````````````````````````````````````````````````````````````````````````
> > > > > A detailed log can be found in [3].
> > > > > 
> > > > > After bisecting the tree, the following patch [4] seems to be the first "bad"
> > > > > commit
> > > > > 
> > > > > `````````````````````````````````````````````````````````````````````````````````````````````````````````
> > > > > Commit 49000fee9e639f62ba1f965ed2ae4c5ad18d19e2
> > > > > Author:     Thomas Weißschuh <mailto:linux at weissschuh.net>
> > > > > AuthorDate: Sat Oct 5 12:05:03 2024 +0200
> > > > > Commit:     Sebastian Reichel <mailto:sebastian.reichel at collabora.com>
> > > > > CommitDate: Tue Oct 15 22:22:20 2024 +0200
> > > > >     power: supply: core: add wakeup source inhibit by power_supply_config    
> > > > > `````````````````````````````````````````````````````````````````````````````````````````````````````````
> > > > > 
> > > > > This is now seen in our drm-tip runs as well. [5]
> > > > > 
> > > > > Could you please check why the patch causes this regression and provide a fix if necessary?
> > > > 
> > > > I don't see how this patch can lead to this error.
> > > > Could you double-check the bisect?
> > > 
> > > FWIW I also bisected this and came to the same conclusion, this is the
> > > first bad commit.  My guess is that some component is not yet setting
> > > things up properly for the new feature.
> > 
> > The thing is that at this point nothing is using this feature.
> > And the new code runs during registration while the error happens later.
> > 
> > > This is very easily reproducible in our system, with vanilla 6.13-rc1,
> > > so if there's anything you want to try, let us know.
> > 
> > Can you try the following diffs, each alone on top of
> > 49000fee9e639f62ba1f965ed2ae4c5ad18d19e2?
> > 
> > diff --git a/drivers/power/supply/power_supply_core.c b/drivers/power/supply/power_supply_core.c
> > index a2005e3c6f38..c6e7ca5b1283 100644
> > --- a/drivers/power/supply/power_supply_core.c
> > +++ b/drivers/power/supply/power_supply_core.c
> > @@ -1411,7 +1411,7 @@ __power_supply_register(struct device *parent,
> >                 goto device_add_failed;
> > 
> >         if (cfg && cfg->no_wakeup_source)
> > -               ws = false;
> > +               ;
> > 
> >         rc = device_init_wakeup(dev, ws);
> >         if (rc)
> > 
> > diff --git a/drivers/power/supply/power_supply_core.c b/drivers/power/supply/power_supply_core.c
> > index a2005e3c6f38..5aefba2ddcda 100644
> > --- a/drivers/power/supply/power_supply_core.c
> > +++ b/drivers/power/supply/power_supply_core.c
> > @@ -1410,9 +1410,6 @@ __power_supply_register(struct device *parent,
> >         if (rc)
> >                 goto device_add_failed;
> > 
> > -       if (cfg && cfg->no_wakeup_source)
> > -               ws = false;
> > -
> >         rc = device_init_wakeup(dev, ws);
> >         if (rc)
> >                 goto wakeup_init_failed;
> > 
> 
> I'll try this out now.
> 
> 
> > Could you also print the name of the device?
> 
> This is a new Panther Lake machine, but we have reports of this
> happening on other platforms as well.  Which device exactly do you
> want the info on?

I want the name of the in-kernel power supply device to figure
out which driver is used.
Sorry for the confusion.
The patch below prints this name.

> > 
> > diff --git a/drivers/power/supply/power_supply_core.c b/drivers/power/supply/power_supply_core.c
> > index a2005e3c6f38..63e9e339cc01 100644
> > --- a/drivers/power/supply/power_supply_core.c
> > +++ b/drivers/power/supply/power_supply_core.c
> > @@ -1356,6 +1356,8 @@ __power_supply_register(struct device *parent,
> >                 pr_warn("%s: Expected proper parent device for '%s'\n",
> >                         __func__, desc->name);
> > 
> > +       pr_warn("PSY: name=%s\n", desc->name);
> > +
> >         psy = kzalloc(sizeof(*psy), GFP_KERNEL);
> >         if (!psy)
> >                 return ERR_PTR(-ENOMEM);
> > 
> > 
> > Also line numbers would be useful.
> > Is this configuration running KASAN?
> 
> There's no KASAN, but I can add it if needed.

I think it would be useful, see below. (In addition to line numbers)

> Here's the full crash report I got yesterday, it's from our so-called
> drm-tip, which is basically v6.13-rc1 with DRM stuff on top:

This is a different trace than the others.
If that also bisects to the same commit, that's a useful data point.
The register values look suspiciously like a poison value, so KASAN
would be useful.
It's not a poison value known to include/linux/poison.h, though.
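
To illustrate why 0xafafafafafafafaf looks like a fill pattern rather
than a stray pointer, here is a minimal userspace sketch. The helper
names is_repeated_byte() and is_canonical_48() are made up for this
mail and are not kernel APIs, and the canonical check assumes 48-bit
virtual addresses (4-level paging) on x86-64.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if all eight bytes of v are identical. */
static inline bool is_repeated_byte(uint64_t v)
{
	uint64_t b = v & 0xff;

	/* Replicate the low byte into every byte lane and compare. */
	return v == b * 0x0101010101010101ULL;
}

/*
 * Hypothetical helper: true if v is a canonical x86-64 address,
 * assuming 48-bit virtual addresses. Bits 63..47 must all be equal.
 */
static inline bool is_canonical_48(uint64_t v)
{
	uint64_t top = v >> 47;	/* the 17 bits 63..47 */

	return top == 0 || top == 0x1ffff;
}
```

For the RDX value in the trace below, is_repeated_byte() is true and
is_canonical_48() is false, which matches the "non-canonical address"
in the Oops line. The address from the original report,
0xb11675ef8d1ccbce, is non-canonical too but is not a repeated byte.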

> [   99.288768] display-ptlh-1 kernel: Oops: general protection fault, probably for non-canonical address 0xafafafafafafafaf: 0000 [#1] PREEMPT SMP NOPTI
> [   99.300294] display-ptlh-1 kernel: CPU: 3 UID: 0 PID: 10899 Comm: udevadm Not tainted 6.13.0-rc1-xe+ #13
> [   99.307849] display-ptlh-1 kernel: Hardware name: Intel Corporation Panther Lake Client Platform/PTL-UH LP5 T3 RVP1, BIOS PTLPFWI1.R00.2454.D00.2411071130 11/07/2024
> [   99.320731] display-ptlh-1 kernel: RIP: 0010:string+0x4d/0xe0
> [   99.324541] display-ptlh-1 kernel: Code: ff 77 3c 45 89 d1 31 f6 49 01 f9 66 45 85 d2 75 19 eb 1e 49 39 f8 76 02 88 07 48 83 c7 01 83 c6 01 48 83 c2 01 4c 39 cf 74 07 <0f> b6 02 84 c0 75 e2 4c 89 c2 e8 f4 eb ff ff 5d c3 cc cc cc cc 48
> [   99.343456] display-ptlh-1 kernel: RSP: 0018:ffffc90012fcf930 EFLAGS: 00010286
> [   99.348733] display-ptlh-1 kernel: RAX: afafafafafaf9faf RBX: ffffc90012fcf9a8 RCX: ffff0a00ffffff04
> [   99.355937] display-ptlh-1 kernel: RDX: afafafafafafafaf RSI: 0000000000000000 RDI: ffff888111829243
> [   99.363136] display-ptlh-1 kernel: RBP: ffffc90012fcf930 R08: ffff888111829a1c R09: ffff888211829242
> [   99.370339] display-ptlh-1 kernel: R10: ffffffffffffffff R11: 0000000000000000 R12: ffff888111829a1c
> [   99.377542] display-ptlh-1 kernel: R13: ffffffff82f68964 R14: ffffffff82f68964 R15: ffff888111829243
> [   99.384743] display-ptlh-1 kernel: FS:  00007f973d83b8c0(0000) GS:ffff88844b980000(0000) knlGS:0000000000000000
> [   99.392904] display-ptlh-1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   99.398708] display-ptlh-1 kernel: CR2: 00005620a7387a18 CR3: 000000012bec4006 CR4: 0000000000772ef0
> [   99.405913] display-ptlh-1 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   99.413113] display-ptlh-1 kernel: DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
> [   99.420318] display-ptlh-1 kernel: PKRU: 55555554
> [   99.423062] display-ptlh-1 kernel: Call Trace:
> [   99.425543] display-ptlh-1 kernel:  <TASK>
> [   99.427675] display-ptlh-1 kernel:  ? show_regs+0x69/0x80
> [   99.431135] display-ptlh-1 kernel:  ? die_addr+0x38/0x90
> [   99.434488] display-ptlh-1 kernel:  ? exc_general_protection+0x1d4/0x440
> [   99.439242] display-ptlh-1 kernel:  ? asm_exc_general_protection+0x27/0x30
> [   99.444186] display-ptlh-1 kernel:  ? string+0x4d/0xe0
> [   99.447367] display-ptlh-1 kernel:  vsnprintf+0x23e/0x560
> [   99.450815] display-ptlh-1 kernel:  add_uevent_var+0x96/0x190
> [   99.454610] display-ptlh-1 kernel:  ? string+0x5c/0xe0
> [   99.457790] display-ptlh-1 kernel:  power_supply_uevent+0x5a/0x200
> [   99.462025] display-ptlh-1 kernel:  dev_uevent+0x106/0x2e0
> [   99.465555] display-ptlh-1 kernel:  uevent_show+0xac/0x140
> [   99.469082] display-ptlh-1 kernel:  dev_attr_show+0x1a/0x60
> [   99.472701] display-ptlh-1 kernel:  sysfs_kf_seq_show+0xaa/0x140
> [   99.476758] display-ptlh-1 kernel:  kernfs_seq_show+0x3f/0x50
> [   99.480548] display-ptlh-1 kernel:  seq_read_iter+0x125/0x4e0
> [   99.484342] display-ptlh-1 kernel:  kernfs_fop_read_iter+0x170/0x200
> [   99.488748] display-ptlh-1 kernel:  vfs_read+0x260/0x350
> [   99.492106] display-ptlh-1 kernel:  ksys_read+0x70/0xf0
> [   99.495372] display-ptlh-1 kernel:  __x64_sys_read+0x19/0x20
> [   99.499076] display-ptlh-1 kernel:  x64_sys_call+0x1b85/0x2140
> [   99.502954] display-ptlh-1 kernel:  do_syscall_64+0x87/0x140
> [   99.506656] display-ptlh-1 kernel:  ? trace_irq_disable+0x6d/0xa0
> [   99.510799] display-ptlh-1 kernel:  ? trace_irq_enable+0x6d/0xa0
> [   99.514853] display-ptlh-1 kernel:  ? syscall_exit_to_user_mode+0xcc/0x200
> [   99.519779] display-ptlh-1 kernel:  ? do_syscall_64+0x93/0x140
> [   99.523659] display-ptlh-1 kernel:  ? __fput+0x1c6/0x2f0
> [   99.527014] display-ptlh-1 kernel:  ? trace_irq_disable+0x6d/0xa0
> [   99.531155] display-ptlh-1 kernel:  ? trace_irq_enable+0x6d/0xa0
> [   99.535211] display-ptlh-1 kernel:  ? syscall_exit_to_user_mode+0xcc/0x200
> [   99.540138] display-ptlh-1 kernel:  ? do_syscall_64+0x93/0x140
> [   99.544019] display-ptlh-1 kernel:  ? trace_irq_enable+0x6d/0xa0
> [   99.548078] display-ptlh-1 kernel:  ? syscall_exit_to_user_mode+0xcc/0x200
> [   99.553004] display-ptlh-1 kernel:  ? do_syscall_64+0x93/0x140
> [   99.556882] display-ptlh-1 kernel:  ? syscall_exit_to_user_mode+0xcc/0x200
> [   99.561809] display-ptlh-1 kernel:  ? do_syscall_64+0x93/0x140
> [   99.565691] display-ptlh-1 kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [   99.570795] display-ptlh-1 kernel: RIP: 0033:0x7f973d71ba61
> [   99.574413] display-ptlh-1 kernel: Code: 00 48 8b 15 b9 73 0e 00 f7 d8 64 89 02 b8 ff ff ff ff eb bd e8 40 c4 01 00 f3 0f 1e fa 80 3d e5 f5 0e 00 00 74 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 4f c3 66 0f 1f 44 00 00 55 48 89 e5 48 83 ec
> [   99.593325] display-ptlh-1 kernel: RSP: 002b:00007ffce625b508 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> [   99.600962] display-ptlh-1 kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f973d71ba61
> [   99.608165] display-ptlh-1 kernel: RDX: 0000000000001008 RSI: 00005620a7386a70 RDI: 0000000000000003
> [   99.615366] display-ptlh-1 kernel: RBP: 00007ffce625b610 R08: 00007f973d803b20 R09: 0000000000000000
> [   99.622571] display-ptlh-1 kernel: R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000001008
> [   99.629774] display-ptlh-1 kernel: R13: ffffffffffffffff R14: 0000000000001008 R15: 00005620a7386a70
> [   99.636982] display-ptlh-1 kernel:  </TASK>
> [   99.639198] display-ptlh-1 kernel: Modules linked in: snd_sof_pci_intel_ptl snd_sof_pci_intel_lnl snd_sof_pci_intel_mtl snd_sof_intel_hda_generic snd_sof_pci snd_sof_xtensa_dsp snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_intel_hda snd_sof snd_sof_utils snd_soc_acpi_intel_match snd_soc_acpi snd_soc_acpi_intel_sdca_quirks snd_intel_dspcfg snd_hda_codec snd_hwdep snd_sof_intel_hda_mlink snd_hda_ext_core snd_hda_core snd_soc_sdca x86_pkg_temp_thermal intel_powerclamp coretemp snd_soc_core snd_compress kvm_intel snd_pcm kvm crct10dif_pclmul crc32_pclmul polyval_clmulni snd_seq polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 snd_seq_device snd_timer sha1_ssse3 cdc_ether snd aesni_intel usbnet wmi_bmof crypto_simd mii cryptd e1000e i2c_i801 soundcore i2c_smbus idma64 thunderbolt ucsi_acpi typec_ucsi igen6_edac typec binfmt_misc video ov13b10 v4l2_fwnode v4l2_async videodev mc intel_skl_int3472_tps68470 wmi tps68470_regulator intel_pmc_core clk_tps68470 acpi_tad nls_iso8859_1 intel_vsec intel_skl_int3472_discrete pmt_telemetry
> [   99.639250] display-ptlh-1 kernel:  intel_skl_int3472_common acpi_pad pmt_class input_leds mac_hid sch_fq_codel msr parport_pc ppdev lp parport efi_pstore drm nfnetlink ip_tables x_tables autofs4
> [   99.747167] display-ptlh-1 kernel: ---[ end trace 0000000000000000 ]---
> [   99.974871] display-ptlh-1 kernel: RIP: 0010:string+0x4d/0xe0
> [   99.978693] display-ptlh-1 kernel: Code: ff 77 3c 45 89 d1 31 f6 49 01 f9 66 45 85 d2 75 19 eb 1e 49 39 f8 76 02 88 07 48 83 c7 01 83 c6 01 48 83 c2 01 4c 39 cf 74 07 <0f> b6 02 84 c0 75 e2 4c 89 c2 e8 f4 eb ff ff 5d c3 cc cc cc cc 48
> [   99.997606] display-ptlh-1 kernel: RSP: 0018:ffffc90012fcf930 EFLAGS: 00010286
> [  100.002891] display-ptlh-1 kernel: RAX: afafafafafaf9faf RBX: ffffc90012fcf9a8 RCX: ffff0a00ffffff04
> [  100.010099] display-ptlh-1 kernel: RDX: afafafafafafafaf RSI: 0000000000000000 RDI: ffff888111829243
> [  100.017302] display-ptlh-1 kernel: RBP: ffffc90012fcf930 R08: ffff888111829a1c R09: ffff888211829242
> [  100.024502] display-ptlh-1 kernel: R10: ffffffffffffffff R11: 0000000000000000 R12: ffff888111829a1c
> [  100.031701] display-ptlh-1 kernel: R13: ffffffff82f68964 R14: ffffffff82f68964 R15: ffff888111829243
> [  100.038900] display-ptlh-1 kernel: FS:  00007f973d83b8c0(0000) GS:ffff88844b980000(0000) knlGS:0000000000000000
> [  100.047062] display-ptlh-1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  100.052866] display-ptlh-1 kernel: CR2: 00005620a7387a18 CR3: 000000012bec4006 CR4: 0000000000772ef0
> [  100.060066] display-ptlh-1 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  100.067265] display-ptlh-1 kernel: DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
> [  100.074473] display-ptlh-1 kernel: PKRU: 55555554
> 
> --
> Cheers,
> Luca.
> 

