Regression on linux-next (next-20241120) and drm-tip
Rafael J. Wysocki
rafael at kernel.org
Tue Dec 3 14:33:21 UTC 2024
On Tue, Dec 3, 2024 at 1:04 PM Thomas Weißschuh <linux at weissschuh.net> wrote:
>
> On 2024-12-03 12:54:54+0100, Rafael J. Wysocki wrote:
> > On Tue, Dec 3, 2024 at 7:51 AM Thomas Weißschuh <linux at weissschuh.net> wrote:
> > >
> > > (+Cc Sebastian)
> > >
> > > Hi Chaitanya,
> > >
> > > On 2024-12-03 05:07:47+0000, Borah, Chaitanya Kumar wrote:
> > > > Hope you are doing well. I am Chaitanya from the linux graphics team in Intel.
> > > >
> > > > This mail is regarding a regression we are seeing in our CI runs[1] on linux-next repository.
> > >
> > > Thanks for the report.
> > >
> > > > Since the version next-20241120 [2], we are seeing the following regression
> > > >
> > > > `````````````````````````````````````````````````````````````````````````````````
> > > > <4>[ 19.990743] Oops: general protection fault, probably for non-canonical address 0xb11675ef8d1ccbce: 0000 [#1] PREEMPT SMP NOPTI
> > > > <4>[ 19.990760] CPU: 21 UID: 110 PID: 867 Comm: prometheus-node Not tainted 6.12.0-next-20241120-next-20241120-gac24e26aa08f+ #1
> > > > <4>[ 19.990771] Hardware name: Intel Corporation Arrow Lake Client Platform/MTL-S UDIMM 2DPC EVCRB, BIOS MTLSFWI1.R00.4400.D85.2410100007 10/10/2024
> > > > <4>[ 19.990782] RIP: 0010:power_supply_get_property+0x3e/0xe0
> > > > `````````````````````````````````````````````````````````````````````````````````
> > > > Details log can be found in [3].
> > > >
> > > > After bisecting the tree, the following patch [4] seems to be the first "bad"
> > > > commit
> > > >
> > > > `````````````````````````````````````````````````````````````````````````````````````````````````````````
> > > > Commit 49000fee9e639f62ba1f965ed2ae4c5ad18d19e2
> > > > Author: Thomas Weißschuh <mailto:linux at weissschuh.net>
> > > > AuthorDate: Sat Oct 5 12:05:03 2024 +0200
> > > > Commit: Sebastian Reichel <mailto:sebastian.reichel at collabora.com>
> > > > CommitDate: Tue Oct 15 22:22:20 2024 +0200
> > > > power: supply: core: add wakeup source inhibit by power_supply_config
> > > > `````````````````````````````````````````````````````````````````````````````````````````````````````````
> > > >
> > > > This is now seen in our drm-tip runs as well. [5]
> > > >
> > > > Could you please check why the patch causes this regression and provide a fix if necessary?
> > >
> > > I don't see how this patch can lead to this error.
> >
> > It looks like the cfg->no_wakeup_source access reaches beyond the
> > struct boundary for some reason.
>
> But the access to this field is only done in __power_supply_register().
> The error reports however don't show this function at all,
> they come from power_supply_uevent() and power_supply_get_property() by
> which time the call to __power_supply_register() is long over.
>
> FWIW there is an uninitialized 'struct power_supply_config' in
> drivers/hid/hid-corsair-void.c. But I highly doubt the test machines are
> using that. (I'll send a patch later for it)

So the only way I can think of in which the commit in question may
lead to the reported issues is that changing the size of struct
power_supply_config or its alignment makes an unexpected functional
difference somewhere.

AFAICS, this commit cannot be reverted by itself, so which commits on
top of it need to be reverted in order to revert it cleanly?
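
To illustrate the uninitialized-config concern mentioned above, here is a
minimal user-space sketch. It is not kernel code: struct demo_supply_config,
demo_register() and demo_wakeup_enabled_by_default() are hypothetical names
made up for this example, and the layout only loosely resembles struct
power_supply_config. It shows that a caller using a designated initializer
gets a newly added field zeroed, whereas a caller declaring the struct with
no initializer at all would hand an indeterminate value to the consumer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in, NOT the kernel's struct power_supply_config. */
struct demo_supply_config {
	void *drv_data;
	size_t num_supplicants;
	bool no_wakeup_source;	/* field added by a later change */
};

/*
 * Hypothetical consumer: returns 1 if a wakeup source would be set up,
 * 0 if the caller inhibited it through the config.
 */
static int demo_register(const struct demo_supply_config *cfg)
{
	if (cfg && cfg->no_wakeup_source)
		return 0;
	return 1;
}

/*
 * Caller written before the new field existed. Because it uses a
 * designated initializer, every member it does not name is zeroed,
 * so no_wakeup_source is false and the old behaviour is preserved.
 *
 * A caller that wrote "struct demo_supply_config cfg;" with no
 * initializer and then assigned only drv_data would leave
 * no_wakeup_source indeterminate.
 */
static int demo_wakeup_enabled_by_default(void)
{
	int dummy = 0;
	struct demo_supply_config cfg = { .drv_data = &dummy };

	assert(cfg.num_supplicants == 0);
	return demo_register(&cfg);
}
```

Calling demo_wakeup_enabled_by_default() returns 1 here. An uninitialized
config of this kind would only affect the registration path, which is
consistent with the observation that it does not obviously explain faults
in power_supply_get_property().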