<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
+Rodrigo</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Intel-gfx <intel-gfx-bounces@lists.freedesktop.org> on behalf of Saarinen, Jani <jani.saarinen@intel.com><br>
<b>Sent:</b> Tuesday, December 3, 2024 8:22 AM<br>
<b>To:</b> Thomas Weißschuh <linux@weissschuh.net>; Borah, Chaitanya Kumar <chaitanya.kumar.borah@intel.com>; Krogerus, Heikki <heikki.krogerus@intel.com><br>
<b>Cc:</b> Rafael J. Wysocki <rafael@kernel.org>; Kurmi, Suresh Kumar <suresh.kumar.kurmi@intel.com>; Coelho, Luciano <luciano.coelho@intel.com>; Nikula, Jani <jani.nikula@intel.com>; De Marchi, Lucas <lucas.demarchi@intel.com>; intel-gfx@lists.freedesktop.org
<intel-gfx@lists.freedesktop.org>; intel-xe@lists.freedesktop.org <intel-xe@lists.freedesktop.org>; linux-pm@vger.kernel.org <linux-pm@vger.kernel.org>; Sebastian Reichel <sebastian.reichel@collabora.com><br>
<b>Subject:</b> RE: Regression on linux-next (next-20241120) and drm-tip</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">
<div class="PlainText">+@Krogerus, Heikki <br>
<br>
> -----Original Message-----<br>
> From: Thomas Weißschuh <linux@weissschuh.net><br>
> Sent: Tuesday, 3 December 2024 18.08<br>
> To: Borah, Chaitanya Kumar <chaitanya.kumar.borah@intel.com><br>
> Cc: Rafael J. Wysocki <rafael@kernel.org>; Kurmi, Suresh Kumar<br>
> <suresh.kumar.kurmi@intel.com>; Coelho, Luciano <luciano.coelho@intel.com>;<br>
> Saarinen, Jani <jani.saarinen@intel.com>; Nikula, Jani <jani.nikula@intel.com>;<br>
> De Marchi, Lucas <lucas.demarchi@intel.com>; intel-gfx@lists.freedesktop.org;<br>
> intel-xe@lists.freedesktop.org; linux-pm@vger.kernel.org; Sebastian Reichel<br>
> <sebastian.reichel@collabora.com><br>
> Subject: Re: Regression on linux-next (next-20241120) and drm-tip<br>
> <br>
> On 2024-12-03 16:57:21+0100, Thomas Weißschuh wrote:<br>
> > On 2024-12-03 15:42:23+0000, Borah, Chaitanya Kumar wrote:<br>
> > ><br>
> > ><br>
> > > > -----Original Message-----<br>
> > > > From: Thomas Weißschuh <linux@weissschuh.net><br>
> > > > Sent: Tuesday, December 3, 2024 8:20 PM<br>
> > > > To: Rafael J. Wysocki <rafael@kernel.org><br>
> > > > Cc: Borah, Chaitanya Kumar <chaitanya.kumar.borah@intel.com>;<br>
> > > > Kurmi, Suresh Kumar <suresh.kumar.kurmi@intel.com>; Coelho,<br>
> > > > Luciano <luciano.coelho@intel.com>; Saarinen, Jani<br>
> > > > <jani.saarinen@intel.com>; Nikula, Jani <jani.nikula@intel.com>;<br>
> > > > De Marchi, Lucas <lucas.demarchi@intel.com>;<br>
> > > > intel-gfx@lists.freedesktop.org; intel- xe@lists.freedesktop.org;<br>
> > > > linux-pm@vger.kernel.org; Sebastian Reichel<br>
> > > > <sebastian.reichel@collabora.com><br>
> > > > Subject: Re: Regression on linux-next (next-20241120) and drm-tip<br>
> > > ><br>
> > > > On 2024-12-03 15:33:21+0100, Rafael J. Wysocki wrote:<br>
> > > > > On Tue, Dec 3, 2024 at 1:04 PM Thomas Weißschuh<br>
> > > > <linux@weissschuh.net> wrote:<br>
> > > > > ><br>
> > > > > > On 2024-12-03 12:54:54+0100, Rafael J. Wysocki wrote:<br>
> > > > > > > On Tue, Dec 3, 2024 at 7:51 AM Thomas Weißschuh<br>
> > > > <linux@weissschuh.net> wrote:<br>
> > > > > > > ><br>
> > > > > > > > (+Cc Sebastian)<br>
> > > > > > > ><br>
> > > > > > > > Hi Chaitanya,<br>
> > > > > > > ><br>
> > > > > > > > On 2024-12-03 05:07:47+0000, Borah, Chaitanya Kumar wrote:<br>
> > > > > > > > > Hope you are doing well. I am Chaitanya from the linux<br>
> > > > > > > > > graphics team<br>
> > > > in Intel.<br>
> > > > > > > > ><br>
> > > > > > > > > This mail is regarding a regression we are seeing in our<br>
> > > > > > > > > CI runs[1] on<br>
> > > > linux-next repository.<br>
> > > > > > > ><br>
> > > > > > > > Thanks for the report.<br>
> > > > > > > ><br>
> > > > > > > > > Since the version next-20241120 [2], we are seeing the<br>
> > > > > > > > > following regression<br>
> > > > > > > > ><br>
> > > > > > > > > `````````````````````````````````````````````````````````````````````````````````<br>
> > > > > > > > > <4>[ 19.990743] Oops: general protection fault, probably for non-<br>
> > > > canonical address 0xb11675ef8d1ccbce: 0000 [#1] PREEMPT SMP NOPTI<br>
> > > > > > > > > <4>[ 19.990760] CPU: 21 UID: 110 PID: 867 Comm: prometheus-<br>
> > > > node Not tainted 6.12.0-next-20241120-next-20241120-gac24e26aa08f+<br>
> > > > #1<br>
> > > > > > > > > <4>[ 19.990771] Hardware name: Intel Corporation Arrow Lake<br>
> > > > Client Platform/MTL-S UDIMM 2DPC EVCRB, BIOS<br>
> > > > MTLSFWI1.R00.4400.D85.2410100007 10/10/2024<br>
> > > > > > > > > <4>[ 19.990782] RIP:<br>
> 0010:power_supply_get_property+0x3e/0xe0<br>
> > > > > > > > > ````````````````````````````````````````````````````````<br>
> > > > > > > > > `````` ``````````````````` Details log can be found in<br>
> > > > > > > > > [3].<br>
> > > > > > > > ><br>
> > > > > > > > > After bisecting the tree, the following patch [4] seems<br>
> > > > > > > > > to be the first<br>
> > > > "bad"<br>
> > > > > > > > > commit<br>
> > > > > > > > ><br>
> > > > > > > > > ````````````````````````````````````````````````````````<br>
> > > > > > > > > `````` ```````````````````````````````````````````<br>
> > > > > > > > > Commit 49000fee9e639f62ba1f965ed2ae4c5ad18d19e2<br>
> > > > > > > > > Author: Thomas Weißschuh <<a href="mailto:linux@weissschuh.net">mailto:linux@weissschuh.net</a>><br>
> > > > > > > > > AuthorDate: Sat Oct 5 12:05:03 2024 +0200<br>
> > > > > > > > > Commit: Sebastian Reichel<br>
> > > > <<a href="mailto:sebastian.reichel@collabora.com">mailto:sebastian.reichel@collabora.com</a>><br>
> > > > > > > > > CommitDate: Tue Oct 15 22:22:20 2024 +0200<br>
> > > > > > > > > power: supply: core: add wakeup source inhibit by<br>
> > > > > > > > > power_supply_config<br>
> > > > > > > > > ````````````````````````````````````````````````````````<br>
> > > > > > > > > `````` ```````````````````````````````````````````<br>
> > > > > > > > ><br>
> > > > > > > > > This is now seen in our drm-tip runs as well. [5]<br>
> > > > > > > > ><br>
> > > > > > > > > Could you please check why the patch causes this<br>
> > > > > > > > > regression and<br>
> > > > provide a fix if necessary?<br>
> > > > > > > ><br>
> > > > > > > > I don't see how this patch can lead to this error.<br>
> > > > > > ><br>
> > > > > > > It looks like the cfg->no_wakeup_source access reaches<br>
> > > > > > > beyond the struct boundary for some reason.<br>
> > > > > ><br>
> > > > > > But the access to this field is only done in __power_supply_register().<br>
> > > > > > The error reports however don't show this function at all,<br>
> > > > > > they come from power_supply_uevent() and<br>
> > > > > > power_supply_get_property() by which time the call to<br>
> __power_supply_register() is long over.<br>
> > > > > ><br>
> > > > > > FWIW there is an uninitialized 'struct power_supply_config' in<br>
> > > > > > drivers/hid/hid-corsair-void.c. But I highly doubt the test<br>
> > > > > > machines are using that. (I'll send a patch later for it)<br>
> > > > ><br>
> > > > > So the only way I can think about in which the commit in<br>
> > > > > question may lead to the reported issues is that changing the<br>
> > > > > size of struct power_supply_config or its alignment makes an<br>
> > > > > unexpected functional difference somewhere.<br>
> > > ><br>
> > > > Indeed. I'd really like to see this issue reproduced with KASAN.<br>
> > > ><br>
> > > > > AFAICS, this commit cannot be reverted by itself, so which<br>
> > > > > commits on top of it need to be reverted in order to revert it cleanly?<br>
> > > ><br>
> > > > All the other patches from this series:<br>
> > > > <a href="https://lore.kernel.org/lkml/20241005-power-supply-no-wakeup-sourc">
https://lore.kernel.org/lkml/20241005-power-supply-no-wakeup-sourc</a><br>
> > > > e-v1-<br>
> > > > 0-1d62bf9bcb1d@weissschuh.net/<br>
> > > ><br>
> > > > Could you point me to the full boot log in the drm-tip CI?<br>
> > ><br>
> > > Here is the log for drm-tip CI<br>
> > ><br>
> > > <a href="https://intel-gfx-ci.01.org/tree/drm-tip/IGT_8136/bat-arls-5/boot0.t">
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_8136/bat-arls-5/boot0.t</a><br>
> > > xt<br>
> ><br>
> > Thanks!<br>
> ><br>
> > > I carried out another bisect and it points to the following commit<br>
> > ><br>
> > > commit 226ff2e681d006eada59a9693aa1976d4c15a7d4<br>
> > > Author: Heikki Krogerus <heikki.krogerus@linux.intel.com><br>
> > > Date: Wed Nov 6 17:06:05 2024 +0200<br>
> > ><br>
> > > usb: typec: ucsi: Convert connector specific commands to bitmaps<br>
> > ><br>
> > > That allows the fields in those command data structures to<br>
> > > be easily validated. If an unsupported field is accessed, a<br>
> > > warning is generated.<br>
> ><br>
> > Suspicous: The bitmaps introduced in this commit are right before the<br>
> > psy and psy_desc members of 'struct ucsi_connector'.<br>
> > So any out-of-bounds writes into these members would corrupt those<br>
> > fields.<br>
> > A corrupted power_supply_desc would fit both reported stacktraces.<br>
> <br>
> struct ucsi_connector {<br>
> ...<br>
> <br>
> struct typec_capability typec_cap;<br>
> <br>
> /* Cached command responses. */<br>
> DECLARE_BITMAP(cap, UCSI_GET_CONNECTOR_CAPABILITY_SIZE);<br>
> DECLARE_BITMAP(status, UCSI_GET_CONNECTOR_STATUS_SIZE);<br>
> <br>
> DECLARE_BITMAP() takes the size in number of *bits*<br>
> <br>
> struct power_supply *psy;<br>
> struct power_supply_desc psy_desc;<br>
> u32 rdo;<br>
> <br>
> ...<br>
> }<br>
> <br>
> static int ucsi_get_connector_status(struct ucsi_connector *con, bool conn_ack) {<br>
> u64 command = UCSI_GET_CONNECTOR_STATUS |<br>
> UCSI_CONNECTOR_NUMBER(con->num);<br>
> size_t size = min(UCSI_GET_CONNECTOR_STATUS_SIZE,<br>
> UCSI_MAX_DATA_LENGTH(con->ucsi));<br>
> int ret;<br>
> <br>
> ret = ucsi_send_command_common(con->ucsi, command, &con->status,<br>
> size, conn_ack);<br>
> <br>
> ucsi_send_command_common() takes the size in number of *bytes*.<br>
> This call corrupts psy and psy_desc in con.<br>
> <br>
> return ret < 0 ? ret : 0;<br>
> }<br>
> <br>
> ><br>
> > > Reverting it seems to help locally. However, to confirm I have sent out a patch<br>
> to our "try-bot"<br>
> > ><br>
> > > <a href="https://patchwork.freedesktop.org/series/142049/">https://patchwork.freedesktop.org/series/142049/</a><br>
> > ><br>
> > > We can wait for its results.<br>
> > ><br>
> > > Regards<br>
> > ><br>
> > > Chaitanya<br>
> > ><br>
</div>
</span></font></div>
</body>
</html>