<html><body><p>
<pre>
Hi, Sasha:

On Mon, 2024-11-25 at 01:35 +0100, Javier Carrasco wrote:
>
> On 24/11/2024 23:58, Dave Airlie wrote:
> > On Mon, 25 Nov 2024 at 02:41, Sasha Levin <sashal@kernel.org> wrote:
> > >
> > > On Thu, Nov 21, 2024 at 10:25:45AM +1000, Dave Airlie wrote:
> > > > Hi Linus,
> > > >
> > > > This is the main drm pull request for 6.13.
> > > >
> > > > I've done a test merge into your tree; there were two conflicts, both
> > > > of which seem easy enough for you to resolve.
> > > >
> > > > There's a lot of rework: panic helper support is being added to
> > > > more drivers, v3d gets support for HW superpages, and there's scheduler
> > > > documentation, drm client and video aperture reworks, some new
> > > > MAINTAINERS entries, amdgpu has the usual lots of IP refactors, Intel
> > > > has some Pantherlake enablement, and xe is getting some SRIOV bits;
> > > > but mostly it's just lots of stuff everywhere.
> > > >
> > > > Let me know if there are any issues,
> > >
> > > Hey Dave,
> > >
> > > After the PR was merged, I've started seeing boot failures reported by
> > > KernelCI:
> >
> > I'll add the MediaTek people I can see who have recently touched anything in this area.
> >
> > Dave.
> > >
> > > [ 4.395400] mediatek-drm mediatek-drm.5.auto: bound 1c014000.merge (ops 0xffffd35fd12975f8)
> > > [ 4.396155] mediatek-drm mediatek-drm.5.auto: bound 1c000000.ovl (ops 0xffffd35fd12977b8)
> > > [ 4.411951] mediatek-drm mediatek-drm.5.auto: bound 1c002000.rdma (ops 0xffffd35fd12989c0)
> > > [ 4.536837] mediatek-drm mediatek-drm.5.auto: bound 1c004000.ccorr (ops 0xffffd35fd1296cf0)
> > > [ 4.545181] mediatek-drm mediatek-drm.5.auto: bound 1c005000.aal (ops 0xffffd35fd1296a80)
> > > [ 4.553344] mediatek-drm mediatek-drm.5.auto: bound 1c006000.gamma (ops 0xffffd35fd12972b0)
> > > [ 4.561680] mediatek-drm mediatek-drm.5.auto: bound 1c014000.merge (ops 0xffffd35fd12975f8)
> > > [ 4.570025] ------------[ cut here ]------------
> > > [ 4.574630] refcount_t: underflow; use-after-free.
> > > [ 4.579416] WARNING: CPU: 6 PID: 81 at lib/refcount.c:28 refcount_warn_saturate+0xf4/0x148
> > > [ 4.587670] Modules linked in:
> > > [ 4.590714] CPU: 6 UID: 0 PID: 81 Comm: kworker/u32:3 Tainted: G W 6.12.0 #1 cab58e2e59020ebd4be8ada89a65f465a316c742
> > > [ 4.602695] Tainted: [W]=WARN
> > > [ 4.605649] Hardware name: Acer Tomato (rev2) board (DT)
> > > [ 4.610947] Workqueue: events_unbound deferred_probe_work_func
> > > [ 4.616768] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > > [ 4.623715] pc : refcount_warn_saturate+0xf4/0x148
> > > [ 4.628493] lr : refcount_warn_saturate+0xf4/0x148
> > > [ 4.633270] sp : ffff8000807639c0
> > > [ 4.636571] x29: ffff8000807639c0 x28: ffff34ff4116c640 x27: ffff34ff4368e080
> > > [ 4.643693] x26: ffffd35fd1299ac8 x25: ffff34ff46c8c410 x24: 0000000000000000
> > > [ 4.650814] x23: ffff34ff4368e080 x22: 00000000fffffdfb x21: 0000000000000002
> > > [ 4.657934] x20: ffff34ff470c6000 x19: ffff34ff410c7c10 x18: 0000000000000006
> > > [ 4.665055] x17: 666678302073706f x16: 2820656772656d2e x15: ffff800080763440
> > > [ 4.672176] x14: 0000000000000000 x13: 2e656572662d7265 x12: ffffd35fd2ed14f0
> > > [ 4.679297] x11: 0000000000000001 x10: 0000000000000001 x9 : ffffd35fd0342150
> > > [ 4.686418] x8 : c0000000ffffdfff x7 : ffffd35fd2e21450 x6 : 00000000000affa8
> > > [ 4.693539] x5 : ffffd35fd2ed1498 x4 : 0000000000000000 x3 : 0000000000000000
> > > [ 4.700660] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff34ff40932580
> > > [ 4.707781] Call trace:
> > > [ 4.710216] refcount_warn_saturate+0xf4/0x148 (P)
> > > [ 4.714993] refcount_warn_saturate+0xf4/0x148 (L)
> > > [ 4.719772] kobject_put+0x110/0x118
> > > [ 4.723335] put_device+0x1c/0x38
> > > [ 4.726638] mtk_drm_bind+0x294/0x5c0
> > > [ 4.730289] try_to_bring_up_aggregate_device+0x16c/0x1e0
> > > [ 4.735673] __component_add+0xbc/0x1c0
> > > [ 4.739495] component_add+0x1c/0x30
> > > [ 4.743058] mtk_disp_rdma_probe+0x140/0x210
> > > [ 4.747314] platform_probe+0x70/0xd0
> > > [ 4.750964] really_probe+0xc4/0x2a8
> > > [ 4.754527] __driver_probe_device+0x80/0x140
> > > [ 4.758870] driver_probe_device+0x44/0x120
> > > [ 4.763040] __device_attach_driver+0xc0/0x108
> > > [ 4.767470] bus_for_each_drv+0x8c/0xf0
> > > [ 4.771294] __device_attach+0xa4/0x198
> > > [ 4.775117] device_initial_probe+0x1c/0x30
> > > [ 4.779286] bus_probe_device+0xb4/0xc0
> > > [ 4.783109] deferred_probe_work_func+0xb0/0x100
> > > [ 4.787714] process_one_work+0x18c/0x420
> > > [ 4.791712] worker_thread+0x30c/0x418
> > > [ 4.795449] kthread+0x128/0x138
> > > [ 4.798665] ret_from_fork+0x10/0x20
> > > [ 4.802229] ---[ end trace 0000000000000000 ]---
> > >
> > > I don't think that I'll be able to bisect further as I don't have the
> > > relevant hardware available.
> > >
> > > --
> > > Thanks,
> > > Sasha
>
>
> Hello, I am one of those who touched something in the area.
>
> To check if my changes are the cause of the boot failures, please apply
> this patch:
>
> diff --git a/drivers/gpu/drm/mediatek/mtk_drm_drv.c b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
> index 9a8ef8558da9..85be035a209a 100644
> --- a/drivers/gpu/drm/mediatek/mtk_drm_drv.c
> +++ b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
> @@ -373,11 +373,12 @@ static bool mtk_drm_get_all_drm_priv(struct device *dev)
>  	struct mtk_drm_private *temp_drm_priv;
>  	struct device_node *phandle = dev->parent->of_node;
>  	const struct of_device_id *of_id;
> +	struct device_node *node;
>  	struct device *drm_dev;
>  	unsigned int cnt = 0;
>  	int i, j;
>
> -	for_each_child_of_node_scoped(phandle->parent, node) {
> +	for_each_child_of_node(phandle->parent, node) {
>  		struct platform_device *pdev;
>
>  		of_id = of_match_node(mtk_drm_of_ids, node);
>

Does Javier's patch fix the problem?

Regards,
CK

>
> ---
>
>
> This chunk can be found in mtk_drm_get_all_drm_priv(), which does not
> appear in the trace, but is called from mtk_drm_bind().
>
> The loop did not release the child node when it exited early through
> the break taken when cnt == MAX_CRTC, which goes against how
> for_each_child_of_node() is meant to be used. If the child node is
> indeed required afterwards (it is not referenced anywhere after the
> loop), it should be acquired via of_node_get() and stored somewhere so
> that it can be put later.
>
> In that case, another issue would lie underneath, as the reference to
> the child node is not stored anywhere. But if this patch fixes the
> issue, I suppose it should be applied immediately, and the rest can be
> discussed later on.
>
> By the way, are there any logs with debug/error messages that would let
> us analyze further if the issue is something different?
>
> Thanks and best regards,
> Javier Carrasco


</pre>
</p></body></html>