[PATCH v2] gpu/drm/msm: fix shutdown hook in case GPU components failed to bind
Stephen Boyd
swboyd at chromium.org
Thu Mar 25 04:39:11 UTC 2021
Quoting Rob Clark (2021-03-24 20:09:37)
> On Wed, Mar 24, 2021 at 6:49 PM Stephen Boyd <swboyd at chromium.org> wrote:
> >
> > Quoting Dmitry Baryshkov (2021-03-18 13:05:44)
> > > If GPU components have failed to bind, the shutdown callback fails with
> > > the following backtrace. Add a safeguard check to stop that oops from
> > > happening and allow the board to reboot.
> > [...]
> > > diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
> > > index 94525ac76d4e..fd2ac54caf9f 100644
> > > --- a/drivers/gpu/drm/msm/msm_drv.c
> > > +++ b/drivers/gpu/drm/msm/msm_drv.c
> > > @@ -1311,6 +1311,10 @@ static int msm_pdev_remove(struct platform_device *pdev)
> > >  static void msm_pdev_shutdown(struct platform_device *pdev)
> > >  {
> > >  	struct drm_device *drm = platform_get_drvdata(pdev);
> > > +	struct msm_drm_private *priv = drm ? drm->dev_private : NULL;
> > > +
> > > +	if (!priv || !priv->kms)
> > > +		return;
> > >
> >
> > I see a problem where, if the backlight fails to probe, my graphics
> > card doesn't appear but this driver is still bound. I was hoping this
> > patch would fix it, but it doesn't. I have slab poisoning enabled, so
> > sometimes the 'priv' pointer is 0x6b6b6b6b6b6b6b6b, meaning it has
> > already been freed.
> >
> > I found that the 'drm' pointer here is pointing at junk. The
> > msm_drm_init() function calls drm_dev_put() on the error path, and that
> > destroys the drm device, but it doesn't update this platform driver's
> > drvdata. Do we need another patch that sets the drvdata to NULL when
> > msm_drm_init() fails? One last note: I'm seeing this on 5.4, so maybe I
> > missed something and the drvdata is set to NULL somewhere else
> > upstream. I sort of doubt it, though.
>
> The hw that I guess you are running on should work pretty well with an
> upstream kernel, but I don't think there is any important delta
> between upstream and the 5.4-based kernel that you are running that
> would fix this.
>
> so *probably* you are right..

linux-next is failing like this for me today on Lazor, right after the
screen turns on. I'll have to figure out what's wrong before checking
upstream.

[ 10.734752] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000080
[ 10.744482] Mem abort info:
[ 10.747462] ESR = 0x96000006
[ 10.750644] EC = 0x25: DABT (current EL), IL = 32 bits
[ 10.756125] SET = 0, FnV = 0
[ 10.759290] EA = 0, S1PTW = 0
[ 10.762543] Data abort info:
[ 10.765519] ISV = 0, ISS = 0x00000006
[ 10.769485] CM = 0, WnR = 0
[ 10.772553] user pgtable: 4k pages, 39-bit VAs, pgdp=0000000123474000
[ 10.779212] [0000000000000080] pgd=0800000123475003, p4d=0800000123475003, pud=0800000123475003, pmd=0000000000000000
[ 10.790128] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[ 10.795856] Modules linked in: ath10k_snoc qmi_helpers ath10k_core ath mac80211 cfg80211 r8152 mii joydev
[ 10.805705] CPU: 5 PID: 1576 Comm: DrmThread Not tainted 5.12.0-rc4-next-20210324+ #13
[ 10.813832] Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
[ 10.820535] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO BTYPE=--)
[ 10.826703] pc : dpu_plane_atomic_update+0x80/0xcb8
[ 10.831730] lr : dpu_plane_restore+0x5c/0x88
[ 10.836117] sp : ffffffc012963920
[ 10.839521] x29: ffffffc0129639c0 x28: ffffffed5c9ad000
[ 10.844979] x27: ffffffed5c736000 x26: ffffffed5ca3f000
[ 10.850443] x25: ffffffed5c736000 x24: 0000000000000000
[ 10.855903] x23: 0000000000000000 x22: ffffff80ad007400
[ 10.861361] x21: ffffff8085193808 x20: 0000000000000000
[ 10.866818] x19: ffffff8085193800 x18: 0000000000000008
[ 10.872274] x17: 0000000000800000 x16: 0000000020000000
[ 10.877738] x15: 0000000000000001 x14: 0000000000000000
[ 10.883201] x13: ffffff80852324a8 x12: 0000000000000008
[ 10.888657] x11: ffffffed5c3b7890 x10: 0000000000000000
[ 10.894112] x9 : 0000000000000000 x8 : 0000000000000000
[ 10.899570] x7 : 0000000000004000 x6 : 0000000000010000
[ 10.905026] x5 : 0000000000040000 x4 : 0000000000000800
[ 10.910482] x3 : 0000000000000000 x2 : 0000000000020041
[ 10.915946] x1 : ffffff80ad2e2600 x0 : ffffff8085193800
[ 10.921402] Call trace:
[ 10.923923] dpu_plane_atomic_update+0x80/0xcb8
[ 10.928585] dpu_plane_restore+0x5c/0x88
[ 10.932620] dpu_crtc_atomic_flush+0xd4/0x1a0
[ 10.937105] drm_atomic_helper_commit_planes+0x1b4/0x1e0
[ 10.942565] msm_atomic_commit_tail+0x2d4/0x670
[ 10.947223] commit_tail+0xac/0x148
[ 10.950814] drm_atomic_helper_commit+0x104/0x10c
[ 10.955653] drm_atomic_commit+0x58/0x68
[ 10.959686] drm_mode_atomic_ioctl+0x438/0x51c
[ 10.964261] drm_ioctl_kernel+0xa8/0x124
[ 10.968295] drm_ioctl+0x24c/0x3ec
[ 10.971800] drm_compat_ioctl+0xe0/0xf4
[ 10.975745] __arm64_compat_sys_ioctl+0xcc/0x104
[ 10.980499] el0_svc_common+0xa4/0x128
[ 10.984358] do_el0_svc_compat+0x2c/0x38
[ 10.988395] el0_svc_compat+0x20/0x30
[ 10.992164] el0_sync_compat_handler+0xc0/0xf0
[ 10.996734] el0_sync_compat+0x174/0x180
[ 11.000774] Code: d0003d61 91204821 52800020 97fe8c65 (39420288)
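
To make the drvdata question from the quoted discussion concrete, here is a
minimal userspace sketch of the idea: clear the drvdata on the init error
path so that the guard from the patch sees NULL instead of a freed pointer.
Everything named mock_* is a hypothetical stand-in invented for this
illustration; it is not the kernel's platform or DRM API, and it is not a
proposed patch. It only assumes, as described above, that the error path
releases the device while the drvdata still points at it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel structures. */
struct mock_priv { void *kms; };
struct mock_drm  { struct mock_priv *dev_private; };
struct mock_pdev { void *drvdata; };

void mock_set_drvdata(struct mock_pdev *pdev, void *data)
{
	pdev->drvdata = data;
}

void *mock_get_drvdata(const struct mock_pdev *pdev)
{
	return pdev->drvdata;
}

/* Stand-in for dropping the last reference to the device. */
void mock_drm_release(struct mock_drm *drm)
{
	free(drm->dev_private);
	free(drm);
}

/*
 * Stand-in for the init function. When fail_bind is nonzero it takes the
 * error path: it clears drvdata before releasing the device, so no later
 * callback can observe a stale pointer.
 */
int mock_drm_init(struct mock_pdev *pdev, int fail_bind)
{
	struct mock_drm *drm;

	assert(pdev != NULL);

	drm = calloc(1, sizeof(*drm));
	if (!drm)
		return -1;
	mock_set_drvdata(pdev, drm);

	drm->dev_private = calloc(1, sizeof(*drm->dev_private));
	if (!drm->dev_private || fail_bind)
		goto err;

	drm->dev_private->kms = drm;	/* any non-NULL marker will do */
	return 0;

err:
	mock_set_drvdata(pdev, NULL);	/* the step asked about in the thread */
	mock_drm_release(drm);
	return -1;
}

/* Mirrors the guard from the quoted patch; returns 1 if shutdown ran. */
int mock_pdev_shutdown(struct mock_pdev *pdev)
{
	struct mock_drm *drm = mock_get_drvdata(pdev);
	struct mock_priv *priv = drm ? drm->dev_private : NULL;

	if (!priv || !priv->kms)
		return 0;
	return 1;
}

/* Normal teardown, which also clears drvdata before freeing. */
void mock_pdev_remove(struct mock_pdev *pdev)
{
	struct mock_drm *drm = mock_get_drvdata(pdev);

	mock_set_drvdata(pdev, NULL);
	if (drm)
		mock_drm_release(drm);
}
```

Calling mock_drm_init() with fail_bind set and then mock_pdev_shutdown() on
the same mock_pdev exercises the failing case: the guard bails out because
the drvdata is NULL. Without the mock_set_drvdata(pdev, NULL) line on the
error path, the guard would dereference freed memory, which is the junk
pointer described above.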