Panic with linus/master and panfrost

Rob Clark robdclark at gmail.com
Mon Nov 15 23:04:13 UTC 2021


On Mon, Nov 15, 2021 at 2:43 PM Rob Clark <robdclark at gmail.com> wrote:
>
> On Mon, Nov 15, 2021 at 8:16 AM Ondřej Jirman <megi at xff.cz> wrote:
> >
> > On Mon, Nov 15, 2021 at 05:04:36PM +0100, megi xff wrote:
> > > On Mon, Nov 15, 2021 at 04:05:02PM +0100, Daniel Vetter wrote:
> > > > You need
> > > >
> > > > commit 13e9e30cafea10dff6bc8d63a38a61249e83fd65
> > > > Author: Christian König <christian.koenig at amd.com>
> > > > Date:   Mon Oct 18 21:27:55 2021 +0200
> > > >
> > > >    drm/scheduler: fix drm_sched_job_add_implicit_dependencies
> > >
> > > Thank you, that fixed the panic. :)
> >
> > I spoke too soon. Panic is gone, but I still see (immediately after
> > starting Xorg):
> >
> > [   13.290795] ------------[ cut here ]------------
> > [   13.291103] refcount_t: addition on 0; use-after-free.
> > [   13.291495] WARNING: CPU: 5 PID: 548 at lib/refcount.c:25 refcount_warn_saturate+0x98/0x140
> > [   13.292124] Modules linked in:
> > [   13.292285] CPU: 5 PID: 548 Comm: Xorg Not tainted 5.16.0-rc1-00414-g21a254904a26 #29
> > [   13.292857] Hardware name: Pine64 PinePhonePro (DT)
> > [   13.293172] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [   13.293669] pc : refcount_warn_saturate+0x98/0x140
> > [   13.293977] lr : refcount_warn_saturate+0x98/0x140
> > [   13.294285] sp : ffff8000129a3b50
> > [   13.294464] x29: ffff8000129a3b50 x28: ffff8000129a3d50 x27: ffff000017ec4b00
> > [   13.294979] x26: 0000000000000001 x25: 0000000000000001 x24: ffff0000127cca48
> > [   13.295494] x23: ffff000017d19b00 x22: 000000000000000a x21: 0000000000000001
> > [   13.296006] x20: ffff000017e15500 x19: ffff000012980580 x18: 0000000000000003
> > [   13.296520] x17: 0000000000000000 x16: 0000000000000000 x15: ffff8000129a3b58
> > [   13.297033] x14: ffffffffffffffff x13: 2e656572662d7265 x12: 7466612d65737520
> > [   13.297546] x11: 3b30206e6f206e6f x10: ffff800011d6e8a0 x9 : ffff80001022f37c
> > [   13.298059] x8 : 00000000ffffefff x7 : ffff800011dc68a0 x6 : 0000000000000001
> > [   13.298573] x5 : 0000000000000000 x4 : ffff0000f77a9788 x3 : ffff0000f77b56f0
> > [   13.299085] x2 : ffff0000f77a9788 x1 : ffff8000e5eb1000 x0 : 000000000000002a
> > [   13.299600] Call trace:
> > [   13.299704]  refcount_warn_saturate+0x98/0x140
> > [   13.299981]  drm_sched_job_add_implicit_dependencies+0x90/0xdc
> > [   13.300385]  panfrost_job_push+0xd0/0x1d4
> > [   13.300628]  panfrost_ioctl_submit+0x34c/0x440
> > [   13.300906]  drm_ioctl_kernel+0x9c/0x154
> > [   13.301142]  drm_ioctl+0x1f0/0x410
> > [   13.301330]  __arm64_sys_ioctl+0xb4/0xdc
> > [   13.301566]  invoke_syscall+0x4c/0x110
> > [   13.301787]  el0_svc_common.constprop.0+0x48/0xf0
> > [   13.302090]  do_el0_svc+0x2c/0x90
> > [   13.302271]  el0_svc+0x14/0x50
> > [   13.302431]  el0t_64_sync_handler+0x9c/0x120
> > [   13.302693]  el0t_64_sync+0x158/0x15c
> > [   13.302904] ---[ end trace 8c211e57f89714c8 ]---
> > [   13.303211] ------------[ cut here ]------------
> > [   13.303504] refcount_t: underflow; use-after-free.
> > [   13.303820] WARNING: CPU: 5 PID: 548 at lib/refcount.c:28 refcount_warn_saturate+0xec/0x140
> > [   13.304439] Modules linked in:
> > [   13.304596] CPU: 5 PID: 548 Comm: Xorg Tainted: G        W         5.16.0-rc1-00414-g21a254904a26 #29
> > [   13.305286] Hardware name: Pine64 PinePhonePro (DT)
> > [   13.305600] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [   13.306095] pc : refcount_warn_saturate+0xec/0x140
> > [   13.306402] lr : refcount_warn_saturate+0xec/0x140
> > [   13.306710] sp : ffff8000129a3b70
> > [   13.306887] x29: ffff8000129a3b70 x28: ffff8000129a3d50 x27: ffff000017ec4b00
> > [   13.307401] x26: 0000000000000001 x25: 0000000000000001 x24: 0000000000000000
> > [   13.307914] x23: 00000000ffffffff x22: ffff0000129807c0 x21: ffff000012980580
> > [   13.308428] x20: ffff000017c54d00 x19: 0000000000000000 x18: 0000000000000003
> > [   13.308942] x17: 0000000000000000 x16: 0000000000000000 x15: ffff8000129a3b58
> > [   13.309454] x14: ffffffffffffffff x13: 2e656572662d7265 x12: 7466612d65737520
> > [   13.309967] x11: 3b776f6c66726564 x10: ffff800011d6e8a0 x9 : ffff80001017893c
> > [   13.310480] x8 : 00000000ffffefff x7 : ffff800011dc68a0 x6 : 0000000000000001
> > [   13.310993] x5 : ffff0000f77a9788 x4 : 0000000000000000 x3 : 0000000000000027
> > [   13.311506] x2 : 0000000000000023 x1 : ffff0000f77a9790 x0 : 0000000000000026
> > [   13.312020] Call trace:
> > [   13.312123]  refcount_warn_saturate+0xec/0x140
> > [   13.312401]  dma_resv_add_excl_fence+0x1a8/0x1bc
> > [   13.312700]  panfrost_job_push+0x174/0x1d4
> > [   13.312949]  panfrost_ioctl_submit+0x34c/0x440
> > [   13.313229]  drm_ioctl_kernel+0x9c/0x154
> > [   13.313464]  drm_ioctl+0x1f0/0x410
> > [   13.313651]  __arm64_sys_ioctl+0xb4/0xdc
> > [   13.313884]  invoke_syscall+0x4c/0x110
> > [   13.314103]  el0_svc_common.constprop.0+0x48/0xf0
> > [   13.314405]  do_el0_svc+0x2c/0x90
> > [   13.314586]  el0_svc+0x14/0x50
> > [   13.314745]  el0t_64_sync_handler+0x9c/0x120
> > [   13.315007]  el0t_64_sync+0x158/0x15c
> > [   13.315217] ---[ end trace 8c211e57f89714c9 ]---
> >
> > In dmesg. So this looks like some independent issue.
> >
>
>
> I'm seeing something similar with drm/msm, which is, I think, due to
> the introduction and location of call to drm_sched_job_arm().. I'm
> still trying to untangle where it should go, but I think undoing
> 357285a2d1c0 ("drm/msm: Improve drm/sched point of no return rules")
> would fix it

OK, disregard the above. What actually seems to have fixed it for me is:

------------
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 94fe51b3caa2..f91fb31ab7a7 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -704,12 +704,13 @@ int drm_sched_job_add_implicit_dependencies(struct drm_sched_job *job,
        int ret;

        dma_resv_for_each_fence(&cursor, obj->resv, write, fence) {
-               ret = drm_sched_job_add_dependency(job, fence);
-               if (ret)
-                       return ret;
-
                /* Make sure to grab an additional ref on the added fence */
                dma_fence_get(fence);
+               ret = drm_sched_job_add_dependency(job, fence);
+               if (ret) {
+                       dma_fence_put(fence);
+                       return ret;
+               }
        }
        return 0;
 }
------------

The problem looks to be that drm_sched_job_add_dependency() was
dropping the last ref before the dma_fence_get().
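
To spell out the ordering issue, here is a toy userspace model in C. It
is not kernel code: toy_fence, toy_fence_get(), toy_fence_put() and
toy_add_dependency() are made-up stand-ins for struct dma_fence,
dma_fence_get(), dma_fence_put() and drm_sched_job_add_dependency(). It
assumes the add call drops the reference it is handed, which is a
simplification of what the real function does.

```c
#include <assert.h>

/* Hypothetical stand-in for struct dma_fence: just a reference count. */
struct toy_fence {
	int refcount;
};

/* Take a reference. Returns -1 if the count was already 0, which is the
 * situation the kernel reports as "refcount_t: addition on 0". */
int toy_fence_get(struct toy_fence *f)
{
	if (f->refcount == 0)
		return -1;
	f->refcount++;
	return 0;
}

/* Drop a reference (never goes below 0 in this toy model). */
void toy_fence_put(struct toy_fence *f)
{
	if (f->refcount > 0)
		f->refcount--;
}

/* Hypothetical stand-in for drm_sched_job_add_dependency(). ASSUMPTION:
 * it consumes the reference it is given and, in this model, drops it
 * right away. It never fails here. */
int toy_add_dependency(struct toy_fence *f)
{
	toy_fence_put(f);
	return 0;
}

/* Ordering before the change: add first, take the extra ref afterwards. */
int add_then_get(struct toy_fence *f)
{
	int ret = toy_add_dependency(f);

	if (ret)
		return ret;
	return toy_fence_get(f);	/* too late if the count hit 0 */
}

/* Ordering after the change: take the extra ref first, then add. */
int get_then_add(struct toy_fence *f)
{
	int ret;

	if (toy_fence_get(f))
		return -1;
	ret = toy_add_dependency(f);
	if (ret) {
		toy_fence_put(f);	/* mirrors the error path in the diff */
		return ret;
	}
	return 0;
}
```

Starting from a fence with a single reference, add_then_get() fails
because the count reaches 0 before the get, while get_then_add() succeeds
and leaves the original reference in place.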

Not sure if I should send a patch or if this can be squashed into the
existing fix?

BR,
-R

