[PATCH 11/11] drm/tidss: Annotate dma-fence critical section in commit path
Daniel Vetter
daniel at ffwll.ch
Tue Jun 22 16:31:34 UTC 2021
On Thu, May 27, 2021 at 03:15:34PM +0300, Tomi Valkeinen wrote:
> Hi Daniel,
>
> On 21/01/2021 17:29, Daniel Vetter wrote:
> > Ends right after hw_done(), totally standard case.
> >
> > Acked-by: Jyri Sarha <jsarha at ti.com>
> > Signed-off-by: Daniel Vetter <daniel.vetter at intel.com>
> > Cc: Jyri Sarha <jsarha at ti.com>
> > Cc: Tomi Valkeinen <tomi.valkeinen at ti.com>
> > ---
> > drivers/gpu/drm/tidss/tidss_kms.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/tidss/tidss_kms.c b/drivers/gpu/drm/tidss/tidss_kms.c
> > index 09485c7f0d6f..95f8e0f78e32 100644
> > --- a/drivers/gpu/drm/tidss/tidss_kms.c
> > +++ b/drivers/gpu/drm/tidss/tidss_kms.c
> > @@ -4,6 +4,8 @@
> > * Author: Tomi Valkeinen <tomi.valkeinen at ti.com>
> > */
> > +#include <linux/dma-fence.h>
> > +
> > #include <drm/drm_atomic.h>
> > #include <drm/drm_atomic_helper.h>
> > #include <drm/drm_bridge.h>
> > @@ -26,6 +28,7 @@ static void tidss_atomic_commit_tail(struct drm_atomic_state *old_state)
> > {
> > struct drm_device *ddev = old_state->dev;
> > struct tidss_device *tidss = to_tidss(ddev);
> > + bool fence_cookie = dma_fence_begin_signalling();
> > dev_dbg(ddev->dev, "%s\n", __func__);
> > @@ -36,6 +39,7 @@ static void tidss_atomic_commit_tail(struct drm_atomic_state *old_state)
> > drm_atomic_helper_commit_modeset_enables(ddev, old_state);
> > drm_atomic_helper_commit_hw_done(old_state);
> > + dma_fence_end_signalling(fence_cookie);
> > drm_atomic_helper_wait_for_flip_done(ddev, old_state);
> > drm_atomic_helper_cleanup_planes(ddev, old_state);
> >
>
> I bisected v5.13 rc lockdep warnings to this patch. I see this with tidss (lockdep
> report below) and omapdrm (probably caused by "drm/omapdrm: Annotate dma-fence critical
> section in commit path"). The report on omapdrm is very similar, with fs_reclaim and
> dma_fence_map, but hdmi bridge instead of mhdp bridge.
>
> I'm unfamiliar with this piece of drm code, do you have any hints on what the problem
> might be and where I should be looking at?
This entirely fell through cracks.
tldr; Strictly speaking, anything which can hold up a dma_fence from
signalling is not allowed to allocate memory. There's some limitations to
this (GFP_NORECLAIM is ok), but that's the gist. And atomic commits can in
theory return a dma_fence that will signal when it's all done, in practice
it's for Android only.
This is somewhat the reason why I've not pushed this further, it's a very
drastic limitation. Also I've been snowed under with tons of issues around
dma-fence from the i915 side.
Now the issue still is that there's some real deadlocks here, and it would
be nice to annotate them all. Otoh not being able to allocate any memory
(or take locks that in other paths are held while allocating memory, which
is equivalent) is really tough, and definitely not feasible for full
modesets. So the pragmatic fix I think would be to limit the dma_fence
OUT-FENCE for Android to only the page-flip case, similarly limit our
annotations. And if you do an OUT-FENCE which includes a modeset, we just
fall back to a synchronous modeset. The problem can only happen if
userspace (and hence some other kernel thread) can get at the out
dma_fence before we've finished all the work that might need allocations
and what not else.
If you're bored maybe hack this up?
Also I'm kinda disappointed since in the cover letter I asked explicitly
for testing with lockdep, since I somewhat expected issues like this. But
people just blindly throw acks around without looking at the bigger
picture.
Cheers, Daniel
>
> Tomi
>
> [ 20.769286] ======================================================
> [ 20.776918] WARNING: possible circular locking dependency detected
> [ 20.783082] 5.11.0-rc2-00688-g4d56a4f08391-dirty #18 Not tainted
> [ 20.789072] ------------------------------------------------------
> [ 20.795232] kmstest/397 is trying to acquire lock:
> [ 20.800008] ffff800011637878 (fs_reclaim){+.+.}-{0:0}, at: fs_reclaim_acquire+0x40/0xa4
> [ 20.808014]
> [ 20.808014] but task is already holding lock:
> [ 20.813828] ffff000804fb41a8 (&mhdp->link_mutex){+.+.}-{3:3}, at: cdns_mhdp_atomic_enable+0x54/0x2c0 [cdns_mhdp8546]
> [ 20.824343]
> [ 20.824343] which lock already depends on the new lock.
> [ 20.824343]
> [ 20.832497]
> [ 20.832497] the existing dependency chain (in reverse order) is:
> [ 20.839958]
> [ 20.839958] -> #2 (&mhdp->link_mutex){+.+.}-{3:3}:
> [ 20.846214] lock_acquire.part.0+0x178/0x380
> [ 20.850999] lock_acquire+0x6c/0x90
> [ 20.854996] __mutex_lock+0x9c/0x540
> [ 20.859083] mutex_lock_nested+0x44/0x70
> [ 20.863513] cdns_mhdp_atomic_enable+0x54/0x2c0 [cdns_mhdp8546]
> [ 20.869938] drm_atomic_bridge_chain_enable+0x60/0xd4 [drm]
> [ 20.876093] drm_atomic_helper_commit_modeset_enables+0x148/0x260 [drm_kms_helper]
> [ 20.884207] tidss_atomic_commit_tail+0x64/0xd0 [tidss]
> [ 20.889947] commit_tail+0xac/0x190 [drm_kms_helper]
> [ 20.895445] drm_atomic_helper_commit+0x194/0x3a0 [drm_kms_helper]
> [ 20.902155] drm_atomic_commit+0x58/0x70 [drm]
> [ 20.907165] drm_mode_atomic_ioctl+0x9f0/0xbbc [drm]
> [ 20.912693] drm_ioctl_kernel+0xd0/0x124 [drm]
> [ 20.917701] drm_ioctl+0x238/0x460 [drm]
> [ 20.922189] __arm64_sys_ioctl+0xb4/0x100
> [ 20.926707] el0_svc_common.constprop.0+0x80/0x1e0
> [ 20.932007] do_el0_svc+0x30/0xa0
> [ 20.935831] el0_svc+0x20/0x30
> [ 20.939394] el0_sync_handler+0x1a8/0x1b0
> [ 20.943910] el0_sync+0x174/0x180
> [ 20.947734]
> [ 20.947734] -> #1 (dma_fence_map){++++}-{0:0}:
> [ 20.953644] lock_acquire.part.0+0x178/0x380
> [ 20.958422] lock_acquire+0x6c/0x90
> [ 20.962418] __dma_fence_might_wait+0x60/0xdc
> [ 20.967285] dma_resv_lockdep+0x1dc/0x2b4
> [ 20.971803] do_one_initcall+0x90/0x460
> [ 20.976145] kernel_init_freeable+0x2c0/0x32c
> [ 20.981012] kernel_init+0x20/0x128
> [ 20.985007] ret_from_fork+0x10/0x3c
> [ 20.989091]
> [ 20.989091] -> #0 (fs_reclaim){+.+.}-{0:0}:
> [ 20.994740] check_noncircular+0x164/0x180
> [ 20.999344] __lock_acquire+0x13dc/0x1be4
> [ 21.003861] lock_acquire.part.0+0x178/0x380
> [ 21.008639] lock_acquire+0x6c/0x90
> [ 21.012635] fs_reclaim_acquire+0x7c/0xa4
> [ 21.017152] kmem_cache_alloc_trace+0x7c/0x3c0
> [ 21.022103] drm_mode_duplicate+0x34/0x70 [drm]
> [ 21.027199] cdns_mhdp_atomic_enable+0x1c4/0x2c0 [cdns_mhdp8546]
> [ 21.033711] drm_atomic_bridge_chain_enable+0x60/0xd4 [drm]
> [ 21.039845] drm_atomic_helper_commit_modeset_enables+0x148/0x260 [drm_kms_helper]
> [ 21.047946] tidss_atomic_commit_tail+0x64/0xd0 [tidss]
> [ 21.053681] commit_tail+0xac/0x190 [drm_kms_helper]
> [ 21.059179] drm_atomic_helper_commit+0x194/0x3a0 [drm_kms_helper]
> [ 21.065889] drm_atomic_commit+0x58/0x70 [drm]
> [ 21.070897] drm_mode_atomic_ioctl+0x9f0/0xbbc [drm]
> [ 21.076423] drm_ioctl_kernel+0xd0/0x124 [drm]
> [ 21.081430] drm_ioctl+0x238/0x460 [drm]
> [ 21.085917] __arm64_sys_ioctl+0xb4/0x100
> [ 21.090434] el0_svc_common.constprop.0+0x80/0x1e0
> [ 21.095730] do_el0_svc+0x30/0xa0
> [ 21.099554] el0_svc+0x20/0x30
> [ 21.103117] el0_sync_handler+0x1a8/0x1b0
> [ 21.107632] el0_sync+0x174/0x180
> [ 21.111454]
> [ 21.111454] other info that might help us debug this:
> [ 21.111454]
> [ 21.119435] Chain exists of:
> [ 21.119435] fs_reclaim --> dma_fence_map --> &mhdp->link_mutex
> [ 21.119435]
> [ 21.129768] Possible unsafe locking scenario:
> [ 21.129768]
> [ 21.135670] CPU0 CPU1
> [ 21.140184] ---- ----
> [ 21.144698] lock(&mhdp->link_mutex);
> [ 21.148435] lock(dma_fence_map);
> [ 21.154340] lock(&mhdp->link_mutex);
> [ 21.160591] lock(fs_reclaim);
> [ 21.163721]
> [ 21.163721] *** DEADLOCK ***
> [ 21.163721]
> [ 21.169623] 4 locks held by kmstest/397:
> [ 21.173532] #0: ffff8000145efc28 (crtc_ww_class_acquire){+.+.}-{0:0}, at: drm_modeset_acquire_init+0x48/0x70 [drm]
> [ 21.184011] #1: ffff000805dc7888 (crtc_ww_class_mutex){+.+.}-{3:3}, at: drm_modeset_lock+0x138/0x170 [drm]
> [ 21.193795] #2: ffff8000116d53f8 (dma_fence_map){++++}-{0:0}, at: tidss_atomic_commit_tail+0x2c/0xd0 [tidss]
> [ 21.203701] #3: ffff000804fb41a8 (&mhdp->link_mutex){+.+.}-{3:3}, at: cdns_mhdp_atomic_enable+0x54/0x2c0 [cdns_mhdp8546]
> [ 21.214642]
> [ 21.214642] stack backtrace:
> [ 21.218984] CPU: 1 PID: 397 Comm: kmstest Not tainted 5.11.0-rc2-00688-g4d56a4f08391-dirty #18
> [ 21.227575] Hardware name: Texas Instruments K3 J721E SoC (DT)
> [ 21.233390] Call trace:
> [ 21.235825] dump_backtrace+0x0/0x1f0
> [ 21.239476] show_stack+0x24/0x80
> [ 21.242781] dump_stack+0xec/0x154
> [ 21.246170] print_circular_bug+0x1f8/0x200
> [ 21.250341] check_noncircular+0x164/0x180
> [ 21.254426] __lock_acquire+0x13dc/0x1be4
> [ 21.258422] lock_acquire.part.0+0x178/0x380
> [ 21.262679] lock_acquire+0x6c/0x90
> [ 21.266155] fs_reclaim_acquire+0x7c/0xa4
> [ 21.270152] kmem_cache_alloc_trace+0x7c/0x3c0
> [ 21.274582] drm_mode_duplicate+0x34/0x70 [drm]
> [ 21.279159] cdns_mhdp_atomic_enable+0x1c4/0x2c0 [cdns_mhdp8546]
> [ 21.285151] drm_atomic_bridge_chain_enable+0x60/0xd4 [drm]
> [ 21.290764] drm_atomic_helper_commit_modeset_enables+0x148/0x260 [drm_kms_helper]
> [ 21.298343] tidss_atomic_commit_tail+0x64/0xd0 [tidss]
> [ 21.303559] commit_tail+0xac/0x190 [drm_kms_helper]
> [ 21.308535] drm_atomic_helper_commit+0x194/0x3a0 [drm_kms_helper]
> [ 21.314725] drm_atomic_commit+0x58/0x70 [drm]
> [ 21.319214] drm_mode_atomic_ioctl+0x9f0/0xbbc [drm]
> [ 21.324220] drm_ioctl_kernel+0xd0/0x124 [drm]
> [ 21.328708] drm_ioctl+0x238/0x460 [drm]
> [ 21.332676] __arm64_sys_ioctl+0xb4/0x100
> [ 21.336672] el0_svc_common.constprop.0+0x80/0x1e0
> [ 21.341449] do_el0_svc+0x30/0xa0
> [ 21.344753] el0_svc+0x20/0x30
> [ 21.347795] el0_sync_handler+0x1a8/0x1b0
> [ 21.351790] el0_sync+0x174/0x180
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
More information about the dri-devel
mailing list