[Intel-gfx] [PATCH] input/psmouse: Don't hold the mutex while calling ->disconnect

Daniel Vetter daniel.vetter at ffwll.ch
Mon Oct 1 09:23:46 UTC 2018


On Wed, May 2, 2018 at 11:06 AM Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
>
> On Mon, Apr 30, 2018 at 11:17 PM, Dmitry Torokhov
> <dmitry.torokhov at gmail.com> wrote:
> > Hi Daniel,
> >
> > On Mon, Apr 30, 2018 at 09:56:49PM +0200, Daniel Vetter wrote:
> >> At least trackpoint_disconnect wants to remove some sysfs files, and
> >> we can't remove sysfs files while holding psmouse_mutex:
> >>
> >> ======================================================
> >> WARNING: possible circular locking dependency detected
> >> 4.16.0-rc5-g613eb885b69e-drmtip_1+ #1 Tainted: G     U
> >> ------------------------------------------------------
> >> kworker/0:3/102 is trying to acquire lock:
> >>  (kn->count#130){++++}, at: [<000000009679748b>] kernfs_remove_by_name_ns+0x3b/0x80
> >>
> >> but task is already holding lock:
> >>  (psmouse_mutex){+.+.}, at: [<0000000014f44bcc>] psmouse_disconnect+0x62/0x160
> >>
> >> which lock already depends on the new lock.
> >>
> >> the existing dependency chain (in reverse order) is:
> >>
> >> -> #1 (psmouse_mutex){+.+.}:
> >>        psmouse_attr_set_helper+0x28/0x140
> >>        kernfs_fop_write+0xfe/0x180
> >>        __vfs_write+0x1e/0x130
> >>        vfs_write+0xbd/0x1b0
> >>        SyS_write+0x40/0xa0
> >>        do_syscall_64+0x65/0x1a0
> >>        entry_SYSCALL_64_after_hwframe+0x42/0xb7
> >>
> >> -> #0 (kn->count#130){++++}:
> >>        __kernfs_remove+0x243/0x2b0
> >>        kernfs_remove_by_name_ns+0x3b/0x80
> >>        remove_files.isra.0+0x2b/0x60
> >>        sysfs_remove_group+0x38/0x80
> >>        sysfs_remove_groups+0x24/0x40
> >>        trackpoint_disconnect+0x2c/0x50
> >>        psmouse_disconnect+0x8f/0x160
> >>        serio_disconnect_driver+0x28/0x40
> >>        serio_driver_remove+0xc/0x10
> >>        device_release_driver_internal+0x15b/0x230
> >>        serio_handle_event+0x1c8/0x260
> >>        process_one_work+0x215/0x620
> >>        worker_thread+0x48/0x3a0
> >>        kthread+0xfb/0x130
> >>        ret_from_fork+0x3a/0x50
> >>
> >> other info that might help us debug this:
> >>
> >>  Possible unsafe locking scenario:
> >>
> >>        CPU0                    CPU1
> >>        ----                    ----
> >>   lock(psmouse_mutex);
> >>                                lock(kn->count#130);
> >>                                lock(psmouse_mutex);
> >>   lock(kn->count#130);
> >>
> >>  *** DEADLOCK ***
> >>
> >> 6 locks held by kworker/0:3/102:
> >>  #0:  ((wq_completion)"events_long"){+.+.}, at: [<000000002e408bfa>] process_one_work+0x191/0x620
> >>  #1:  (serio_event_work){+.+.}, at: [<000000002e408bfa>] process_one_work+0x191/0x620
> >>  #2:  (serio_mutex){+.+.}, at: [<00000000c8a49847>] serio_handle_event+0x23/0x260
> >>  #3:  (&dev->mutex){....}, at: [<00000000b55eee75>] device_release_driver_internal+0x2f/0x230
> >>  #4:  (&serio->drv_mutex){+.+.}, at: [<000000009719f997>] serio_disconnect_driver+0x16/0x40
> >>  #5:  (psmouse_mutex){+.+.}, at: [<0000000014f44bcc>] psmouse_disconnect+0x62/0x160
> >>
> >> stack backtrace:
> >> CPU: 0 PID: 102 Comm: kworker/0:3 Tainted: G     U           4.16.0-rc5-g613eb885b69e-drmtip_1+ #1
> >> Hardware name: LENOVO 74591P0/74591P0, BIOS 6DET28WW (1.05 ) 07/30/2008
> >> Workqueue: events_long serio_handle_event
> >> Call Trace:
> >>  dump_stack+0x5f/0x86
> >>  print_circular_bug.isra.18+0x1d0/0x2c0
> >>  __lock_acquire+0x14ae/0x1b60
> >>  ? kernfs_remove_by_name_ns+0x20/0x80
> >>  ? lock_acquire+0xaf/0x200
> >>  lock_acquire+0xaf/0x200
> >>  ? kernfs_remove_by_name_ns+0x3b/0x80
> >>  __kernfs_remove+0x243/0x2b0
> >>  ? kernfs_remove_by_name_ns+0x3b/0x80
> >>  ? kernfs_name_hash+0xd/0x70
> >>  ? kernfs_find_ns+0x7e/0x100
> >>  kernfs_remove_by_name_ns+0x3b/0x80
> >>  remove_files.isra.0+0x2b/0x60
> >>  sysfs_remove_group+0x38/0x80
> >>  sysfs_remove_groups+0x24/0x40
> >>  trackpoint_disconnect+0x2c/0x50
> >>  psmouse_disconnect+0x8f/0x160
> >>  serio_disconnect_driver+0x28/0x40
> >>  serio_driver_remove+0xc/0x10
> >>  device_release_driver_internal+0x15b/0x230
> >>  serio_handle_event+0x1c8/0x260
> >>  process_one_work+0x215/0x620
> >>  worker_thread+0x48/0x3a0
> >>  ? _raw_spin_unlock_irqrestore+0x4c/0x60
> >>  kthread+0xfb/0x130
> >>  ? process_one_work+0x620/0x620
> >>  ? _kthread_create_on_node+0x30/0x30
> >>  ret_from_fork+0x3a/0x50
> >>
> >> Signed-off-by: Daniel Vetter <daniel.vetter at intel.com>
> >> Cc: Dmitry Torokhov <dmitry.torokhov at gmail.com>
> >> Cc: Benjamin Tissoires <benjamin.tissoires at redhat.com>
> >> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> >> Cc: Arvind Yadav <arvind.yadav.cs at gmail.com>
> >> Cc: Stephen Lyons <slysven at virginmedia.com>
> >> Cc: linux-input at vger.kernel.org
> >> ---
> >>  drivers/input/mouse/psmouse-base.c | 2 ++
> >>  1 file changed, 2 insertions(+)
> >>
> >> diff --git a/drivers/input/mouse/psmouse-base.c b/drivers/input/mouse/psmouse-base.c
> >> index 8900c3166ebf..06ccd8e22f3c 100644
> >> --- a/drivers/input/mouse/psmouse-base.c
> >> +++ b/drivers/input/mouse/psmouse-base.c
> >> @@ -1484,8 +1484,10 @@ static void psmouse_disconnect(struct serio *serio)
> >>               psmouse_deactivate(parent);
> >>       }
> >>
> >> +     mutex_unlock(&psmouse_mutex);
> >>       if (psmouse->disconnect)
> >>               psmouse->disconnect(psmouse);
> >> +     mutex_lock(&psmouse_mutex);
> >
> > Why do you think it is proper to drop this mutex? It is introduced for a
> > reason.
> >
> > I think the trace you are seeing is due to:
> >
> > commit 988cd7afb3f37598891ca70b4c6eb914c338c46a
> > Author: Tejun Heo <tj at kernel.org>
> > Date:   Mon Feb 3 14:02:58 2014 -0500
> >
> >     kernfs: remove kernfs_addrm_cxt
> >
> > where we started taking kernfs_mutex when adding/removing sysfs files.
> > Ultimately I think we may have to change switching protocol to a work
> > item to be done asynchronously from writing to sysfs attribute.
>
> Well "holding a lock while adding/removing sysfs files and holding the
> same lock from sysfs read/write callbacks" is a classic deadlock. I've
> made lockdep shut up about it, I have no idea how to fix it properly.
> kernfs switching it's locking doesn't change anything here.
>
> Generally the fix is to untangle the locking: You need 1 set of locks
> to protect the datastructures exposed through sysfs, and another thing
> (maybe that's provided by serio already, I have no idea) to make sure
> you're ordering connect/disconnect handling correct and there's not
> concurrent calls to this hooks. Assuming serio does give that
> guarantee then there's no need to hold the lock while unregistering
> the sysfs files (we could perhaps push the lock dropping down into
> trackpoint_disconnect, but that's less maintainable imo since it's not
> obvious in psmouse_disconnect what's going on), and my patch is
> correct.
>
> But I didn't do a full locking audit of what exactly serio guarantees,
> and what exactly psmouse_mutex needs to protect (and where
> psmouse_mutex is only held because it's convenient).

Ping? Just noticed that I still have this bugfix hanging around.
bugfix as in "make lockdep more useful", not necessarily fixing the
locking here properly. I guess I could respin to add a FIXME comment,
but not sure how useful that would be.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


More information about the Intel-gfx mailing list