[PATCH] drivers/base: use a worker for sysfs unbind

Thu Dec 13 16:18:29 UTC 2018

On Thu, Dec 13, 2018 at 1:36 PM Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
>
> On Thu, Dec 13, 2018 at 11:23 AM Rafael J. Wysocki <rafael at kernel.org> wrote:
> >
> > On Thu, Dec 13, 2018 at 10:58 AM Daniel Vetter <daniel at ffwll.ch> wrote:
> > >
> > > On Thu, Dec 13, 2018 at 10:38:14AM +0100, Rafael J. Wysocki wrote:
> > > > On Mon, Dec 10, 2018 at 9:47 AM Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
> > > > >
> > > > > Drivers might want to remove some sysfs files, which needs the same
> > > > > locks and ends up angering lockdep. Relevant snippet of the stack
> > > > > trace:
> > > > >
> > > > >   kernfs_remove_by_name_ns+0x3b/0x80
> > > > >   bus_remove_driver+0x92/0xa0
> > > > >   acpi_video_unregister+0x24/0x40
> > > > >   i915_driver_unload+0x42/0x130 [i915]
> > > > >   i915_pci_remove+0x19/0x30 [i915]
> > > > >   pci_device_remove+0x36/0xb0
> > > > >   device_release_driver_internal+0x185/0x250
> > > > >   unbind_store+0xaf/0x180
> > > > >   kernfs_fop_write+0x104/0x190
> > > >
> > > > Is the acpi_bus_unregister_driver() in acpi_video_unregister() the
> > > > source of the lockdep unhappiness?
> > >
> > > Yeah I guess I cut out too much of the lockdep splat. It complains about
> > > kernfs_fop_write and kernfs_remove_by_name_ns acquiring the same lock
> > > class. It's ofc not the same lock, so no real deadlock. Getting the
> > > device_release_driver outside of the callchain under kernfs_fop_write,
> > > which this patch does, "fixes" it. For "fixes" = shut up lockdep.
> >
> > OK, so the problem really is that the operation is started via sysfs
> > which means that this code is running under a lock already.
> >
> > Which lock does lockdep complain about, exactly?
>
> mutex_lock(&of->mutex);

OK (I thought so)

> > > Other options:
> > > - Annotate the recursion with the usual lockdep annotations. Potentially
> > >   results in lockdep not catching real deadlocks (you can still have other
> > >   loops closing the deadlock, maybe through some subsystem/bus lock).
> > >
> > > - Rewrite kernfs_fop_write to drop the lock (optionally, for callbacks
> > >   that know what they're doing), which should be fine if we refcount
> > >   everything properly (bus, driver & device).
> > >
> > > - Also note that probably the same bug exists on the bind sysfs interface,
> > >   but we don't use that, so I don't care :-)
> > >
> > > - Most of these issues are never visible in normal usage, since normally
> > >   driver bind/unbind is done from a kthread or module load/unload, neither
> > >   of which is running in the context of that kernfs mutex kernfs_fop_write
> > >   holds. That's why I think the task work is the best solution, since it
> > >   changes the locking context of the unbind sysfs to match the locking
> > >   context of module unload and hotunplug.
> >
> > I think that using a task work here makes sense.  There is a drawback,
> > which is that the original sysfs write will not wait for the driver to
> > actually be released before returning to user space AFAICS, but that
> > probably isn't a big deal.
>
> This would happen with a normal work_struct, which runs on some other
> thread eventually. That added asynchronous execution uncovered lots
> of bugs in our CI (fbcon isn't solid, let's put it that way). Hence
> the task work, which will be run before the syscall returns to
> userspace, but outside of anything else. Was originally created to
> avoid locking inversion on the final fput, where the same "must
> complete before returning to userspace, but outside of any other
> locking context" issue was causing trouble.

I didn't realize that it would run completely before returning to user
space, thanks for pointing this out.

This isn't an issue then.

> > Also please note that the patch changes the code flow slightly,
> > because passing a non-NULL parent pointer to
> > device_release_driver_internal() potentially has side effects, but
> > that should not be a big deal either.
>
> I can do the old code exactly, but afaict the non-NULL parent just
> takes care of the parent bus locking for us, instead of hand-rolling
> it in the caller. But if I missed something, I can easily undo that
> part.

It is different if device links are present, but I'm not worried about
that case honestly. :-)

> > > Unfortunately that trick doesn't work for the bind sysfs file, since
> > > that way we can't thread the errno value back to userspace.
> >
> > Right.  That is unless we wait for the operation to complete and check
> > the error left behind by it.  That should be doable, but somewhat
> > complicated.
>
> For real deadlocks this doesn't fix anything, it just hides it from
> lockdep. cross-release lockdep would still complain. If we want to fix
> the bind side _and_ keep reporting the errno from the driver's bind
> function, then we need to rework kernfs and add a callback which
> doesn't hold the mutex. Should be doable, just a pile more work.

It should be possible to store the error in a variable and export that
via a separate attribute for user space to inspect.  That would be a
significant I/F change, however.

Cheers,
Rafael