[systemd-devel] [PATCH] udev: warn instead of killing kmod loading

Luis R. Rodriguez mcgrof at suse.com
Sat Aug 9 07:22:01 PDT 2014


On Sat, Aug 9, 2014 at 10:06 AM, Greg Kroah-Hartman
<gregkh at linuxfoundation.org> wrote:
> On Sat, Aug 09, 2014 at 10:33:47AM +0200, Luis R. Rodriguez wrote:
>> On Sat, Aug 09, 2014 at 09:42:36AM +0200, Kay Sievers wrote:
>> > On Sat, Aug 9, 2014 at 4:16 AM, Luis R. Rodriguez
>> > <mcgrof at do-not-panic.com> wrote:
>> > > The purpose of commit e64fae55 (January 2012) on systemd was
>> > > to introduce a timeout to send to hell drivers that are not using
>> > > asynchronous firmware loading. That commit would not actually have
>> > > taken full effect on udev's usage of kmod for module loading
>> > > until commit 786235ee was merged into Linux (Nov 2013).
>> > >
>> > > As it stands today, [ systemd e64fae55 + kernel 786235ee ] will
>> > > send SIGKILL to udev workers using kmod for module loading after
>> > > a 30 second timeout. Hannes modified systemd through commit
>> > > 9719859c to enable a custom timeout, but a different timeout value
>> > > can only prevent a kill if the maximum amount of time a system
>> > > requires is known in advance.
>> > >
>> > > Penalizing a device driver for not using asynchronous firmware
>> > > loading by killing it and preventing it from loading *might* have
>> > > originally been reasonable, but it's not the only reason why some
>> > > drivers might take more than 30 seconds to load. Some drivers
>> > > might actually take over 30 seconds just writing the firmware to
>> > > the hardware. The worst-case scenario, however, would be to run
>> > > into storage drivers which exceed the timeout value, in which
>> > > case the system would currently simply be unbootable. Fixing
>> > > drivers should be our *top priority*, but the current state of
>> > > affairs makes it very difficult to debug why a driver is failing
>> > > to load.
>> > >
>> > > Instead of always forcing a kill, let's only warn for workers
>> > > handling kmod. This should make it easier to determine which
>> > > drivers need fixing, and the logic would only be used on
>> > > workers dealing with kmod module loading.
>> >
>> > Nobody wanted to send anything to hell, penalize or force anything
>> > anywhere. This kind of language is absolutely not welcome here.
>> >
>> > Every operation in systemd, unless specified otherwise, has and needs
>> > to have a timeout. The 30 seconds were arbitrarily chosen just to be
>> > smaller than the kernel's own 60 second timeout for the userspace
>> > firmware loader. Now that userspace firmware loading is gone, this
>> > does not apply anymore.
>> >
>> > Like everywhere else, we should keep the timeout handling by default.
>> > If 60 seconds are too short, we might want to set it to something
>> > else.
>>
>> Putting emphasis only on firmware loading is exactly what took us to where we
>> are today with the current timeout. As we have seen, though, firmware loading
>> is not what actually takes a lot of time; at times, actually writing the
>> firmware to the hardware can take more time. Other scenarios have crept up as
>> well, such as delays in other areas of network drivers and storage drivers.
>> We're all in agreement that all this needs to be fixed in the drivers.
>> However, in light of these other circumstances, given that it will take time
>> to fix these drivers, and given that it's hard to debug the cause of current
>> driver failures on the timeout, a warning for kmod loading would do much more
>> to help us fix drivers than a kill.
>
> "time to fix these drivers"?  I posted a 10 line patch to do so for any
> driver that has a problem, and another core kernel developer agreed with
> it and said it should be made even more general, and easier to use for
> all drivers (resulting in only a 1-2 line change per driver affected.)
>
> What happened to that work, has it been dropped for some reason?  Was it
> tested and found to not work properly?  Was it rejected by a subsystem
> maintainer that I didn't see?

No that's next on my plate. Working on that now!

The reason for this specific systemd patch, from a kernel perspective,
is that debugging and finding out which drivers are actually going over
the set timeout is currently not an easy task. The kernel patches that
provide a general workaround for the issue on systemd would only be
possible once a driver is found to have an issue. This patch would
simplify the search for the offending drivers without also forcing
those drivers to fail.
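The warn-instead-of-kill policy proposed in this patch can be sketched as a per-worker decision. This is a minimal illustration, not the actual udev code: the `worker` struct, the `timeout_action` enum, `worker_timeout_action()` and the `loads_module` flag are all hypothetical, and the sketch assumes the event manager already knows whether a worker's event triggered kmod module loading.

```c
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical actions an event manager could take for a slow worker.
 * These names are illustrative only; they are not systemd's API. */
enum timeout_action {
    ACTION_NONE, /* worker is still within its time budget */
    ACTION_WARN, /* log the slow worker but let it keep running */
    ACTION_KILL  /* existing behaviour: SIGKILL the worker */
};

/* Hypothetical worker record: the device it handles, when it started,
 * and whether its event led to kmod loading a module. */
struct worker {
    const char *devpath;
    time_t started;
    bool loads_module;
};

/* Decide what to do with worker `w` at time `now`, given a timeout in
 * seconds (30 s in the setup described in this thread). Only workers
 * loading a module are spared; every other worker is still killed. */
enum timeout_action worker_timeout_action(const struct worker *w,
                                          time_t now, double timeout_sec)
{
    if (difftime(now, w->started) < timeout_sec)
        return ACTION_NONE;

    if (w->loads_module) {
        /* Name the device so the slow driver can be tracked down. */
        fprintf(stderr, "worker [%s] exceeded %.0f s while loading a module\n",
                w->devpath, timeout_sec);
        return ACTION_WARN;
    }

    return ACTION_KILL;
}
```

With a 30 second timeout, a module-loading worker checked at 31 s yields `ACTION_WARN` plus a log line naming its device, while any other worker still yields `ACTION_KILL`. The slow driver therefore shows up in the logs instead of silently failing to load, which is the debugging aid the patch is after.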

  Luis

