[systemd-devel] [PATCH] udev: warn instead of killing kmod loading

Wed Aug 27 17:07:10 PDT 2014

On Wed, Aug 27, 2014 at 4:44 PM, Greg Kroah-Hartman
<gregkh at linuxfoundation.org> wrote:
> On Wed, Aug 27, 2014 at 03:51:58PM -0700, Luis R. Rodriguez wrote:
>> On Mon, Aug 11, 2014 at 10:19 AM, Luis R. Rodriguez <mcgrof at suse.com> wrote:
>> > On Mon, Aug 11, 2014 at 12:57 PM, Lennart Poettering
>> > <lennart at poettering.net> wrote:
>> >> On Mon, 11.08.14 18:39, Luis R. Rodriguez (mcgrof at suse.com) wrote:
>> >>
>> >>> > This looks really wrong. We shouldn't permit worker processes to be
>> >>> > blocked indefinitely without any timeout applied. Designing a worker
>> >>> > process system like that is simply wrong. It's one thing to allow
>> >>> > changing the specific timeout applied, it's a very different thing to
>> >>> > allow broken drivers to completely stall the worker process logic.
>> >>>
>> >>> OK, what if we then make the timeout customizable per built-in
>> >>> command type and use a high multiplier for kmod for now? A kmod
>> >>> multiplier of 10 would set the kmod timeout to 5 minutes; as we
>> >>> sweep up and clean up drivers we can reduce this over time. This would
>> >>> also let us keep the default timeout for the other types of workers.
>> >>
>> >> Why this complexity?
>> >>
>> >> I mean, it sounds much simpler to just increase the default timeout a
>> >> bit, if it turns out to be too low for the current cases...
>> >
>> > True, there are two things here, one of which this v2 patch didn't address:
>> >
>> > 1) It'd be good for systemd's defaults to work on most systems running
>> > today's upstream kernels; right now folks deploying systemd would
>> > need to modify the default timeout. Are we up for bumping the default
>> > considerably? If it's high, would that be unfair to classes of workers
>> > we know shouldn't take that long, and wouldn't it let folks
>> > developing new workers take longer?
>> >
>> > 2) We want chatty logs to help us keep track of drivers that need
>> > attention. Ideally we should strive for a 30 second init bound right
>> > now and work on making most work asynchronous, and we want to do this
>> > in a non-fatal way. Overriding the timeout won't let us keep track of
>> > buggy drivers that need love, from systemd's perspective, to stay
>> > within the 30 second bound. We can have chatty logs from the kernel,
>> > but using a timeout in the driver core seems a bit overkill, especially
>> > if systemd is already keeping track of drivers' init time, so it'd be
>> > better if we could collect offending drivers from systemd. I could have
>> > implemented support for this in this v2 patch by simply adding another
>> > check using the default timeout.
>> >
>> > Thoughts / advice?
>>
>> Upstream, on the Linux front, we have come to the realization that
>> many drivers are not to blame, given that it was not the init paths of
>> drivers that were taking long but probe. The problem is caused by
>> how the driver core currently batches together both driver init and
>> probe if a bus has autoprobe enabled, and most buses do have this
>> enabled.
>>
>> I implemented a proof of concept patch that always splits up init
>> and probe by default and runs probe asynchronously for all
>> drivers [0]. On my system this actually decreased boot time, and I only
>> had an issue with my keyboard driver, but I suspect that could have been
>> because I wasn't adding drivers onto the deferred probe queue by checking
>> the probe return value. I made some other changes to get this to compile,
>> but those would have to go in separately and be broken down cleanly [1].
>> In a follow-up conversation with Greg at LinuxCon, he mentioned that
>> Dmitry Torokhov has been wanting something similar since February, when
>> Wu Zhangjin <falcon at meizu.com> posted another async probe proof
>> of concept patch [2]. Greg has indicated that he'd now take this on
>> himself and work on a generic async probe mechanism that would enable
>> drivers to specify whether they need async probe or not.
>
> Hey, if you have patches already, I'll be glad to look at them :)

OK, well, I'll spin what I have then, but I'm reviewing Wu's solution
from February as well. I take it we'd want the async_schedule()
approach rather than one based on kthread_create(), right?

> And we can't do async for all drivers, we tried that 5+ years ago and
> lots of things broke, so we need to enable it on a case-by-case basis,
> unfortunately...

Odd, I only had one thing that didn't come up, and it was my keyboard,
and I think I might know what the issue was. With async enabled for all
drivers, were the issues easy to spot, or were they obscure things? My
system didn't blow up, so I'd like to know what types of things blew up.

  Luis