[systemd-devel] [PATCH] readahead: read /usr files last for rotational media, skip /var

Kay Sievers kay.sievers at vrfy.org
Fri Sep 30 06:08:55 PDT 2011


On Fri, Sep 30, 2011 at 14:31, Paolo Bonzini <bonzini at gnu.org> wrote:
> On 09/30/2011 01:32 PM, Kay Sievers wrote:
>>>
>>> So much that I've
>>> been thinking about adding "virtual" mount units that become active as
>>> soon
>>> as any directory above it is mounted.  This way, units that require /usr
>>> could be made to depend on usr.mount.
>>
>> No, this will all not work for any non-trivial (like a web server or
>> something very simple) setup. The tools from /usr are needed to boot
>> up for any modern system.
>
> Sure they are needed to complete default.target, but that doesn't mean that
> they are required by e.g. sysinit.target or even remote-fs.target.

They are. The most prominent case is udev rules which reference /usr.
All device setup happens during that step. Details are in the pages
behind the links in the earlier mail.

> No tools
> from /usr are needed to bring up remote file systems,

That's old UNIX thinking, and does not make much sense today on Linux.
It's broken for many more complicated setups with dependencies.

We also have the initramfs to bring a box up with complicated storage
setups. There is no need for (rather randomly selected) tools to live
in a split-off rootfs.

> except perhaps
> NetworkManager which is optional.

Everything using udev, dbus, whatever ...

> Anyway, I don't believe this is the right time and venue to argue about this
> since it has already been discussed apparently.

Yeah, I hope it is.

>>> In fact, I think it is very wrong to make binfmt load from
>>> /usr/lib/binfmt.d.  Personally, I would have made it
>>> /lib/systemd/binfmt.d
>>> (likewise for tmpfiles).
>>
>> There should be no early boot tools that need binfmt.
>
> Fair enough.
>
> Actually I see a contradiction: if /lib is going to become /usr/lib, there's
> no reason to hard code /usr paths in systemd.  Just use /lib until the day
> comes.  But it's irrelevant.

Nah, it's not needed in the rootfs, hence we already put it where the
rest will move to. And sure, we will change all paths in systemd to be
prefixed with /usr and not to rely on the existence of the compat
symlinks.

>>> If you really want to use /usr, there should be two instances of
>>> binfmt/tmpfiles/etc. one that is activated very early (loading from /etc
>>> and
>>> /lib) and one that is activated after remote-fs.target (in the lack of
>>> usr.mount---yes, remote!) that loads from /usr/lib and /usr/local/lib.
>>
>> It's not needed, the stuff in the rootfs will go away over time and
>> the top-level dirs there will be replaced with compat symlinks.
>
> Out of curiosity, why not the other way round?  I.e. move everything to
> rootfs and "ln -sf /usr /"?

You did not read any of the pages behind the links in the earlier
mail, right? :)

Because the rootfs is the exception, not /usr. We would need to
introduce a bunch of additional top-level directories like /share,
/include which really makes no sense, and would be the opposite of
where we want to be. We need a single directory for the entire
installed system, to be able to safely atomically snapshot it, to be
able to read-only mount it, and to be able to share it without jumping
through hoops.

>>>> Also, I'm not sure if I understand your suggestion that /var should be
>>>> ignored. In particular I think /var/tmp would be useful to readahead
>>>> (albeit probably as one of the last things to do).
>>>
>>> You could add that as a third group, after / and /usr.  The patch makes
>>> that
>>> kind of extensibility very easy.
>>
>> Rules which files to prioritize *might* make sense, sorting by
>> top-level dir doesn't really.
>
> Rules about files to prioritize cannot really be implemented.  You cannot
> statically determine which files will be loaded, because many of them are
> plugins.  You could implement some kind of ordering such as "prioritize
> files used by udev and its children" (fanotify events have a pid field), but
> I don't believe this makes much sense since you have a conflict between
> systemd's decisions and readahead-collect's.  Not to mention that readahead
> can influence the order in which units complete.
>
> So, you need hard barriers at major serialization points, where you flush
> the readahead and accept the penalty of seeking back to the beginning of the
> disk (in the interest of completing the serializing target as soon as
> possible).  One such barrier could be after udev.service becomes active, for
> example, another after local-fs.target finishes, another network.target
> finishes.
>
> You can communicate this with a systemd unit that just sends a signal to
> systemd-readahead-collect, or by letting it subscribe to systemd DBus
> notifications.  But, you also need to preserve barriers when
> systemd-readahead-replay is reading data (when s-r-r reads data after the
> first barrier, s-r-c must account it after the first barrier; I'm not even
> sure you can do that without merging the two processes or at least letting
> s-r-c know the pid of s-r-r).  Certainly not a half-hour hack.
>
> You can see my patch as a first step, with the hard barrier being a toplevel
> directory instead of being an external notification such as a signal.  If it
> really does not make sense fine, I'll just enjoy my 25% faster boot and keep
> the patch locally.  It's just a pity that I spent so much time writing the
> commit message.

Nah, I meant something different. Your hardcoded /usr is a rule too.
It's not about specific file names to prioritize, but you could have a
set of configurable rules that say: "prioritize */lib/modules/* over
the rest", and things like that.
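
As a rough illustration of such a rule set (a sketch only, not how
systemd-readahead works): the rule format, the priority numbers and
the function names below are all made up for the example. Only the
"*/lib/modules/*" pattern comes from the text above.

```python
from fnmatch import fnmatchcase

# Hypothetical rule list: (glob pattern, priority). Lower numbers are
# read earlier. Only the first pattern is taken from the mail; the
# format itself is an assumption.
RULES = [
    ("*/lib/modules/*", 0),
]
DEFAULT_PRIORITY = 100  # assumed priority for files matching no rule


def priority(path, rules=RULES):
    """Return the priority of the first rule whose pattern matches path."""
    for pattern, prio in rules:
        # fnmatchcase's "*" also matches "/", so the pattern applies
        # to /lib/modules and /usr/lib/modules alike.
        if fnmatchcase(path, pattern):
            return prio
    return DEFAULT_PRIORITY


def order_for_readahead(paths, rules=RULES):
    """Sort paths by rule priority; sorted() is stable, so files with
    the same priority keep their original relative order."""
    return sorted(paths, key=lambda p: priority(p, rules))
```

With a list such as /usr/bin/foo, /lib/modules/3.0/ext4.ko, /etc/fstab,
the kernel module would move to the front and the other two would keep
their order, independent of which top-level directory they live in.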

/usr will not be a good rule, and we should probably avoid hard coding
such things in the code in general.

Kay

