[systemd-devel] Failure to umount /var at shutdown

Thu Oct 23 02:39:38 PDT 2014

On Thu, 23.10.14 11:27, Daniele Nicolodi (daniele at grinta.net) wrote:

> Hello,
> 
> I have a Debian sid system where there is a problem with the unmounting
> of the /var filesystem that causes a delay in the shutdown process:
> 
> > ott 21 10:08:46 nautilus virtualbox[28559]: Stopping VirtualBox kernel modules.
> > ott 21 10:08:46 nautilus systemd[2086]: Received SIGRTMIN+24 from PID 28503 (kill).
> > ott 21 10:08:46 nautilus systemd[1186]: Received SIGRTMIN+24 from PID 28500 (kill).
> > ott 21 10:08:46 nautilus systemd[2087]: pam_unix(systemd-user:session): session closed for user lele
> > ott 21 10:08:46 nautilus systemd[1192]: pam_unix(systemd-user:session): session closed for user lightdm
> > ott 21 10:10:16 nautilus systemd[1]: user@117.service stop-sigterm timed out. Killing.
> > ott 21 10:10:16 nautilus systemd[1]: Unit user@117.service entered failed state.
> > ott 21 10:10:16 nautilus systemd-udevd[294]: Network interface NamePolicy= disabled on kernel commandline, ignoring.
> > ott 21 10:10:16 nautilus networking[28622]: Deconfiguring network interfaces...done.
> > ott 21 10:10:16 nautilus lvm[28706]: 5 logical volume(s) in volume group "system" unmonitored
> > ott 21 10:10:16 nautilus umount[28721]: umount: /var: target is busy
> > ott 21 10:10:16 nautilus umount[28721]: (In some cases useful info about processes that
> > ott 21 10:10:16 nautilus umount[28721]: use the device is found by lsof(8) or fuser(1).)
> > ott 21 10:10:16 nautilus systemd[1]: var.mount mount process exited, code=exited status=32
> > ott 21 10:10:16 nautilus systemd[1]: Failed unmounting /var.
> > ott 21 10:10:17 nautilus systemd[1]: Shutting down.
> > ott 21 10:10:17 nautilus systemd-journal[267]: Journal stopped
> 
> As you can see, the umount for /var fails because the filesystem is in
> use, and this apparently makes systemd wait for what seems to be a
> 90-second timeout before proceeding with the shutdown.

This is journald's fault: it keeps the log files open and runs until
the very end. It's a known issue. We should fix this by synchronously
moving logging back to /run right before we want to unmount
/var. While this will make this error go away, the logs from that
point on will effectively be lost as /run is of course flushed on
reboot.

The current behaviour is mostly a cosmetic problem though, as in the
final killing spree journald will be killed after all, and we will do
another unmounting round which gets rid of /var, too. Hence data loss
will not occur.
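
To confirm on a running system which processes hold files open under
/var, the tools umount itself points to (lsof(8), fuser(1)) are the
usual route, for example "fuser -vm /var". The same idea can be
sketched by walking /proc directly. This is only an illustration: the
helper name find_holders is made up here, it matches by path prefix
rather than by filesystem, and it ignores other ways of pinning a
mount (working directories, memory mappings).

```python
import os


def find_holders(mount_point):
    """Return {pid: [paths]} for processes with open fds under mount_point.

    Hypothetical helper for illustration. Processes we may not inspect
    (permission denied) or that exit while we scan are silently skipped.
    """
    root = os.path.realpath(mount_point)
    prefix = root.rstrip("/") + "/"
    holders = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join("/proc", entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue
        for fd in fds:
            try:
                # Each fd is a symlink to the file it refers to.
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            if target == root or target.startswith(prefix):
                holders.setdefault(int(entry), []).append(target)
    return holders
```

Run as root shortly before shutdown, find_holders("/var") would be
expected to list journald with its files under /var/log/journal, if
the explanation above applies to the system in question.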

> First, how can I debug what is going on, namely how can I see which
> process is keeping /var busy?  Second, where does the 90-second timeout
> come from?  Does it make sense to wait for a timeout if the unmounting
> of a partition fails at shutdown?

The timeout is unrelated to the failed umount; it's probably an
indication of lost cgroup events. We shifted around a few things in
that area a while back, so please make sure to check the current git
version before reporting back on this one (a release should be out
soon, hopefully).
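
As for where the 90 seconds show up: in the quoted log the delay sits
between the last message at 10:08:46 and the "stop-sigterm timed out"
message for the user service at 10:10:16, which matches systemd's
default stop timeout of 90s. The arithmetic, as a small sketch (the
helper name seconds_between is made up for illustration):

```python
from datetime import datetime


def seconds_between(start, end, fmt="%H:%M:%S"):
    """Return the whole seconds elapsed between two same-day timestamps."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


# Timestamps taken from the quoted journal excerpt.
delay = seconds_between("10:08:46", "10:10:16")
print(delay)  # 90
```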

Lennart

-- 
Lennart Poettering, Red Hat