[systemd-devel] systemd and criu (checkpoint / restart)

Tue Jul 2 17:42:00 PDT 2013

Hello,

TL;DR: criu works if you disconnect the service from the journal and stop
the .socket unit before restoring; criu appears to be incompatible with
systemd-nspawn.

I've been "having fun" with systemd, -nspawn, and the latest criu
tools. These are just my research notes; I wanted to share my progress,
and would love any feedback or pointers on places where this does not work.

Before we begin:

localhost criu2 # systemd --version
systemd 204
+PAM -LIBWRAP -AUDIT -SELINUX +IMA -SYSVINIT -LIBCRYPTSETUP -GCRYPT -ACL -XZ
localhost criu2 # criu -V
Version: 0.6
localhost criu2 # uname -a
Linux localhost 3.10.0+ #4 SMP Mon Jul 1 13:36:25 PDT 2013 x86_64 Intel(R) Core(TM) i7-2677M CPU @ 1.80GHz GenuineIntel GNU/Linux

In the following gist:
1) I set up a socket-activated Go HTTP server (plain, no nspawn)
2) start the process via socket activation
3) criu dump it
4) shut down the .socket
5) criu restore, works, yay

  https://gist.github.com/polvi/310fad0a2a3b0859cfb1
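For reference, socket activation hands the listening socket to the
service as an already-open file descriptor: systemd sets LISTEN_PID and
LISTEN_FDS in the environment, and the passed descriptors start at fd 3
(the sd_listen_fds() convention). The gist's server is written in Go;
the following is only a minimal Python sketch of that same check, not
the code from the gist.

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first inherited fd, per the sd_listen_fds() convention


def listen_fds(environ=None, pid=None):
    """Return the file descriptors passed in by systemd socket activation.

    Mirrors the sd_listen_fds() checks: LISTEN_PID must name this process,
    and LISTEN_FDS says how many fds follow SD_LISTEN_FDS_START.
    """
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []  # variables were meant for some other process
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        return []  # not socket-activated (variables missing or malformed)
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


def activated_listener():
    """Wrap the first inherited fd as a socket object, or return None."""
    fds = listen_fds()
    if not fds:
        return None
    # With fileno= given, Python 3.7+ detects family and type from the fd.
    return socket.socket(fileno=fds[0])
```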

What works:
-  Application state is dumped and restored successfully.

What doesn't work:
- The .socket unit has to be stopped before the restore, because the
restored process re-opens the listening socket itself. I'm not sure
there is a workaround for this, other than not using socket activation
at all.
- The .service is left in a killed state, because criu dump kills the
process once the dump is done.
- Once the process is restored (with the same PID), systemd is confused
and no longer monitors it. Maybe there is a way to get systemd to
realize that the process is running again?
- I had to set StandardOutput=null and StandardError=null to keep the
service's stdout/stderr from being connected to a journal socket.
Without this the process will not checkpoint, because criu will not
dump a socket whose other end is outside the dumped process tree.
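The journal problem can be spotted before running criu by checking what
the service's stdout and stderr point to. A minimal Python sketch,
Linux-only, assuming you already know the service's main PID (e.g. from
systemctl status): a /proc/<pid>/fd link of the form socket:[inode]
means the stream is a socket (for a default unit, presumably the
journal's stream socket), whereas StandardOutput=null and
StandardError=null leave fds 1 and 2 pointing at /dev/null.

```python
import os


def is_socket_target(link_target):
    """True if a /proc/<pid>/fd symlink target names a socket, e.g. 'socket:[12345]'."""
    return link_target.startswith("socket:[") and link_target.endswith("]")


def stdio_sockets(pid):
    """Return {fd: target} for the stdout/stderr fds of `pid` that are sockets.

    Reading another process's /proc/<pid>/fd needs the same user or root.
    A non-empty result suggests criu dump will likely be blocked.
    """
    found = {}
    for fd in (1, 2):
        try:
            target = os.readlink(f"/proc/{pid}/fd/{fd}")
        except OSError:
            continue  # fd closed, process gone, or permission denied
        if is_socket_target(target):
            found[fd] = target
    return found
```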

As for -nspawn, I'm pretty sure it is incompatible with criu. This took
a lot of fighting, and in the end triggered a bug in criu: I was able to
dump the container, but not restore it.

In the following gist you'll see where I hit bugs, and the workarounds
I tried:

1) I start a busybox while loop in a container using systemd-nspawn
2) nsenter into the container and umount /proc/sys/kernel/random/boot_id
and /proc/kmsg, because they show up as "(deleted)" mounts and criu
does not handle that. I assume -nspawn sets these up.
3) Update my copy of iproute2, because criu requires the "ip addr
save" functionality
4) Now it gets really bad... criu restores mount namespaces using
pivot_root, while systemd-nspawn uses MS_MOVE. Initially I was hitting
a bug where pivot_root failed because my container was on a different
filesystem. After bind mounting the container filesystem into my
running root, the restore triggers a bug in criu.
5) I give up.

  https://gist.github.com/polvi/d883043343e4db8e16cb
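Step 2 could be checked up front by scanning the container's mountinfo
for mounts whose backing path has been deleted. This is a hypothetical
pre-flight helper, not something criu or systemd-nspawn provide, and it
assumes the kernel reports such paths with a " (deleted)" suffix, which
mountinfo octal-escapes as \040(deleted). Field positions follow
proc(5); pass it the PID of a process inside the container.

```python
import re

# mountinfo escapes space, tab, newline and backslash in paths as \ooo.
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def unescape(field):
    """Undo mountinfo's octal escaping of paths (e.g. '\\040' becomes ' ')."""
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), field)


def deleted_mounts(mountinfo_text):
    """Return mount points whose root or mount point path ends in ' (deleted)'.

    Fields per proc(5): mount ID, parent ID, major:minor, root, mount point, ...
    """
    result = []
    for line in mountinfo_text.splitlines():
        fields = line.split(" ")
        if len(fields) < 5:
            continue  # skip blank or truncated lines
        root, mount_point = unescape(fields[3]), unescape(fields[4])
        if root.endswith(" (deleted)") or mount_point.endswith(" (deleted)"):
            result.append(mount_point)
    return result


def deleted_mounts_of(pid):
    """Read /proc/<pid>/mountinfo (Linux only) and report '(deleted)' mounts."""
    with open(f"/proc/{pid}/mountinfo") as f:
        return deleted_mounts(f.read())
```

Any mount point it reports is a candidate for the manual umount
described in step 2.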

What works:
- It dumped the state of something!

What does not work:
- Restoring
- Using the container's mount namespace (because criu restores it with
pivot_root)

In summary, to make this actually work, I think we'd need to implement
checkpoint/restart in systemd itself. That would let us get around the
journal issues, and maybe even make socket activation work. Containers
seem to be a beast of their own.

-Alex