[systemd-devel] Error handling problems with systemd-machined

Daniel P. Berrange berrange at redhat.com
Wed Jul 24 09:10:59 PDT 2013


On Wed, Jul 24, 2013 at 05:59:48PM +0200, Lennart Poettering wrote:
> > When something goes wrong with the CreateMachine DBus call though all I
> > ever seem to get back is  "Input/output error".
> > 
> > After strace'ing systemd-machined I find the real error
> > 
> > recvmsg(5, {msg_name(0)=NULL, msg_iov(1)=[{"l\1\0\1\334\0\0\0\2\0\0\0\277\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/machine1\0\0\0\0\0\0\0\6\1s\0\30\0\0\0org.freedesktop.machine1\0\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.machine1.Manager\0\0\0\0\0\0\0\0\3\1s\0\r\0\0\0CreateMachine\0\0\0\10\1g\0\fsayssusa(sv)\0\0\0\0\0\0\0\7\1s\0\6\0\0\0:1.130\0\0\t\0\0\0lxc-busy2\0\0\0\20\0\0\0\335\247\271G\10F\27Y(s\0177]\367\327\353\v\0\0\0libvirt-lxc\0\t\0\0\0container\0\0\0\210:\0\0\0\0\0\0\0\0\0\0\204\0\0\0\0\0\0\0\5\0\0\0Slice\0\1s\0\0\0\0\16\0\0\0/machine.slice\0\0\0\0\0\0\r\0\0\0CPUAccounting\0\1b\0\0\0\0\1\0\0\0\0\0\0\0\21\0\0\0BlockIOAccounting\0\1b\0\0\0\0\1\0\0\0\20\0\0\0MemoryAccounting\0\1b\0\1\0\0\0", 2048}], msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 428
> > sendmsg(5, {msg_name(0)=NULL, msg_iov(2)=[{"l\1\0\1D\1\0\0\n\0\0\0\255\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\6\1s\0\30\0\0\0org.freedesktop.systemd1\0\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\22\0\0\0StartTransientUnit\0\0\0\0\0\0\10\1g\0\7ssa(sv)\0\0\0\0", 192}, {"\32\0\0\0machine-lxc\\x2dbusy2.scope\0\0\4\0\0\0fail\0\0\0\0\24\1\0\0\5\0\0\0Slice\0\1s\0\0\0\0\r\0\0\0machine.slice\0\0\0\0\0\0\0\v\0\0\0Description\0\1s\0\0\23\0\0\0Container lxc-busy2\0\0\0\0\0\17\0\0\0TimeoutStopUSec\0\1t\0\0 \241\7\0\0\0\0\0\4\0\0\0PIDs\0\2au\0\0\0\0\4\0\0\0\210:\0\0\5\0\0\0Slice\0\1s\0\0\0\0\16\0\0\0/machine.slice\0\0\0\0\0\0\r\0\0\0CPUAccounting\0\1b\0\0\0\0\1\0\0\0\0\0\0\0\21\0\0\0BlockIOAccounting\0\1b\0\0\0\0\1\0\0\0\20\0\0\0MemoryAccounting\0\1b\0\1\0\0\0", 324}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 516
> > recvmsg(5, {msg_name(0)=NULL, msg_iov(1)=[{"l\3\1\1+\0\0\0\265\1\0\0]\0\0\0\6\1s\0\6\0\0\0:1.126\0\0\4\1s\0$\0\0\0org.freedesktop.systemd1.InvalidName\0\0\0\0\5\1u\0\n\0\0\0\10\1g\0\1s\0\0\7\1s\0\4\0\0\0:1.1\0\0\0\0&\0\0\0Unit name /machine.slice is not valid.\0", 2048}], msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 155
> > sendmsg(3, {msg_name(0)=NULL, msg_iov(4)=[{"PRIORITY=3\nSYSLOG_FACILITY=4\nCODE_FILE=src/machine/machine.c\nCODE_LINE=246\nCODE_FUNCTION=machine_start_scope\nSYSLOG_IDENTIFIER=systemd-machined\n", 144}, {"MESSAGE=", 8}, {"Failed to start machine scope: Unit name /machine.slice is not valid.", 69}, {"\n", 1}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 222
> > sendmsg(5, {msg_name(0)=NULL, msg_iov(2)=[{"l\3\1\1\27\0\0\0\v\0\0\0O\0\0\0\6\1s\0\6\0\0\0:1.130\0\0\4\1s\0\"\0\0\0org.freedesktop.DBus.Error.IOError\0\0\0\0\0\0\5\1u\0\2\0\0\0\10\1g\0\1s\0\0", 96}, {"\22\0\0\0Input/output error\0", 23}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 119
> > 
> > 
> > So machined is getting a useful error back from systemd
> > 
> >   "Unit name /machine.slice is not valid."
> > 
> > and syslog'ing that error, and then sending back the dbus client a useless
> > "Input/output error" message :-(
> 
> Yeah, we really suck at handing out good errors. But usually should
> should have gotten an (equally useless) EINVAL in most cases.
> 
> > Once I fixed the unit name to removing the leading '/', I hit a second
> > error
> > 
> > recvmsg(5, {msg_name(0)=NULL, msg_iov(1)=[{"l\3\1\0014\0\0\0\301\1\0\0]\0\0\0\6\1s\0\6\0\0\0:1.126\0\0\4\1s\0#\0\0\0org.freedesktop.systemd1.UnitExists\0\0\0\0\0\5\1u\0\f\0\0\0\10\1g\0\1s\0\0\7\1s\0\4\0\0\0:1.1\0\0\0\0/\0\0\0Unit machine-lxc\\x2dbusy2.scope already exists.\0", 2048}], msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 164
> > 
> >   "Unit machine-lxc\\x2dbusy2.scope already exists"
> > 
> > But neither "machinectl list" or "systemctl --full" show any such machine
> > or unit existing. It seems like when it reported the bogus slice name,
> > it did not fully clean up the transient scope unit it created. This is
> > then blocking further attempts to create the same transient scope.
> 
> Hmm, that's interesting. What does "systemctl status" say for the unit
> in question when this happens? Could you paste?

# systemctl status 'machine-lxc\x2dbusy4.scope'
machine-lxc\x2dbusy4.scope
   Loaded: stub (/run/systemd/system/machine-lxc\x2dbusy4.scope; static)
   Active: inactive (dead)

> Kay had some issues where the kernel's release_agent wouldn't be called
> on recent kernels, but I never had issues with that...

If systemd is complaining about the bogus slice name "/machine.slice"
is it possible that it has returned this error, before it ever placed
the "init" PID into the cgroup? The kernel release_agent would never
trigger if there was no process any cgroup to exit, and thus the
slice may not get cleaned up.

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|


More information about the systemd-devel mailing list