[systemd-devel] My experience with MySQL and systemctl

Tue Apr 11 13:08:35 UTC 2017

On Tue, 11.04.17 13:41, Samuel Williams (space.ship.traveller at gmail.com) wrote:

> I had an accident last night. I tried to delete a lot of rows from a
> production database in one transaction. I killed the transaction, and
> I didn't realise it was still rolling back an hour later when I tried
> to reboot the system for updates.
> 
> I might be wrong about exactly who is doing what, but I'll do my best
> to explain what happened, and then give a few notes about how I think
> the process could be improved. I don't specifically know if it's
> systemctl or the service file, or mysqld causing the issues.
> 
> systemctl tried to shutdown mysqld, but due to the rollback, it
> wouldn't exit with anything less than kill -9. So, systemctl entered
> the failed state, however, in addition to that, it appears it closed
> the log output of the mysqld process so I couldn't see any of the
> output from the daemon.

IIRC mysql actually uses its own log-file, and neither logs to stderr
nor to syslog?

Note that systemd don't actually close stdout/stderr or syslog at all, this
is always done by the logging client, not the recipient (journald).

> Eventually, after checking all our backups, I decided to issue kill -9
> to mysqld. I then decided to try restarting the daemon using
> systemctl. It did start up, the log output showed the crash recovery
> procedure, but because it entered into the rollback recovery,
> systemctl never considered that the process had finished starting up,
> and then tried to kill it again, which failed (only kill -9 would work
> in this case). Again, log output was closed.

Again, we really don't do that. Logging is actually pretty independent
from service management when it comes to keeping the connections open:
all systemd does is set up the connection, fork off the process and
that's it. The forked processes can keep the connections open as long
as they wish, or even pass them on to other processes if they like...

That said, there's one long-standing race: if a process logs through
syslog(), and very quickly after that exits we under some
circumstances have trouble matching up the log output with the
originating service. That's because the SCM_CREDENTIAL data we get for
the logged messages might reference a PID which already disappeared
from /proc/$PID which we hence cannot read the cgroup membership info
from.

Hence: maybe the logs are there, but the filtering didn't work the way
you expect?

> I issued kill -9 again, and started up the process manually `sudo -u
> mysql mysqld` and let it go through the entire recovery process, and
> then rebooted the system. Normalcy restored.
> 
> I think that the following things could be investigated/improved:
> 
> - Log output shouldn't be closed if the daemon can't be killed. I'm
> not sure who is doing this - perhaps it's mysqld, or perhaps it's the
> service file, or perhaps its systemctl?

I doubt that it's systemd.

> - If a daemon fails to start up, trying to kill it.. may not be the
> best option. It's probably a matter of the systemctl service file
> detecting that a rollback is in progress and accepting that as a valid
> startup state, but I'm not really sure. In any case, I ended up having
> to do this process manually.

The timeouts for killing services that don't start up correctly are
configurable per service, and by setting them to "infinity" you may
even turn them off entirely. This is configurable precisely so that
services that can take ages to start-up in real-life can increase or
turn off the timeout the way they need. Or in other words: please make
sure that your mysql.service carries the right
StartTimeoutSec=/StopTimeoutSec= settings, and please contact your
unit file vendor to fix this.

Lennart

-- 
Lennart Poettering, Red Hat