[systemd-devel] race conditions after SIGTERM
Reindl Harald
h.reindl at thelounge.net
Fri Jul 25 03:11:35 PDT 2014
and it happened again
how can it be that, with mysqld instances running as
Type=simple units that start mysqld directly, it is not safe
to rsync the datadirs after "systemctl stop" returns?
for a long time i had this in every mysqld unit:
ExecStopPost=/usr/bin/sleep 1
pretty sure i added it because this happened in the past,
but that is a really dirty solution:
is 1 second enough, and if not, 2 or better 5?
i now have a "sleep 6" in the backup scripts - not beautiful
after "systemctl stop" returns i expect that the MAINPID is
*really* done, has finished its cleanups and written its data
to disk, and that systemd, especially in case of a non-forking
daemon, reliably knows when it is finished
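a less arbitrary workaround than a fixed sleep might be to remember the unit's main PID before stopping it and then poll until that process is really gone. the following is only a sketch under assumptions: `wait_for_pid_exit` and `stop_and_wait` are hypothetical helper names, the 60 second timeout is an arbitrary choice, and it only watches the main process, not other processes in the unit's cgroup. it explains nothing about why "systemctl stop" returns early, it just replaces the guessed sleep with a check.

```shell
#!/bin/bash
# Hypothetical helper (not part of systemd): wait until the process with
# the given PID no longer exists, or give up after a timeout in seconds.
# Returns 0 if the process is gone, 1 on timeout.
# Note: "kill -0" only tests for existence; run this as root or as the
# owner of the process, otherwise a permission error looks like "gone".
wait_for_pid_exit() {
    local pid=$1 timeout=${2:-30} waited=0
    # An empty PID or 0 means there is no main process to wait for.
    case "$pid" in
        ''|0) return 0 ;;
    esac
    while kill -0 "$pid" 2>/dev/null; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Hypothetical wrapper: stop a unit and wait for its former main PID.
# The PID is read *before* the stop, since systemd reports MainPID=0
# once it considers the service stopped.
stop_and_wait() {
    local unit=$1 pid
    pid=$(systemctl show -p MainPID "$unit")
    pid=${pid#MainPID=}
    systemctl stop "$unit"
    wait_for_pid_exit "$pid" 60   # 60s is an assumed upper bound
}
```

in the backup script this would be used as `stop_and_wait replication.service && stop_and_wait mysqld.service` before the rsync, so that the copy is skipped instead of started when one of the instances is still running after the timeout.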
On 24.07.2014 13:24, Reindl Harald wrote:
> maybe that's a systemd thing
>
> i know Fedora 19 does not have a recent systemd, but the question
> remains whether systemctl, in case of "Type=simple", may act the same
> way when stopping a service as when starting it - send the SIGTERM
> and return immediately while the binary is still writing data
>
> that could explain race conditions like below
>
> * stop mysqld instance 1
> * stop mysqld instance 2
> * the services still flush data but "systemctl" already returned
> * rsync both datadirs
> * corruption
>
> -------- Original Message --------
> Subject: race conditions after SIGTERM
> Date: Thu, 24 Jul 2014 12:42:51 +0200
> From: Reindl Harald <h.reindl at thelounge.net>
> To: Mailing-List mariadb <"maria-discuss"@lists.launchpad.net>
>
> how can that script lead to a race condition where files
> are not fully written to disk? that never happens if the
> systemd-unit for the replication instance has
> "ExecStopPost=/usr/bin/sleep 1" and waits a while
>
> my only explanation is that, in some racy situations, mysqld
> returns after the SIGTERM from systemd before all data is
> completely written, so the data is rsynced incompletely to
> the datadir of the other instance
> _________________________________________________________
>
> #!/bin/bash
> systemctl stop replication.service
> systemctl stop mysqld.service
> rsync $RSYNC_PARAMS /mysql_replication/ /mysql_data/
> systemctl start replication.service
> systemctl start mysqld.service
> _________________________________________________________
>
> [Unit]
> Description=MariaDB Replication
>
> [Service]
> Type=simple
> PIDFile=/run/mysqld/mysqld_replication.pid
> ExecStart=/usr/libexec/mysqld --defaults-file=/etc/my-replication.cnf --pid-file=/run/mysqld/mysqld_replication.pid --socket=/var/lib/mysql/mysql_replication.sock --open-files-limit=30000 --basedir=/usr --user=mysql
> Environment="LANG=en_GB.UTF-8"
> Restart=always
> RestartSec=1
> _________________________________________________________
>
> 140724 12:22:59 [Note] /usr/libexec/mysqld: Shutdown complete
> 140724 12:23:01 [Note] Plugin 'InnoDB' is disabled.
> Cannot find checkpoint record at LSN (1,0x35767)
> 140724 12:23:01 [ERROR] mysqld: Aria recovery failed. Please run aria_chk -r on all Aria tables and delete all aria_log.######## files
> 140724 12:23:01 [ERROR] Plugin 'Aria' init function returned error.
> 140724 12:23:01 [ERROR] Plugin 'Aria' registration as a STORAGE ENGINE failed.
> 140724 12:23:01 [Note] Plugin 'FEDERATED' is disabled.
> 140724 12:23:01 [Note] Plugin 'FEEDBACK' is disabled.
> 140724 12:23:01 [ERROR] Aria engine is not enabled or did not start. The Aria engine must be enabled to continue as mysqld was configured with --with-aria-tmp-tables
> 140724 12:23:01 [ERROR] Aborting
> 140724 12:23:01 [Note] /usr/libexec/mysqld: Shutdown complete
> 140724 12:23:03 [Note] Plugin 'InnoDB' is disabled.
> Cannot find checkpoint record at LSN (1,0x35767)