[systemd-devel] race conditions after SIGTERM

Reindl Harald h.reindl at thelounge.net
Sat Jul 26 05:46:05 PDT 2014


after that happened again a few minutes ago with systemd on F19:
https://bugzilla.redhat.com/show_bug.cgi?id=1123557

since "sleep 1" was enough for months and now even "sleep 6"
between "systemctl stop" and rsync is not reliable, my
only conclusion is that systemd doesn't care about the still
running MAINPID of a "Type=simple" service, while with an
existing "ExecStopPost" the logic is correct

looks like "ExecStopPost" is correctly executed after the
MAINPID has finished, and without it systemctl returns immediately

i don't know if that affects F20/F21 too, because i can't
just upgrade production, and the race happens only once or
twice a day and hence can't be reproduced reliably
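
a possible workaround for the backup script, as a minimal sketch only:
instead of a fixed sleep, remember the MainPID before the stop and poll
until that process is really gone. the helper names wait_for_pid_exit and
stop_and_wait and the 60 second timeout are made up for illustration; the
unit names and paths are the ones from the script quoted below. it assumes
the script runs as root, and it only watches the main process, so it
works around the early return of "systemctl stop" without explaining it.

```bash
#!/bin/bash
# Sketch: wait for a unit's main process to exit before copying its datadir.
# Helper names and the default timeout are assumptions, not part of systemd.

# Poll until PID $1 no longer exists or $2 seconds (default 60) have passed.
# Returns 0 when the process is gone, 1 on timeout.
# Run as root: for an unprivileged caller "kill -0" also fails on a
# permission error, which would be mistaken for "process gone".
wait_for_pid_exit() {
    local pid=$1 timeout=${2:-60} waited=0
    # systemd reports MainPID=0 when a unit has no main process
    if [ -z "$pid" ] || [ "$pid" -eq 0 ]; then
        return 0
    fi
    # "kill -0" sends no signal, it only checks that the PID exists
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Remember the main PID, stop the unit, then wait for that PID to vanish.
stop_and_wait() {
    local unit=$1 pid
    # output looks like "MainPID=1234"; keep only the number
    pid=$(systemctl show -p MainPID "$unit" | cut -d= -f2)
    systemctl stop "$unit" || return 1
    wait_for_pid_exit "$pid" 60
}

# Intended use in the backup script (not executed here):
#   stop_and_wait replication.service &&
#   stop_and_wait mysqld.service &&
#   rsync $RSYNC_PARAMS /mysql_replication/ /mysql_data/
```

if a process does not exit within the timeout, stop_and_wait returns
non-zero and the rsync in the usage comment is skipped rather than
copying a datadir that is still being written.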

On 25.07.2014 12:11, Reindl Harald wrote:
> and it happened again
> 
> how can it be that, with mysqld instances running as
> Type=simple units that start mysqld directly, it is not safe
> to rsync the datadirs after "systemctl stop" returns?
> 
> for a long time i had in every mysqld unit
> ExecStopPost=/usr/bin/sleep 1
> pretty sure because this happened in the past
> 
> but that is a really dirty solution
> is 1 second enough? if not, 2 or better 5?
> i now have a "sleep 6" in the backup scripts - not beautiful
> 
> after "systemctl stop" returns i expect that the MAINPID is
> *really* done, has finished its cleanups and written its data
> to disk, and that systemd, especially in case of a non-forking
> daemon, reliably knows when it has finished
> 
> On 24.07.2014 13:24, Reindl Harald wrote:
>> maybe that's a systemd thing
>>
>> i know Fedora 19 does not have a recent systemd, but the question
>> remains whether systemctl, in case of "Type=simple", may act the
>> same way when stopping a service as when starting it - send the
>> SIGTERM and immediately return while the binary still writes data
>>
>> that could explain race conditions like below
>>
>> * stop mysqld instance 1
>> * stop mysqld instance 2
>> * the services still flush data but "systemctl" already returned
>> * rsync both datadir
>> * corruption
>>
>> -------- Original Message --------
>> Subject: race conditions after SIGTERM
>> Date: Thu, 24 Jul 2014 12:42:51 +0200
>> From: Reindl Harald <h.reindl at thelounge.net>
>> To: Mailing-List mariadb <"maria-discuss"@lists.launchpad.net>
>>
>> how can that script lead to a race condition where files
>> are not fully written to disk? that never happens if the
>> systemd-unit for the replication instance has
>> "ExecStopPost=/usr/bin/sleep 1" and waits a while
>>
>> my only explanation is that in some racy situations mysqld
>> returns after the SIGTERM from systemd before all data is
>> completely written, and so incomplete data is rsynced to the
>> datadir of the other instance
>> _________________________________________________________
>>
>> #!/bin/bash
>> systemctl stop replication.service
>> systemctl stop mysqld.service
>> rsync $RSYNC_PARAMS /mysql_replication/ /mysql_data/
>> systemctl start replication.service
>> systemctl start mysqld.service
>> _________________________________________________________
>>
>> [Unit]
>> Description=MariaDB Replication
>>
>> [Service]
>> Type=simple
>> PIDFile=/run/mysqld/mysqld_replication.pid
>> ExecStart=/usr/libexec/mysqld --defaults-file=/etc/my-replication.cnf --pid-file=/run/mysqld/mysqld_replication.pid --socket=/var/lib/mysql/mysql_replication.sock --open-files-limit=30000 --basedir=/usr --user=mysql
>> Environment="LANG=en_GB.UTF-8"
>> Restart=always
>> RestartSec=1
>> _________________________________________________________
>>
>> 140724 12:22:59 [Note] /usr/libexec/mysqld: Shutdown complete
>> 140724 12:23:01 [Note] Plugin 'InnoDB' is disabled.
>> Cannot find checkpoint record at LSN (1,0x35767)
>> 140724 12:23:01 [ERROR] mysqld: Aria recovery failed. Please run aria_chk -r on all Aria tables and delete all aria_log.######## files
>> 140724 12:23:01 [ERROR] Plugin 'Aria' init function returned error.
>> 140724 12:23:01 [ERROR] Plugin 'Aria' registration as a STORAGE ENGINE failed.
>> 140724 12:23:01 [Note] Plugin 'FEDERATED' is disabled.
>> 140724 12:23:01 [Note] Plugin 'FEEDBACK' is disabled.
>> 140724 12:23:01 [ERROR] Aria engine is not enabled or did not start. The Aria engine must be enabled to continue as mysqld was configured with --with-aria-tmp-tables
>> 140724 12:23:01 [ERROR] Aborting
>> 140724 12:23:01 [Note] /usr/libexec/mysqld: Shutdown complete
>> 140724 12:23:03 [Note] Plugin 'InnoDB' is disabled.
>> Cannot find checkpoint record at LSN (1,0x35767)
