[systemd-devel] OnFailure=

Andrei Borzenkov arvidjaar at gmail.com
Thu Mar 8 06:03:38 UTC 2018


08.03.2018 02:37, Jakob Schürz wrote:
> Hi there!
> 
> I built a test unit:
> 
> # cat test@.service
> [Unit]
> Description=Testservice notification
> OnFailure=notification-telegram@%n.service
> 
> [Service]
> Type=simple
> Restart=on-failure
> #RestartSec=2
> ExecStart=/bin/%i
> SyslogIdentifier=test@%i.service
> StartLimitBurst=5
> StartLimitInterval=10
> 
> 
> And the notification unit notification-telegram@%n.service:
> 
> # cat notification-telegram@.service
> [Unit]
> Description=Send failure-notification about %i to telegram
> 
> [Service]
> User=jakob
> ExecStart=/bin/bash -c "/usr/local/bin/ntfy -b telegram send
> \"FAILED\n$(systemctl status %i)\""
> 
> When I start the test unit with "systemctl start test@false", I get 5
> messages in Telegram...
> 
> The log is:
> Mär 08 00:31:53 aldebaran systemd[1]: Started Testservice notification.
> Mär 08 00:31:53 aldebaran systemd[1]: test@false.service: Main process
> exited, code=exited, status=1/FAILURE
> Mär 08 00:31:53 aldebaran systemd[1]: test@false.service: Failed with
> result 'exit-code'.
> Mär 08 00:31:53 aldebaran systemd[1]: test@false.service: Triggering
> OnFailure= dependencies.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Service
> hold-off time over, scheduling restart.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Scheduled
> restart job, restart counter is at 1.
> Mär 08 00:31:54 aldebaran systemd[1]: Stopped Testservice notification.
> Mär 08 00:31:54 aldebaran systemd[1]: Started Testservice notification.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Main process
> exited, code=exited, status=1/FAILURE
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Failed with
> result 'exit-code'.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Triggering
> OnFailure= dependencies.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Service
> hold-off time over, scheduling restart.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Scheduled
> restart job, restart counter is at 2.
> Mär 08 00:31:54 aldebaran systemd[1]: Stopped Testservice notification.
> Mär 08 00:31:54 aldebaran systemd[1]: Started Testservice notification.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Main process
> exited, code=exited, status=1/FAILURE
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Failed with
> result 'exit-code'.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Triggering
> OnFailure= dependencies.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Service
> hold-off time over, scheduling restart.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Scheduled
> restart job, restart counter is at 3.
> Mär 08 00:31:54 aldebaran systemd[1]: Stopped Testservice notification.
> Mär 08 00:31:54 aldebaran systemd[1]: Started Testservice notification.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Main process
> exited, code=exited, status=1/FAILURE
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Failed with
> result 'exit-code'.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Triggering
> OnFailure= dependencies.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Service
> hold-off time over, scheduling restart.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Scheduled
> restart job, restart counter is at 4.
> Mär 08 00:31:54 aldebaran systemd[1]: Stopped Testservice notification.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Start request
> repeated too quickly.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Failed with
> result 'exit-code'.
> Mär 08 00:31:54 aldebaran systemd[1]: Failed to start Testservice
> notification.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Triggering
> OnFailure= dependencies.
> 
> 
> As you can see, the unit from OnFailure= is triggered 5 times, not only
> once at the "Failed to start Testservice notification" point.
> 
> The man-page says:
> 
> OnFailure=
>     A space-separated list of one or more units that are activated when
>     this unit enters the "failed" state. A service unit using Restart=
>     enters the failed state only after the start limits are reached.
> 

This is apparently wrong, because the service briefly passes through the
"failed" state every time it fails. It is true that if Restart= is set,
the "activating" state follows immediately, but the OnFailure= actions
are still taken.

So from the end-user perspective the unit indeed remains "failed" only
when the limits are reached, but internally it transitions through the
"failed" state every time.
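
To make the mismatch visible, here is a minimal sketch that counts how
often OnFailure= was triggered versus how often the start limit was
actually hit. The helper name count_onfailure is made up for this
example, and the sample input is a condensed, unwrapped excerpt of the
log quoted above (timestamps and unrelated lines dropped); the two
message strings are taken from that log.

```shell
#!/bin/sh
# count_onfailure (hypothetical helper): reads journal text on stdin and
# counts OnFailure= activations and start-limit hits.
count_onfailure() {
    awk '
        /Triggering OnFailure= dependencies/ { triggers++ }
        /Start request repeated too quickly/ { limits++ }
        END { printf "triggers=%d limit_hits=%d\n", triggers, limits }
    '
}

# Condensed excerpt of the quoted log, one message per line.
result=$(count_onfailure <<'EOF'
systemd[1]: test@false.service: Triggering OnFailure= dependencies.
systemd[1]: test@false.service: Scheduled restart job, restart counter is at 1.
systemd[1]: test@false.service: Triggering OnFailure= dependencies.
systemd[1]: test@false.service: Scheduled restart job, restart counter is at 2.
systemd[1]: test@false.service: Triggering OnFailure= dependencies.
systemd[1]: test@false.service: Scheduled restart job, restart counter is at 3.
systemd[1]: test@false.service: Triggering OnFailure= dependencies.
systemd[1]: test@false.service: Scheduled restart job, restart counter is at 4.
systemd[1]: test@false.service: Start request repeated too quickly.
systemd[1]: test@false.service: Triggering OnFailure= dependencies.
EOF
)
echo "$result"   # prints: triggers=5 limit_hits=1
```

On a live system the same function could presumably be fed from
"journalctl -u test@false.service --no-pager", assuming the message
wording matches the version in use.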


> 
> But in this test case, the unit listed in OnFailure= is triggered every
> time the unit fails, restarts, and fails again. Only after 5 attempts
> (=StartLimitBurst) does the unit stay in the failed state. That should
> be the only time OnFailure= is triggered...
> 
> My systemd version is 237-3 from Debian.
> 
> Should I file a bug at bugs.freedesktop.org?
> 


You should create an issue on GitHub; this is where the primary bug
tracker is today:

https://github.com/systemd/systemd/

