[systemd-devel] OnFailure=
Andrei Borzenkov
arvidjaar at gmail.com
Thu Mar 8 06:03:38 UTC 2018
08.03.2018 02:37, Jakob Schürz wrote:
> Hi there!
>
> I built a test unit:
>
> # cat test@.service
> [Unit]
> Description=Testservice notification
> OnFailure=notification-telegram@%n.service
>
> [Service]
> Type=simple
> Restart=on-failure
> #RestartSec=2
> ExecStart=/bin/%i
> SyslogIdentifier=test@%i.service
> StartLimitBurst=5
> StartLimitInterval=10
>
>
> And the notification-Unit notification-telegram@%n.service
>
> # cat notification-telegram@.service
> [Unit]
> Description=Send failure-notification about %i to telegram
>
> [Service]
> User=jakob
> ExecStart=/bin/bash -c "/usr/local/bin/ntfy -b telegram send
> \"FAILED\n$(systemctl status %i)\""
>
> When I start the test unit with systemctl start test@false, I get 5
> messages in Telegram.
>
> The log is:
> Mär 08 00:31:53 aldebaran systemd[1]: Started Testservice notification.
> Mär 08 00:31:53 aldebaran systemd[1]: test@false.service: Main process
> exited, code=exited, status=1/FAILURE
> Mär 08 00:31:53 aldebaran systemd[1]: test@false.service: Failed with
> result 'exit-code'.
> Mär 08 00:31:53 aldebaran systemd[1]: test@false.service: Triggering
> OnFailure= dependencies.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Service
> hold-off time over, scheduling restart.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Scheduled
> restart job, restart counter is at 1.
> Mär 08 00:31:54 aldebaran systemd[1]: Stopped Testservice notification.
> Mär 08 00:31:54 aldebaran systemd[1]: Started Testservice notification.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Main process
> exited, code=exited, status=1/FAILURE
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Failed with
> result 'exit-code'.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Triggering
> OnFailure= dependencies.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Service
> hold-off time over, scheduling restart.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Scheduled
> restart job, restart counter is at 2.
> Mär 08 00:31:54 aldebaran systemd[1]: Stopped Testservice notification.
> Mär 08 00:31:54 aldebaran systemd[1]: Started Testservice notification.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Main process
> exited, code=exited, status=1/FAILURE
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Failed with
> result 'exit-code'.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Triggering
> OnFailure= dependencies.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Service
> hold-off time over, scheduling restart.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Scheduled
> restart job, restart counter is at 3.
> Mär 08 00:31:54 aldebaran systemd[1]: Stopped Testservice notification.
> Mär 08 00:31:54 aldebaran systemd[1]: Started Testservice notification.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Main process
> exited, code=exited, status=1/FAILURE
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Failed with
> result 'exit-code'.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Triggering
> OnFailure= dependencies.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Service
> hold-off time over, scheduling restart.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Scheduled
> restart job, restart counter is at 4.
> Mär 08 00:31:54 aldebaran systemd[1]: Stopped Testservice notification.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Start request
> repeated too quickly.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Failed with
> result 'exit-code'.
> Mär 08 00:31:54 aldebaran systemd[1]: Failed to start Testservice
> notification.
> Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Triggering
> OnFailure= dependencies.
>
>
> As you can see, the unit from OnFailure= is called 5 times, not just once
> at the time of "Failed to start Testservice notification".
>
> The man-page says:
>
> OnFailure=
> A space-separated list of one or more units that are
> activated when this unit enters the "failed" state. A service unit using
> Restart= enters the failed state only after the
> start limits are reached.
>
This is apparently wrong, because the service briefly passes through the
"failed" state every time it fails. It is true that, if Restart= is set,
the "activating" state follows immediately, but the OnFailure= actions
are still taken.
So from the end user's perspective the unit indeed remains "failed" only
when the limits are reached, but internally it transitions through the
"failed" state every time.
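
One way to check this against the journal is to count the trigger
messages rather than looking only at the final "failed" result. The
following is only a sketch: it embeds the relevant lines of the log
quoted above (unwrapped) as sample input, and the helper name sample_log
is made up for illustration. On a live system the same grep calls could
presumably read the output of journalctl -u test@false.service instead.

```shell
#!/bin/sh
# Count OnFailure= triggers and scheduled restarts in a journal excerpt.

# Sample input: relevant lines from the log quoted above, unwrapped.
# sample_log is a hypothetical helper, not part of systemd.
sample_log() {
cat <<'EOF'
Mär 08 00:31:53 aldebaran systemd[1]: test@false.service: Triggering OnFailure= dependencies.
Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Scheduled restart job, restart counter is at 1.
Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Triggering OnFailure= dependencies.
Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Scheduled restart job, restart counter is at 2.
Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Triggering OnFailure= dependencies.
Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Scheduled restart job, restart counter is at 3.
Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Triggering OnFailure= dependencies.
Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Scheduled restart job, restart counter is at 4.
Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Start request repeated too quickly.
Mär 08 00:31:54 aldebaran systemd[1]: test@false.service: Triggering OnFailure= dependencies.
EOF
}

# grep -c prints the number of matching lines; -F matches a fixed string.
triggers=$(sample_log | grep -cF 'Triggering OnFailure= dependencies')
restarts=$(sample_log | grep -cF 'Scheduled restart job')

echo "OnFailure= triggered: $triggers"
echo "restarts scheduled:   $restarts"
```

For the quoted log this reports five OnFailure= triggers but only four
scheduled restarts: one trigger per failed run, plus one more when the
start limit is hit.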
>
> But in this test case, the unit listed in OnFailure= is called every
> time the unit fails, restarts and fails again; only after 5 attempts
> (= StartLimitBurst) does the unit end up in the failed state. That
> should be the only time OnFailure= is triggered.
>
> My systemd version is 237-3 from Debian.
>
> Should I file a bug at bugs.freedesktop.org?
>
You should create an issue on GitHub; this is where the primary bug
tracker is today:
https://github.com/systemd/systemd/