[systemd-devel] How do I monitor for service exit , service failure and service start using DBus ?

Pradeepa Kumar cdpradeepa at gmail.com
Thu Jul 9 23:06:09 PDT 2015


Hello systemd-experts,

I am new to systemd and dbus.

I am writing a daemon that starts, stops and monitors services, and I am
using dbus to interface with systemd. I send messages on dbus to start or
stop a service, and this is working well. I want to achieve the following:


a) Monitor service exit using the systemd dbus interface.

b) Monitor when the service has entered a failed state (the app has been
restarted n times in m seconds).



I am trying to understand the right mechanism to implement this using the
systemd dbus interface. I have tried a few things, but clearly I don’t
understand how this works. Any help is greatly appreciated.



In my first attempt I subscribed to the JobRemoved signal from systemd as
explained here (http://www.freedesktop.org/wiki/Software/systemd/dbus/). I
was able to figure out when the service failed by looking at the result
string.



-----

JobNew() and JobRemoved() are sent out each time a new job is queued or
dequeued. Both signals take the numeric job ID, the bus path and the
primary unit name for this job as argument. JobRemoved() also includes a
result string, being one of done, canceled, timeout, failed, dependency,
skipped. done indicates successful execution of a job. canceled indicates
that a job has been canceled (via CancelJob() above) before it finished
execution (this doesn't necessarily mean though that the job operation is
actually cancelled too, see above). timeout indicates that the job timeout
was reached. failed indicates that the job failed. dependency indicates
that a job this job has been depending on failed and the job hence has been
removed too. skipped indicates that a job was skipped because it didn't
apply to the units current state.

-----
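
To make the JobRemoved handling concrete, here is a minimal sketch of a
handler that looks only at the unit name and the result string described in
the quoted text. It is plain Python with no D-Bus binding; the function name
on_job_removed and the watched_units parameter are hypothetical, and the
handler is assumed to be called with the four signal arguments (job ID, job
path, unit name, result).

```python
def on_job_removed(job_id, job_path, unit, result, watched_units):
    """Return (unit, ok) for a watched unit, or None if the signal is ignored.

    Only "done" is treated as success; every other result string is
    reported as not ok and left to the caller to interpret.
    """
    if unit not in watched_units:
        return None  # the job belongs to a unit this client does not track
    return unit, result == "done"


watched = {"myservice.service"}
print(on_job_removed(7, "/org/freedesktop/systemd1/job/7",
                     "myservice.service", "failed", watched))
```

Filtering on the unit name is one way to discard JobRemoved signals for jobs
the client is not interested in.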



I soon realized that I was getting spurious JobRemoved signals.



I am now trying to achieve (a) and (b) by subscribing to the
PropertiesChanged signal, and I have a few questions here:



1) When I get the PropertiesChanged signal, I query the SubState property to
get the running/stopped state of the service. Is querying the SubState
property the right way to get the service status? If the SubState value is
“running”, I infer that the app is running; for any other value, I infer
that the app is down. I am not relying on ActiveState because for some
signals ActiveState is “active” while SubState is “exited”.

 Is this approach correct?
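
The rule in question 1 can be written down as a small sketch. It assumes the
handler receives the three standard PropertiesChanged arguments (interface
name, dict of changed properties, list of invalidated properties) and that
SubState is reported on the org.freedesktop.systemd1.Unit interface. The
function name app_is_running is hypothetical.

```python
UNIT_IFACE = "org.freedesktop.systemd1.Unit"


def app_is_running(interface, changed, invalidated):
    """Apply the SubState rule from question 1 to one PropertiesChanged signal.

    Returns True or False when the signal carries a SubState value, and
    None when it does not (the caller would then have to read the
    property itself).
    """
    if interface != UNIT_IFACE or "SubState" not in changed:
        return None
    return changed["SubState"] == "running"


# The case mentioned above: ActiveState "active" but SubState "exited".
print(app_is_running(UNIT_IFACE,
                     {"ActiveState": "active", "SubState": "exited"}, []))
```

With this rule, a unit that is “active” but “exited” counts as down, which is
exactly the behaviour the question is asking about.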



2) When a service exits, I get multiple PropertiesChanged signals.



   kill -9  <myservice> will transition myservice to  “stop”,
“auto-restart” and then “running” SubState

   systemctl restart myservice will transition myservice to   “stop”,
“stop-sigterm” and then  “running” SubState.

   systemctl stop myservice will transition myservice to   “stop” and then
 “stop-sigterm” SubState.



What do “stop” and “stop-sigterm” mean here, and why are there two signals
to indicate a stop? Is “stop” a good indicator that the service has stopped?

In my client, I can cache the services and their states.
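
As an illustration of the client-side cache mentioned above, here is a
minimal sketch that remembers the last SubState seen for each unit and
reports only actual changes, so repeated PropertiesChanged signals with the
same value are dropped. The class name StateCache and its update method are
hypothetical; the sequence fed in is the one listed for kill -9, with one
duplicate added for demonstration.

```python
class StateCache:
    """Remember the last SubState per unit and report only real changes."""

    def __init__(self):
        self._last = {}

    def update(self, unit, sub_state):
        """Return (old, new) if sub_state differs from the cached value, else None."""
        old = self._last.get(unit)
        if old == sub_state:
            return None
        self._last[unit] = sub_state
        return old, sub_state


cache = StateCache()
changes = []
for state in ["running", "stop", "stop", "auto-restart", "running"]:
    change = cache.update("myservice.service", state)
    if change is not None:
        changes.append(change)
print(changes)
```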


3) How do I get to know when an application has failed (failed here means
that systemd will not restart the application again after n app exits in m
seconds)?

When I was using JobRemoved, I used the value “failed” of the “result”
parameter in the JobRemoved signal. Was this the correct indicator to
determine service failure?
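
For comparison with the JobRemoved approach, here is a sketch of a
property-based check. It rests on an assumption that is part of what is being
asked: that the unit's ActiveState reads “failed” once systemd gives up
restarting the service. The function name coarse_status and its three return
values are hypothetical.

```python
def coarse_status(active_state, sub_state):
    """Map (ActiveState, SubState) to "failed", "running" or "down".

    Assumption: ActiveState == "failed" marks the final failed state;
    everything that is not active/running is treated as down.
    """
    if active_state == "failed":
        return "failed"
    if active_state == "active" and sub_state == "running":
        return "running"
    return "down"


for pair in [("active", "running"), ("activating", "auto-restart"),
             ("failed", "failed")]:
    print(pair, "->", coarse_status(*pair))
```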



4) How do I get to know when a service has started? When I issue
“systemctl start myservice”, I do not get any PropertiesChanged signal; I
receive JobNew, JobRemoved and then UnitNew signals.



I thought of subscribing to the UnitNew signal. But I also get multiple
(i.e. 2) UnitNew and UnitRemoved signals when I do ‘systemctl stop
myservice’. Why do I get UnitNew when a service is being stopped?



From my research I understand that requesting the properties of an unloaded
unit will cause systemd to send a pair of UnitNew/UnitRemoved signals, and
this may lead to an infinite loop.

How do I fix this?
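
One idea that might sidestep the loop is to subscribe to PropertiesChanged on
the unit's object path directly, rather than querying properties in response
to UnitNew. The sketch below only builds the object path and a D-Bus match
rule string. The escaping rule it uses (ASCII letters kept, digits kept
except in first position, every other byte written as "_" plus two lowercase
hex digits) is an assumption about how systemd derives unit object paths, and
both function names are hypothetical.

```python
UNIT_PATH_PREFIX = "/org/freedesktop/systemd1/unit/"


def unit_object_path(unit_name):
    """Escape a unit name into an object path under the assumed rule."""
    if not unit_name:
        return UNIT_PATH_PREFIX + "_"
    out = []
    for i, byte in enumerate(unit_name.encode("utf-8")):
        ch = chr(byte)
        if ch.isascii() and (ch.isalpha() or (ch.isdigit() and i > 0)):
            out.append(ch)
        else:
            out.append(f"_{byte:02x}")  # e.g. "." becomes "_2e"
    return UNIT_PATH_PREFIX + "".join(out)


def properties_changed_rule(unit_name):
    """Build a match rule for PropertiesChanged signals of one unit."""
    return ("type='signal',"
            "interface='org.freedesktop.DBus.Properties',"
            "member='PropertiesChanged',"
            f"path='{unit_object_path(unit_name)}'")


print(properties_changed_rule("myservice.service"))
```

Whether a match on the path alone is enough to avoid the UnitNew/UnitRemoved
pair is part of the question; the sketch only shows how the path and rule
could be constructed.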



Am I solving requirements (a) and (b) correctly, or should I be using a
different mechanism to achieve them?



I appreciate your help with the above queries.



Thanks

Prashant