[systemd-devel] How do I monitor for service exit, service failure and service start using DBus?
Pradeepa Kumar
cdpradeepa at gmail.com
Sun Jul 12 20:49:27 PDT 2015
Hi all,
Second try: I did not get any response to my queries below,
and I am blocked on my work until they are answered.
Any comments are welcome and appreciated.
Thanks,
Prashant
On Fri, Jul 10, 2015 at 11:36 AM, Pradeepa Kumar <cdpradeepa at gmail.com>
wrote:
>
> Hello systemd-experts,
>
> I am new to systemd and dbus.
>
> I am writing a daemon which starts, stops and monitors services, and I am
> using dbus to interface with systemd. I send messages on dbus to start or
> stop the service and this is working well. I want to achieve the following:
>
>
> a) Monitor service exit using the systemd dbus interface.
>
> b) Monitor when the service has entered a failed state (the app has been
> restarted n times in m seconds).
>
>
>
> I am trying to understand the right mechanism to implement this using
> the systemd dbus interface. I have tried a few things, but clearly I don’t
> understand how this works. Any help is greatly appreciated.
>
>
>
> In my first attempt I subscribed to the JobRemoved signal from systemd, as
> explained here (http://www.freedesktop.org/wiki/Software/systemd/dbus/).
> I was able to figure out when the service failed by looking at the result
> string.
>
>
>
> ---
>
> JobNew() and JobRemoved() are sent out each time a new job is queued or
> dequeued. Both signals take the numeric job ID, the bus path and the
> primary unit name for this job as argument. JobRemoved() also includes a
> result string, being one of done, canceled, timeout, failed, dependency,
> skipped. done indicates successful execution of a job. canceled indicates
> that a job has been canceled (via CancelJob() above) before it finished
> execution (this doesn't necessarily mean though that the job operation is
> actually cancelled too, see above). timeout indicates that the job timeout
> was reached. failed indicates that the job failed. dependency indicates
> that a job this job has been depending on failed and the job hence has been
> removed too. skipped indicates that a job was skipped because it didn't
> apply to the unit's current state.
>
> ---
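
A minimal sketch of how the result string described in the excerpt might be classified. It assumes the handler receives the JobRemoved arguments in the order given there (job ID, bus path, unit name, result) and that jobs for other units should be ignored; the unit name `myservice.service` and the helper `job_succeeded` are hypothetical. Treating every result other than "done" as unsuccessful is a simplification.

```python
# Result strings listed in the quoted documentation for JobRemoved().
KNOWN_JOB_RESULTS = {"done", "canceled", "timeout", "failed",
                     "dependency", "skipped"}


def job_succeeded(unit, result, watched_units):
    """Classify one JobRemoved(id, job_path, unit, result) signal.

    Returns None when the job belongs to a unit we do not manage,
    True when the result is "done", and False for any other result
    (including strings that are not in the documented list).
    """
    if unit not in watched_units:
        return None  # job of some other unit; not our concern
    return result == "done"


watched = {"myservice.service"}  # hypothetical unit name
for unit, result in [("myservice.service", "done"),
                     ("myservice.service", "failed"),
                     ("other.service", "failed")]:
    print(unit, result, job_succeeded(unit, result, watched))
```

Filtering on the unit name first means a handler only reacts to jobs of the services it manages, which may remove some of the signals that look spurious.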
>
>
>
> I soon realized that I was getting spurious JobRemoved signals.
>
>
>
> I am now trying to achieve (a) and (b) by subscribing to the
> PropertiesChanged signal, and I have a few questions here:
>
>
>
> 1) When I get PropertiesChanged, I query the SubState property to get
> the running/stopped state of the service. Is querying the SubState property
> the right way to get the service status? If the SubState value is “running”,
> I infer that the app is running; if it is any other value, I infer the app
> is down. I am not relying on ActiveState because I see that for some
> signals ActiveState is “active” but SubState is “exited”.
>
> Is this approach correct?
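
One way to combine the two properties, sketched under assumptions: ActiveState gives the coarse state and SubState refines it for services. The helper `service_status` and its return labels are hypothetical, and the mapping (for example counting "reloading" as running) is a judgment call rather than anything systemd prescribes.

```python
def service_status(active_state, sub_state):
    """Map (ActiveState, SubState) of a service unit to a coarse label."""
    if active_state == "failed":
        return "failed"
    if active_state == "reloading":
        return "running"  # assumption: main process stays up during reload
    if active_state == "active":
        # "active"/"exited" is the combination mentioned in the question:
        # the unit still counts as active although no process is running.
        return "running" if sub_state == "running" else "active-no-process"
    if active_state in ("activating", "deactivating"):
        return "transitioning"
    return "stopped"  # "inactive" and anything unrecognised


print(service_status("active", "running"))
print(service_status("active", "exited"))
print(service_status("deactivating", "stop-sigterm"))
print(service_status("inactive", "dead"))
```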
>
>
>
> 2) When a service exits, I get multiple PropertiesChanged signals.
>
>
>
> kill -9 <myservice> will transition myservice to “stop”,
> “auto-restart” and then “running” SubState
>
> systemctl restart myservice will transition myservice to “stop”,
> “stop-sigterm” and then “running” SubState.
>
> systemctl stop myservice will transition myservice to “stop” and then
> “stop-sigterm” SubState.
>
>
>
> What do “stop” and “stop-sigterm” mean here, and why are there 2 signals to
> indicate stop? Is “stop” a good indicator that the service has stopped?
>
> In my client, I can cache the services and their states.
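
The caching idea can be sketched as a small tracker that reports only the edges into and out of “running” and swallows the intermediate SubState values listed above. The class `ServiceTracker` and its event names are hypothetical, and the sketch assumes a unit that has not been seen before was not running.

```python
class ServiceTracker:
    """Cache whether each unit was last seen running; report transitions."""

    def __init__(self):
        self._running = {}  # unit name -> bool

    def update(self, unit, sub_state):
        """Feed one observed SubState; return "started", "exited" or None."""
        was_running = self._running.get(unit, False)
        now_running = sub_state == "running"
        self._running[unit] = now_running
        if now_running and not was_running:
            return "started"
        if was_running and not now_running:
            return "exited"
        return None  # intermediate state, nothing new to report


tracker = ServiceTracker()
# Sequence reported above for kill -9, preceded by the initial "running".
for state in ["running", "stop", "auto-restart", "running"]:
    print(state, "->", tracker.update("myservice.service", state))
```

With this shape, the “stop” followed by “stop-sigterm” pair produces a single "exited" event instead of two.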
>
>
> 3) How do I get to know when an application has failed (failed here means
> that systemd will not restart the application again after n app exits in m
> seconds)?
>
> When I was using JobRemoved, I used the value “failed” in the “result”
> parameter of the JobRemoved signal. Was this the correct indicator to
> determine service failure?
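
As an alternative to the job result, the failed state could be read from the PropertiesChanged payload itself. The sketch below assumes the standard signal arguments (interface name, dictionary of changed properties, list of invalidated property names) and that a unit which systemd has given up on reports ActiveState “failed”; the helper `check_failed` is hypothetical. If ActiveState arrives only in the invalidated list, its value has to be fetched again with a Get call, which the sketch merely flags.

```python
UNIT_IFACE = "org.freedesktop.systemd1.Unit"


def check_failed(interface, changed, invalidated):
    """Inspect one PropertiesChanged signal for a unit.

    Returns "failed" if the unit entered the failed state, "refetch" if
    ActiveState changed but its value was not included, otherwise None.
    """
    if interface != UNIT_IFACE:
        return None  # e.g. the .Service interface; ActiveState is not here
    if changed.get("ActiveState") == "failed":
        return "failed"
    if "ActiveState" in invalidated:
        return "refetch"
    return None


print(check_failed(UNIT_IFACE, {"ActiveState": "failed"}, []))
print(check_failed(UNIT_IFACE, {"ActiveState": "active"}, []))
print(check_failed(UNIT_IFACE, {}, ["ActiveState"]))
```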
>
>
>
> 4) How do I get to know when a service has started? When I issue
> “systemctl start myservice”, I do not get any PropertiesChanged signal; I
> receive JobNew, JobRemoved and then UnitNew signals.
>
>
>
> I thought of subscribing to the UnitNew signal. But I also get multiple
> (i.e. 2) UnitNew and UnitRemoved signals when I do ‘systemctl stop
> myservice’. Why do I get UnitNew when a service is being stopped?
>
>
>
> From my research I understand that requesting the properties of an
> unloaded unit will cause systemd to send a pair of UnitNew/UnitRemoved
> signals, and this may lead to an infinite loop.
>
> How do I fix this?
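
One possible way to avoid reacting to UnitNew at all is to match PropertiesChanged only on the object path of the unit of interest. The sketch below derives that path from the unit name using systemd's label escaping (every byte that is not an ASCII letter, or a digit after the first position, becomes `_` plus two hex digits) and builds a D-Bus match rule string. The helpers `unit_object_path` and `properties_match_rule` are hypothetical, and whether this removes the loop in a given client is an assumption; it does not show registering the rule on the bus.

```python
def escape_label(name):
    """Escape a unit name for use as the last element of an object path."""
    if not name:
        return "_"
    out = []
    for i, byte in enumerate(name.encode("utf-8")):
        ch = chr(byte)
        if byte < 128 and (ch.isalpha() or (ch.isdigit() and i > 0)):
            out.append(ch)
        else:
            out.append("_%02x" % byte)  # e.g. "." becomes "_2e"
    return "".join(out)


def unit_object_path(unit):
    return "/org/freedesktop/systemd1/unit/" + escape_label(unit)


def properties_match_rule(unit):
    """Match rule limiting PropertiesChanged signals to a single unit."""
    return ("type='signal',sender='org.freedesktop.systemd1',"
            "interface='org.freedesktop.DBus.Properties',"
            "member='PropertiesChanged',path='%s'" % unit_object_path(unit))


print(unit_object_path("myservice.service"))
print(properties_match_rule("myservice.service"))
```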
>
>
>
> Am I solving requirements (a) and (b) correctly, or should I be using a
> different mechanism to achieve them?
>
>
>
> I appreciate your help with the above queries.
>
>
>
> Thanks
>
> Prashant
>
>
>