[systemd-devel] Pacemaker detecting existing processes question (Was: indirectly related - pacemaker service)

Jan Pokorný jpokorny at fedoraproject.org
Thu May 30 07:42:59 UTC 2019


[Forwarding to the respective upstream list; this has little to do with
systemd. I suggest following up only there, detaching from the systemd ML.]

On 29/05/19 17:23 +0100, lejeczek wrote:
> Something I was hoping an expert could shed a bit more light on: I
> have a pacemaker cluster composed of three nodes. One node always has a
> problem with pacemaker; its tools say things like:
> 
> $ crm_mon --one-shot
> Connection to cluster failed: Transport endpoint is not connected
> $ pcs status --all
> Error: cluster is not currently running on this node
> 
> but systemd reports the relevant daemons as up and running, with only
> tiny exceptions! On "working" nodes it's:
> 
> $ systemctl status -l pacemaker
> ● pacemaker.service - Pacemaker High Availability Cluster Manager
>    Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; disabled; vendor preset: disabled)
>    Active: active (running) since Fri 2019-05-10 15:39:40 BST; 2 weeks 5 days ago
>      Docs: man:pacemakerd
>            https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/index.html
>  Main PID: 28664 (pacemakerd)
>    CGroup: /system.slice/pacemaker.service
>            ├─  28664 /usr/sbin/pacemakerd -f
>            ├─  28670 /usr/libexec/pacemaker/cib
>            ├─  28671 /usr/libexec/pacemaker/stonithd
>            ├─  28672 /usr/libexec/pacemaker/lrmd
>            ├─  28673 /usr/libexec/pacemaker/attrd
>            ├─  28674 /usr/libexec/pacemaker/pengine
>            ├─  28676 /usr/libexec/pacemaker/crmd
>            ├─1503698 /bin/sh /usr/lib/ocf/resource.d/heartbeat/LVM monitor
>            ├─1503717 /bin/sh /usr/lib/ocf/resource.d/heartbeat/LVM monitor
>            ├─1503718 vgs -o tags --noheadings equalLogic-2.2
>            └─1503719 tr -d  
> 
> but on that one single failing node:
> 
> $ systemctl status -l pacemaker.service 
> ● pacemaker.service - Pacemaker High Availability Cluster Manager
>    Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; enabled; vendor preset: disabled)
>    Active: active (running) since Wed 2019-05-29 17:08:40 BST; 2min 19s ago
>      Docs: man:pacemakerd
>            https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/index.html
>  Main PID: 48729 (pacemakerd)
>     Tasks: 1
>    Memory: 3.3M
>    CGroup: /system.slice/pacemaker.service
>            └─48729 /usr/sbin/pacemakerd -f
>  
> May 29 17:08:41 rider.private pacemakerd[48729]:   notice: Tracking existing cib process (pid=39234)
> May 29 17:08:41 rider.private pacemakerd[48729]:   notice: Tracking existing stonithd process (pid=39235)
> May 29 17:08:41 rider.private pacemakerd[48729]:   notice: Tracking existing lrmd process (pid=39236)
> May 29 17:08:41 rider.private pacemakerd[48729]:   notice: Tracking existing attrd process (pid=39238)
> May 29 17:08:41 rider.private pacemakerd[48729]:   notice: Tracking existing pengine process (pid=39240)
> May 29 17:08:41 rider.private pacemakerd[48729]:   notice: Tracking existing crmd process (pid=39241)
> May 29 17:08:41 rider.private pacemakerd[48729]:   notice: Quorum acquired
> 
> You can clearly see the difference, right? The systems are virtually
> identical: same Dell server model, same CentOS 7.6, and packages from
> the same default repos.
> 
> Does that difference between systemd's status output for pacemaker signify anything?

As those notices say, you attempted to start the pacemaker service in an
unexpected situation: all the subdaemons were already running, for some
hard-to-guess reason (a manually invoked pacemakerd crashing and leaving
these orphans behind could be one such reason; I presume you haven't run
the pacemaker daemons on your own).  Another thing: you mention a
bare-metal deployment, but poorly isolated containers with such processes
running within could cause a similar problem.
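
To check whether the subdaemons named in those notices really live
outside the service's cgroup, something along these lines could help.
It is only a sketch in Python, not anything pacemaker ships: the function
names are made up for illustration, the regular expression assumes
unwrapped journal lines in the format quoted above, and the cgroup check
assumes the usual /proc/<pid>/cgroup layout of
"hierarchy-ID:controllers:path" lines.

```python
import re
from pathlib import Path

# Matches pacemakerd notices such as:
#   notice: Tracking existing cib process (pid=39234)
# \s+ tolerates lines that a mail client re-wrapped.
TRACKING_RE = re.compile(r"Tracking\s+existing\s+(\w+)\s+process\s+\(pid=(\d+)\)")


def tracked_subdaemons(log_text):
    """Return {subdaemon_name: pid} for every 'Tracking existing' notice."""
    return {name: int(pid) for name, pid in TRACKING_RE.findall(log_text)}


def in_unit_cgroup(cgroup_text, unit="pacemaker.service"):
    """True if any line of a /proc/<pid>/cgroup dump places the process in `unit`."""
    for line in cgroup_text.splitlines():
        # Each line reads "hierarchy-ID:controllers:path".
        parts = line.split(":", 2)
        if len(parts) == 3 and parts[2].rstrip("/").endswith("/" + unit):
            return True
    return False


def report(log_text, unit="pacemaker.service"):
    """Print, per tracked subdaemon, whether it sits inside the unit's cgroup."""
    for name, pid in tracked_subdaemons(log_text).items():
        try:
            cgroup_text = Path(f"/proc/{pid}/cgroup").read_text()
        except OSError:
            print(f"{name} (pid={pid}): no longer running or not readable")
            continue
        where = "inside" if in_unit_cgroup(cgroup_text, unit) else "OUTSIDE"
        print(f"{name} (pid={pid}): {where} {unit}")
```

Calling report() on the text produced by "journalctl -u pacemaker" on the
failing node would list each tracked subdaemon; any that show up as
OUTSIDE were presumably started by something other than this service
instance.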

Do you use CentOS provided packages or did you build from source?

In any case, let's continue at the ClusterLabs ML; the systemd folks
cannot help here, since systemctl status just summarizes what can already
be derived from the log messages (which also happened to be among the most
recent entries logged at that time, hence listed above as well).

-- 
Jan (Poki)