[ClusterLabs] Nodes see each other as OFFLINE - fence agent (fence_pcmk) may not be working properly on RHEL 6.5
avinash shankar
avinash.shankar.1 at gmail.com
Fri Dec 16 08:46:57 EST 2016
Hello team,

I am new to Pacemaker and Corosync clustering, and I am having trouble with the fence agent (fence_pcmk) on RHEL 6.5.

I have installed pcs, pacemaker, corosync, and cman on a two-node virtual (libvirt) cluster running RHEL 6.5. SELinux and the firewall are completely disabled.
# yum list installed | egrep 'pacemaker|corosync|cman|fence'
cman.x86_64                    3.0.12.1-78.el6    @rhel-ha-for-rhel-6-server-rpms
corosync.x86_64                1.4.7-5.el6        @rhel-ha-for-rhel-6-server-rpms
corosynclib.x86_64             1.4.7-5.el6        @rhel-ha-for-rhel-6-server-rpms
fence-agents.x86_64            4.0.15-12.el6      @rhel-6-server-rpms
fence-virt.x86_64              0.2.3-19.el6       @rhel-ha-for-rhel-6-server-eus-rpms
pacemaker.x86_64               1.1.14-8.el6_8.2   @rhel-ha-for-rhel-6-server-rpms
pacemaker-cli.x86_64           1.1.14-8.el6_8.2   @rhel-ha-for-rhel-6-server-rpms
pacemaker-cluster-libs.x86_64  1.1.14-8.el6_8.2   @rhel-ha-for-rhel-6-server-rpms
pacemaker-libs.x86_64          1.1.14-8.el6_8.2   @rhel-ha-for-rhel-6-server-rpms
I bring up the cluster with "pcs cluster start --all" and have also run "pcs property set stonith-enabled=false".
Below is the status:
---------------------------
# pcs status
Cluster name: roamclus
Last updated: Fri Dec 16 18:54:40 2016    Last change: Fri Dec 16 17:44:50 2016 by root via cibadmin on cnode1
Stack: cman
Current DC: NONE
2 nodes and 2 resources configured
Online: [ cnode1 ]
OFFLINE: [ cnode2 ]
Full list of resources:
PCSD Status:
cnode1: Online
cnode2: Online
---------------------------
The same output is observed on the other node (cnode2), so the nodes see each other as OFFLINE. The expected result is:

Online: [ cnode1 cnode2 ]

I installed the same packages on RHEL 6.8, and there, when I start the cluster, each node shows the other as ONLINE. I need to resolve this on RHEL 6.5 so that, when the cluster starts, both nodes show each other as online by default.
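For reference, the bring-up sequence was along these lines (a sketch: only the last two commands appear in this post; the auth/setup steps are assumed from the standard pcs workflow on RHEL 6):

```shell
# Assumed standard pcs workflow; only the last two commands are quoted above.
pcs cluster auth cnode1 cnode2                    # authenticate pcsd on both nodes
pcs cluster setup --name roamclus cnode1 cnode2   # writes /etc/cluster/cluster.conf (CMAN stack)
pcs cluster start --all                           # start cman and pacemaker on all nodes
pcs property set stonith-enabled=false            # disable STONITH, as in the original post
```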
----------------------------------------------
Below is /etc/cluster/cluster.conf:
<cluster config_version="9" name="roamclus">
  <fence_daemon/>
  <clusternodes>
    <clusternode name="cnode1" nodeid="1" votes="1">
      <fence>
        <method name="pcmk-method">
          <device name="pcmk-redirect" port="cnode1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="cnode2" nodeid="2" votes="1">
      <fence>
        <method name="pcmk-method">
          <device name="pcmk-redirect" port="cnode2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman broadcast="no" expected_votes="1" transport="udp" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_pcmk" name="pcmk-redirect"/>
  </fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>
----------------------------------------------
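One detail in the <cman> element above may matter: transport="udp" means corosync uses multicast, and multicast often fails to pass between libvirt guests depending on the bridge configuration, which would leave each node forming its own single-member membership, exactly as the logs below show. A hedged sketch of the unicast alternative (cman on RHEL 6 supports transport="udpu"; whether it applies here is an assumption):

```xml
<!-- Assumption: UDP unicast sidesteps multicast problems on libvirt bridges. -->
<cman broadcast="no" expected_votes="1" transport="udpu" two_node="1"/>
```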
# cat /var/lib/pacemaker/cib/cib.xml
<cib crm_feature_set="3.0.10" validate-with="pacemaker-2.4" epoch="15" num_updates="0" admin_epoch="0" cib-last-written="Fri Dec 16 18:57:10 2016" update-origin="cnode1" update-client="cibadmin" update-user="root" have-quorum="1" dc-uuid="cnode1">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-have-watchdog" name="have-watchdog" value="false"/>
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.14-8.el6_8.2-70404b0"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="cman"/>
        <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="cnode1" uname="cnode1"/>
      <node id="cnode2" uname="cnode2"/>
    </nodes>
    <resources/>
    <constraints/>
  </configuration>
</cib>
------------------------------------------------
/var/log/messages has the following contents:
Dec 15 20:29:43 cnode2 kernel: DLM (built Oct 26 2016 10:26:08) installed
Dec 15 20:29:46 cnode2 corosync[2464]: [MAIN ] Corosync Cluster Engine
('1.4.7'): started and ready to provide service.
Dec 15 20:29:46 cnode2 corosync[2464]: [MAIN ] Corosync built-in
features: nss dbus rdma snmp
Dec 15 20:29:46 cnode2 corosync[2464]: [MAIN ] Successfully read config
from /etc/cluster/cluster.conf
Dec 15 20:29:46 cnode2 corosync[2464]: [MAIN ] Successfully parsed cman
config
Dec 15 20:29:46 cnode2 corosync[2464]: [TOTEM ] Initializing transport
(UDP/IP Multicast).
Dec 15 20:29:46 cnode2 corosync[2464]: [TOTEM ] Initializing
transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Dec 15 20:29:46 cnode2 corosync[2464]: [TOTEM ] The network interface
[10.10.18.138] is now up.
Dec 15 20:29:46 cnode2 corosync[2464]: [QUORUM] Using quorum provider
quorum_cman
Dec 15 20:29:46 cnode2 corosync[2464]: [SERV ] Service engine loaded:
corosync cluster quorum service v0.1
Dec 15 20:29:46 cnode2 corosync[2464]: [CMAN ] CMAN 3.0.12.1 (built Feb
1 2016 07:06:19) started
Dec 15 20:29:46 cnode2 corosync[2464]: [SERV ] Service engine loaded:
corosync CMAN membership service 2.90
Dec 15 20:29:46 cnode2 corosync[2464]: [SERV ] Service engine loaded:
openais checkpoint service B.01.01
Dec 15 20:29:46 cnode2 corosync[2464]: [SERV ] Service engine loaded:
corosync extended virtual synchrony service
Dec 15 20:29:46 cnode2 corosync[2464]: [SERV ] Service engine loaded:
corosync configuration service
Dec 15 20:29:46 cnode2 corosync[2464]: [SERV ] Service engine loaded:
corosync cluster closed process group service v1.01
Dec 15 20:29:46 cnode2 corosync[2464]: [SERV ] Service engine loaded:
corosync cluster config database access v1.01
Dec 15 20:29:46 cnode2 corosync[2464]: [SERV ] Service engine loaded:
corosync profile loading service
Dec 15 20:29:46 cnode2 corosync[2464]: [QUORUM] Using quorum provider
quorum_cman
Dec 15 20:29:46 cnode2 corosync[2464]: [SERV ] Service engine loaded:
corosync cluster quorum service v0.1
Dec 15 20:29:46 cnode2 corosync[2464]: [MAIN ] Compatibility mode set to
whitetank. Using V1 and V2 of the synchronization engine.
Dec 15 20:29:46 cnode2 corosync[2464]: [TOTEM ] A processor joined or
left the membership and a new membership was formed.
Dec 15 20:29:46 cnode2 corosync[2464]: [CMAN ] quorum regained, resuming
activity
Dec 15 20:29:46 cnode2 corosync[2464]: [QUORUM] This node is within the
primary component and will provide service.
Dec 15 20:29:46 cnode2 corosync[2464]: [QUORUM] Members[1]: 2
Dec 15 20:29:46 cnode2 corosync[2464]: [QUORUM] Members[1]: 2
Dec 15 20:29:46 cnode2 corosync[2464]: [CPG ] chosen downlist: sender
r(0) ip(10.10.18.138) ; members(old:0 left:0)
Dec 15 20:29:46 cnode2 corosync[2464]: [MAIN ] Completed service
synchronization, ready to provide service.
Dec 15 20:29:50 cnode2 fenced[2529]: fenced 3.0.12.1 started
Dec 15 20:29:50 cnode2 dlm_controld[2543]: dlm_controld 3.0.12.1 started
Dec 15 20:29:51 cnode2 gfs_controld[2606]: gfs_controld 3.0.12.1 started
Dec 15 20:30:36 cnode2 pacemaker: Starting Pacemaker Cluster Manager
Dec 15 20:30:36 cnode2 pacemakerd[2767]: notice: Additional logging
available in /var/log/pacemaker.log
Dec 15 20:30:36 cnode2 pacemakerd[2767]: notice: Switching to
/var/log/cluster/corosync.log
Dec 15 20:30:36 cnode2 pacemakerd[2767]: notice: Additional logging
available in /var/log/cluster/corosync.log
Dec 15 20:30:36 cnode2 pacemakerd[2767]: notice: Starting Pacemaker
1.1.14-8.el6_8.2 (Build: 70404b0): generated-manpages agent-manpages
ascii-docs ncurses libqb-logging libqb-ipc nagios corosync-plugin cman acls
Dec 15 20:30:36 cnode2 pacemakerd[2767]: notice: Membership 4: quorum
acquired
Dec 15 20:30:36 cnode2 pacemakerd[2767]: notice: cman_event_callback:
Node cnode2[2] - state is now member (was (null))
Dec 15 20:30:36 cnode2 cib[2773]: notice: Additional logging available in
/var/log/cluster/corosync.log
Dec 15 20:30:36 cnode2 cib[2773]: notice: Using new config location:
/var/lib/pacemaker/cib
Dec 15 20:30:36 cnode2 cib[2773]: warning: Could not verify cluster
configuration file /var/lib/pacemaker/cib/cib.xml: No such file or
directory (2)
Dec 15 20:30:36 cnode2 cib[2773]: warning: Primary configuration corrupt
or unusable, trying backups in /var/lib/pacemaker/cib
Dec 15 20:30:36 cnode2 cib[2773]: warning: Continuing with an empty
configuration.
Dec 15 20:30:36 cnode2 stonith-ng[2774]: notice: Additional logging
available in /var/log/cluster/corosync.log
Dec 15 20:30:36 cnode2 stonith-ng[2774]: notice: Connecting to cluster
infrastructure: cman
Dec 15 20:30:36 cnode2 attrd[2776]: notice: Additional logging available
in /var/log/cluster/corosync.log
Dec 15 20:30:36 cnode2 attrd[2776]: notice: Connecting to cluster
infrastructure: cman
Dec 15 20:30:36 cnode2 stonith-ng[2774]: notice: crm_update_peer_proc:
Node cnode2[2] - state is now member (was (null))
Dec 15 20:30:36 cnode2 pengine[2777]: notice: Additional logging
available in /var/log/cluster/corosync.log
Dec 15 20:30:36 cnode2 lrmd[2775]: notice: Additional logging available
in /var/log/cluster/corosync.log
Dec 15 20:30:36 cnode2 attrd[2776]: notice: crm_update_peer_proc: Node
cnode2[2] - state is now member (was (null))
Dec 15 20:30:36 cnode2 crmd[2778]: notice: Additional logging available
in /var/log/cluster/corosync.log
Dec 15 20:30:36 cnode2 crmd[2778]: notice: CRM Git Version:
1.1.14-8.el6_8.2 (70404b0)
Dec 15 20:30:36 cnode2 cib[2773]: notice: Connecting to cluster
infrastructure: cman
Dec 15 20:30:36 cnode2 attrd[2776]: notice: Starting mainloop...
Dec 15 20:30:36 cnode2 cib[2773]: notice: crm_update_peer_proc: Node
cnode2[2] - state is now member (was (null))
Dec 15 20:30:36 cnode2 cib[2782]: warning: Could not verify cluster
configuration file /var/lib/pacemaker/cib/cib.xml: No such file or
directory (2)
Dec 15 20:30:37 cnode2 stonith-ng[2774]: notice: Watching for stonith
topology changes
Dec 15 20:30:37 cnode2 crmd[2778]: notice: Connecting to cluster
infrastructure: cman
Dec 15 20:30:37 cnode2 crmd[2778]: notice: Membership 4: quorum acquired
Dec 15 20:30:37 cnode2 crmd[2778]: notice: cman_event_callback: Node
cnode2[2] - state is now member (was (null))
Dec 15 20:30:37 cnode2 crmd[2778]: notice: The local CRM is operational
Dec 15 20:30:37 cnode2 crmd[2778]: notice: State transition S_STARTING ->
S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
Dec 15 20:30:42 cnode2 fenced[2529]: fencing node cnode1
Dec 15 20:30:42 cnode2 fence_pcmk[2805]: Requesting Pacemaker fence cnode1
(reset)
Dec 15 20:30:42 cnode2 stonith-ng[2774]: notice: Client
stonith_admin.cman.2806.6d791bd8 wants to fence (reboot) 'cnode1' with
device '(any)'
Dec 15 20:30:42 cnode2 stonith-ng[2774]: notice: Initiating remote
operation reboot for cnode1: c398b8b7-6ba1-4068-a174-547bac72476d (0)
Dec 15 20:30:42 cnode2 stonith-ng[2774]: notice: Couldn't find anyone to
fence (reboot) cnode1 with any device
Dec 15 20:30:42 cnode2 stonith-ng[2774]: error: Operation reboot of cnode1 by <no-one> for stonith_admin.cman.2806@cnode2.c398b8b7: No such device
Dec 15 20:30:42 cnode2 crmd[2778]: notice: Peer cnode1 was not terminated
(reboot) by <anyone> for cnode2: No such device
(ref=c398b8b7-6ba1-4068-a174-547bac72476d) by client stonith_admin.cman.2806
Dec 15 20:30:42 cnode2 fence_pcmk[2805]: Call to fence cnode1 (reset)
failed with rc=237
Dec 15 20:30:42 cnode2 fenced[2529]: fence cnode1 dev 0.0 agent fence_pcmk
result: error from agent
Dec 15 20:30:42 cnode2 fenced[2529]: fence cnode1 failed
Dec 15 20:30:45 cnode2 fenced[2529]: fencing node cnode1
Dec 15 20:30:45 cnode2 fence_pcmk[2825]: Requesting Pacemaker fence cnode1
(reset)
Dec 15 20:30:45 cnode2 stonith-ng[2774]: notice: Client
stonith_admin.cman.2826.f2c208fe wants to fence (reboot) 'cnode1' with
device '(any)'
Dec 15 20:30:45 cnode2 stonith-ng[2774]: notice: Initiating remote
operation reboot for cnode1: b5df8517-d8a7-4f33-8cd2-d41c512d13ae (0)
Dec 15 20:30:45 cnode2 stonith-ng[2774]: notice: Couldn't find anyone to
fence (reboot) cnode1 with any device
Dec 15 20:30:45 cnode2 stonith-ng[2774]: error: Operation reboot of cnode1 by <no-one> for stonith_admin.cman.2826@cnode2.b5df8517: No such device
Dec 15 20:30:48 cnode2 crmd[2778]: notice: Peer cnode1 was not terminated
(reboot) by <anyone> for cnode2: No such device
(ref=aff3eb58-4777-4fca-9802-eb084dc56ad4) by client stonith_admin.cman.2846
Dec 15 20:30:48 cnode2 fence_pcmk[2845]: Call to fence cnode1 (reset)
failed with rc=237
Dec 15 20:30:48 cnode2 fenced[2529]: fence cnode1 dev 0.0 agent fence_pcmk
result: error from agent
Dec 15 20:30:48 cnode2 fenced[2529]: fence cnode1 failed
Dec 15 20:30:51 cnode2 fence_pcmk[2869]: Requesting Pacemaker fence cnode1
(reset)
Dec 15 20:30:51 cnode2 stonith-ng[2774]: notice: Client
stonith_admin.cman.2870.1c9e3d98 wants to fence (reboot) 'cnode1' with
device '(any)'
Dec 15 20:30:51 cnode2 stonith-ng[2774]: notice: Initiating remote
operation reboot for cnode1: b2435128-3702-44a0-a42e-52b642278686 (0)
Dec 15 20:30:51 cnode2 stonith-ng[2774]: notice: Couldn't find anyone to
fence (reboot) cnode1 with any device
Dec 15 20:30:51 cnode2 stonith-ng[2774]: error: Operation reboot of cnode1 by <no-one> for stonith_admin.cman.2870@cnode2.b2435128: No such device
================================================================
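Reading the log: cman's fenced hands each fence request to fence_pcmk, which redirects it to Pacemaker's stonith-ng; stonith-ng then answers "Couldn't find anyone to fence ... No such device" because stonith-enabled=false and no stonith resources are configured, so fence_pcmk fails with rc=237 and fenced retries indefinitely. A sketch of what a redirect target could look like for libvirt guests, using fence_xvm from the installed fence-virt package (the resource name, host map, and key path are assumptions):

```shell
# Hypothetical fence_xvm device so fence_pcmk has somewhere to redirect.
pcs stonith create virt-fence fence_xvm \
    pcmk_host_map="cnode1:cnode1;cnode2:cnode2" \
    key_file=/etc/cluster/fence_xvm.key
pcs property set stonith-enabled=true
```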
Please help me solve this problem.