[ClusterLabs] DRBD failover in Pacemaker
Devin Ortner
Devin.Ortner at gtshq.onmicrosoft.com
Tue Sep 6 21:04:30 CEST 2016
I have a 2-node cluster running CentOS 6.8 and Pacemaker with DRBD. I have been following the "Clusters from Scratch" documentation to build the cluster, and I am running into a problem where DRBD does not fail over to the other node when one goes down. Here is my "pcs status" before the failover:
----------------------------------------------------------------------------------------------------------------------
[root@node1 ~]# pcs status
Cluster name: webcluster
Last updated: Tue Sep 6 14:50:21 2016 Last change: Tue Sep 6 14:50:17 2016 by root via crm_attribute on node1
Stack: cman
Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
2 nodes and 5 resources configured
Online: [ node1 node2 ]
Full list of resources:
Cluster_VIP (ocf::heartbeat:IPaddr2): Started node1
Master/Slave Set: ClusterDBclone [ClusterDB]
Masters: [ node1 ]
Slaves: [ node2 ]
ClusterFS (ocf::heartbeat:Filesystem): Started node1
WebSite (ocf::heartbeat:apache): Started node1
Failed Actions:
* ClusterFS_start_0 on node2 'unknown error' (1): call=61, status=complete, exitreason='none',
last-rc-change='Tue Sep 6 13:15:00 2016', queued=0ms, exec=40ms
PCSD Status:
node1: Online
node2: Online
[root@node1 ~]#
When I put node1 in standby, everything fails over except DRBD, which is not promoted on node2 (and ClusterFS stays stopped):
--------------------------------------------------------------------------------------
[root@node1 ~]# pcs cluster standby node1
[root@node1 ~]# pcs status
Cluster name: webcluster
Last updated: Tue Sep 6 14:53:45 2016 Last change: Tue Sep 6 14:53:37 2016 by root via cibadmin on node2
Stack: cman
Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
2 nodes and 5 resources configured
Node node1: standby
Online: [ node2 ]
Full list of resources:
Cluster_VIP (ocf::heartbeat:IPaddr2): Started node2
Master/Slave Set: ClusterDBclone [ClusterDB]
Slaves: [ node2 ]
Stopped: [ node1 ]
ClusterFS (ocf::heartbeat:Filesystem): Stopped
WebSite (ocf::heartbeat:apache): Started node2
Failed Actions:
* ClusterFS_start_0 on node2 'unknown error' (1): call=61, status=complete, exitreason='none',
last-rc-change='Tue Sep 6 13:15:00 2016', queued=0ms, exec=40ms
PCSD Status:
node1: Online
node2: Online
[root@node1 ~]#
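The symptom in the output above is a Master/Slave set that lists only "Slaves:" and "Stopped:" with no "Masters:" line. A minimal sketch of a check for that condition follows, assuming the "pcs status" layout shown in this post; the function name has_master is hypothetical and not part of pcs.

```shell
# has_master SET_NAME
# Reads `pcs status` text on stdin and succeeds (exit 0) only if the
# named Master/Slave set reports a "Masters:" line.
# Example use on a cluster node (assumes pcs is installed):
#   pcs status | has_master ClusterDBclone || echo "no master"
has_master() {
    awk -v set="$1" '
        # A new Master/Slave set header: note whether it is the one we want.
        /Master\/Slave Set:/ { in_set = (index($0, "Set: " set " ") > 0); next }
        # Inside the wanted set, a Masters: line means a node was promoted.
        in_set && /^[ \t]*Masters:/ { found = 1 }
        # Any line other than Masters:/Slaves:/Stopped: ends the set.
        in_set && !/^[ \t]*(Masters|Slaves|Stopped):/ { in_set = 0 }
        END { exit !found }
    '
}
```

Run against the first status listing in this post, the check would succeed for ClusterDBclone; run against the standby listing, it would fail, since only "Slaves: [ node2 ]" and "Stopped: [ node1 ]" appear.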
I have pasted the contents of "/var/log/messages" here: http://pastebin.com/0i0FMzGZ
Here is my Configuration: http://pastebin.com/HqqBV90p
When I unstandby node1, it comes back as the master for DRBD, and everything else stays running on node2 (which is fine, because I haven't set up colocation constraints for that).
Here is what I have after node1 is back:
-----------------------------------------------------
[root@node1 ~]# pcs cluster unstandby node1
[root@node1 ~]# pcs status
Cluster name: webcluster
Last updated: Tue Sep 6 14:57:46 2016 Last change: Tue Sep 6 14:57:42 2016 by root via cibadmin on node1
Stack: cman
Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
2 nodes and 5 resources configured
Online: [ node1 node2 ]
Full list of resources:
Cluster_VIP (ocf::heartbeat:IPaddr2): Started node2
Master/Slave Set: ClusterDBclone [ClusterDB]
Masters: [ node1 ]
Slaves: [ node2 ]
ClusterFS (ocf::heartbeat:Filesystem): Started node1
WebSite (ocf::heartbeat:apache): Started node2
Failed Actions:
* ClusterFS_start_0 on node2 'unknown error' (1): call=61, status=complete, exitreason='none',
last-rc-change='Tue Sep 6 13:15:00 2016', queued=0ms, exec=40ms
PCSD Status:
node1: Online
node2: Online
[root@node1 ~]#
Any help would be appreciated. I think there is something dumb that I'm missing.
Thank you.