[ClusterLabs] DRBD failover in Pacemaker

Devin Ortner Devin.Ortner at gtshq.onmicrosoft.com
Tue Sep 6 21:04:30 CEST 2016


I have a 2-node cluster running CentOS 6.8 and Pacemaker with DRBD. I have been following the "Clusters from Scratch" documentation to build the cluster, and I am running into a problem where DRBD does not fail over to the other node when one goes down. Here is my "pcs status" before it is supposed to fail over:

----------------------------------------------------------------------------------------------------------------------

[root@node1 ~]# pcs status
Cluster name: webcluster
Last updated: Tue Sep  6 14:50:21 2016		Last change: Tue Sep  6 14:50:17 2016 by root via crm_attribute on node1
Stack: cman
Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
2 nodes and 5 resources configured

Online: [ node1 node2 ]

Full list of resources:

 Cluster_VIP	(ocf::heartbeat:IPaddr2):	Started node1
 Master/Slave Set: ClusterDBclone [ClusterDB]
     Masters: [ node1 ]
     Slaves: [ node2 ]
 ClusterFS	(ocf::heartbeat:Filesystem):	Started node1
 WebSite	(ocf::heartbeat:apache):	Started node1

Failed Actions:
* ClusterFS_start_0 on node2 'unknown error' (1): call=61, status=complete, exitreason='none',
    last-rc-change='Tue Sep  6 13:15:00 2016', queued=0ms, exec=40ms


PCSD Status:
  node1: Online
  node2: Online

[root@node1 ~]#
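
The failed ClusterFS_start_0 action on node2 is left over from an earlier test. I assume I can clear it with something like this before retesting (just a sketch; "ClusterFS" is the resource name from the status above):

# Show how many times ClusterFS has failed on each node
pcs resource failcount show ClusterFS
# Forget the failed start so Pacemaker will try node2 again
pcs resource cleanup ClusterFS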

When I put node1 in standby, everything fails over except DRBD:
--------------------------------------------------------------------------------------

[root@node1 ~]# pcs cluster standby node1
[root@node1 ~]# pcs status
Cluster name: webcluster
Last updated: Tue Sep  6 14:53:45 2016		Last change: Tue Sep  6 14:53:37 2016 by root via cibadmin on node2
Stack: cman
Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
2 nodes and 5 resources configured

Node node1: standby
Online: [ node2 ]

Full list of resources:

 Cluster_VIP	(ocf::heartbeat:IPaddr2):	Started node2
 Master/Slave Set: ClusterDBclone [ClusterDB]
     Slaves: [ node2 ]
     Stopped: [ node1 ]
 ClusterFS	(ocf::heartbeat:Filesystem):	Stopped
 WebSite	(ocf::heartbeat:apache):	Started node2

Failed Actions:
* ClusterFS_start_0 on node2 'unknown error' (1): call=61, status=complete, exitreason='none',
    last-rc-change='Tue Sep  6 13:15:00 2016', queued=0ms, exec=40ms


PCSD Status:
  node1: Online
  node2: Online

[root@node1 ~]#
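
For reference, the constraints "Clusters from Scratch" adds so the filesystem follows the DRBD master should look roughly like this with my resource names (a sketch of what I believe should be in place; my actual configuration is in the pastebin below):

# Keep the filesystem on whichever node holds the DRBD master
pcs constraint colocation add ClusterFS with ClusterDBclone INFINITY with-rsc-role=Master
# Only mount the filesystem after DRBD has been promoted on that node
pcs constraint order promote ClusterDBclone then start ClusterFS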

I have pasted the contents of "/var/log/messages" here: http://pastebin.com/0i0FMzGZ 
Here is my Configuration: http://pastebin.com/HqqBV90p 

When I unstandby node1, it comes back as the master for DRBD, and everything else stays running on node2 (which is fine, because I haven't set up colocation constraints for that).
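
If I later want the web resources to follow the VIP, I assume constraints along these lines would do it (just a sketch, not something I have configured):

# Keep the web server on the same node as the VIP, and start it after the VIP
pcs constraint colocation add WebSite with Cluster_VIP INFINITY
pcs constraint order Cluster_VIP then WebSite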
Here is what I have after node1 is back: 
-----------------------------------------------------

[root@node1 ~]# pcs cluster unstandby node1
[root@node1 ~]# pcs status
Cluster name: webcluster
Last updated: Tue Sep  6 14:57:46 2016		Last change: Tue Sep  6 14:57:42 2016 by root via cibadmin on node1
Stack: cman
Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
2 nodes and 5 resources configured

Online: [ node1 node2 ]

Full list of resources:

 Cluster_VIP	(ocf::heartbeat:IPaddr2):	Started node2
 Master/Slave Set: ClusterDBclone [ClusterDB]
     Masters: [ node1 ]
     Slaves: [ node2 ]
 ClusterFS	(ocf::heartbeat:Filesystem):	Started node1
 WebSite	(ocf::heartbeat:apache):	Started node2

Failed Actions:
* ClusterFS_start_0 on node2 'unknown error' (1): call=61, status=complete, exitreason='none',
    last-rc-change='Tue Sep  6 13:15:00 2016', queued=0ms, exec=40ms


PCSD Status:
  node1: Online
  node2: Online

[root@node1 ~]#
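
If it helps, I can also post the output of the following from both nodes (a sketch; "r0" is a placeholder for my actual DRBD resource name):

# All constraints currently in the CIB, with their IDs and scores
pcs constraint show --full
# DRBD's own view of each node's role and connection state
cat /proc/drbd
drbdadm role r0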

Any help would be appreciated; I think there is something dumb that I'm missing.

Thank you.


