[ClusterLabs] DRBD failover in Pacemaker
Digimer
lists at alteeve.ca
Wed Sep 7 12:23:04 CEST 2016
> no-quorum-policy: ignore
> stonith-enabled: false
You must have fencing configured.
CentOS 6 uses pacemaker with the cman plugin, so set up cman
(cluster.conf) to use the fence_pcmk passthrough agent, then set up
proper stonith in pacemaker (and test that it works). Finally, tell DRBD
to use 'fencing resource-and-stonith;' and configure the
'crm-{un,}fence-peer.sh' {un,}fence handlers.
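
Roughly, the three pieces look like this (a sketch only; the IPs,
passwords, DRBD resource name and the fence_ipmilan agent below are
placeholders for illustration, so swap in whatever matches your
hardware). In cluster.conf, every node's fencing is redirected to
pacemaker:

  <?xml version="1.0"?>
  <cluster config_version="2" name="webcluster">
    <cman two_node="1" expected_votes="1"/>
    <clusternodes>
      <clusternode name="node1" nodeid="1">
        <fence>
          <method name="pcmk-redirect">
            <device name="pcmk" port="node1"/>
          </method>
        </fence>
      </clusternode>
      <clusternode name="node2" nodeid="2">
        <fence>
          <method name="pcmk-redirect">
            <device name="pcmk" port="node2"/>
          </method>
        </fence>
      </clusternode>
    </clusternodes>
    <fencedevices>
      <fencedevice name="pcmk" agent="fence_pcmk"/>
    </fencedevices>
  </cluster>

Then define real stonith devices in pacemaker, for example via IPMI
(fence_ipmilan is only an example agent):

  pcs stonith create fence_node1 fence_ipmilan pcmk_host_list="node1" \
      ipaddr="192.168.122.201" login="admin" passwd="secret" \
      op monitor interval=60s
  pcs stonith create fence_node2 fence_ipmilan pcmk_host_list="node2" \
      ipaddr="192.168.122.202" login="admin" passwd="secret" \
      op monitor interval=60s
  pcs property set stonith-enabled=true

And in the DRBD resource file (section layout as in DRBD 8.4; check the
drbd.conf man page for your version):

  resource clusterdb {
    disk {
      fencing resource-and-stonith;
    }
    handlers {
      fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
      after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
    ...
  }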
See if that gets things working.
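
One simple way to prove the stonith side before relying on it is to
fence a node on purpose and watch it get power-cycled:

  stonith_admin --reboot node2

(or 'pcs stonith fence node2'). Until that works reliably, DRBD's
fence-peer handler can't protect your data either.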
On 07/09/16 04:04 AM, Devin Ortner wrote:
> I have a 2-node cluster running CentOS 6.8 and Pacemaker with DRBD. I have been using the "Clusters from Scratch" documentation to create my cluster, and I am running into a problem where DRBD is not failing over to the other node when one goes down. Here is my "pcs status" before it is supposed to fail over:
>
> ----------------------------------------------------------------------------------------------------------------------
>
> [root@node1 ~]# pcs status
> Cluster name: webcluster
> Last updated: Tue Sep 6 14:50:21 2016 Last change: Tue Sep 6 14:50:17 2016 by root via crm_attribute on node1
> Stack: cman
> Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
> 2 nodes and 5 resources configured
>
> Online: [ node1 node2 ]
>
> Full list of resources:
>
> Cluster_VIP (ocf::heartbeat:IPaddr2): Started node1
> Master/Slave Set: ClusterDBclone [ClusterDB]
>      Masters: [ node1 ]
>      Slaves: [ node2 ]
> ClusterFS (ocf::heartbeat:Filesystem): Started node1
> WebSite (ocf::heartbeat:apache): Started node1
>
> Failed Actions:
> * ClusterFS_start_0 on node2 'unknown error' (1): call=61, status=complete, exitreason='none',
> last-rc-change='Tue Sep 6 13:15:00 2016', queued=0ms, exec=40ms
>
>
> PCSD Status:
> node1: Online
> node2: Online
>
> [root@node1 ~]#
>
> When I put node1 in standby everything fails over except DRBD:
> --------------------------------------------------------------------------------------
>
> [root@node1 ~]# pcs cluster standby node1
> [root@node1 ~]# pcs status
> Cluster name: webcluster
> Last updated: Tue Sep 6 14:53:45 2016 Last change: Tue Sep 6 14:53:37 2016 by root via cibadmin on node2
> Stack: cman
> Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
> 2 nodes and 5 resources configured
>
> Node node1: standby
> Online: [ node2 ]
>
> Full list of resources:
>
> Cluster_VIP (ocf::heartbeat:IPaddr2): Started node2
> Master/Slave Set: ClusterDBclone [ClusterDB]
>      Slaves: [ node2 ]
>      Stopped: [ node1 ]
> ClusterFS (ocf::heartbeat:Filesystem): Stopped
> WebSite (ocf::heartbeat:apache): Started node2
>
> Failed Actions:
> * ClusterFS_start_0 on node2 'unknown error' (1): call=61, status=complete, exitreason='none',
> last-rc-change='Tue Sep 6 13:15:00 2016', queued=0ms, exec=40ms
>
>
> PCSD Status:
> node1: Online
> node2: Online
>
> [root@node1 ~]#
>
> I have pasted the contents of "/var/log/messages" here: http://pastebin.com/0i0FMzGZ
> Here is my Configuration: http://pastebin.com/HqqBV90p
>
> When I unstandby node1, it comes back as the master for the DRBD, and everything else stays running on node2 (which is fine because I haven't set up colocation constraints for that).
> Here is what I have after node1 is back:
> -----------------------------------------------------
>
> [root@node1 ~]# pcs cluster unstandby node1
> [root@node1 ~]# pcs status
> Cluster name: webcluster
> Last updated: Tue Sep 6 14:57:46 2016 Last change: Tue Sep 6 14:57:42 2016 by root via cibadmin on node1
> Stack: cman
> Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
> 2 nodes and 5 resources configured
>
> Online: [ node1 node2 ]
>
> Full list of resources:
>
> Cluster_VIP (ocf::heartbeat:IPaddr2): Started node2
> Master/Slave Set: ClusterDBclone [ClusterDB]
>      Masters: [ node1 ]
>      Slaves: [ node2 ]
> ClusterFS (ocf::heartbeat:Filesystem): Started node1
> WebSite (ocf::heartbeat:apache): Started node2
>
> Failed Actions:
> * ClusterFS_start_0 on node2 'unknown error' (1): call=61, status=complete, exitreason='none',
> last-rc-change='Tue Sep 6 13:15:00 2016', queued=0ms, exec=40ms
>
>
> PCSD Status:
> node1: Online
> node2: Online
>
> [root@node1 ~]#
>
> Any help would be appreciated; I think there is something dumb that I'm missing.
>
> Thank you.
>
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?