[ClusterLabs] DRBD failover in Pacemaker

Digimer <lists@alteeve.ca>
Wed Sep 7 06:23:04 EDT 2016


> no-quorum-policy: ignore
> stonith-enabled: false

You must have fencing configured.

CentOS 6 runs pacemaker on the cman stack. So set up cman
(cluster.conf) to use the fence_pcmk passthrough agent, then set up
proper stonith in pacemaker (and test that it works). Finally, tell DRBD
to use 'fencing resource-and-stonith;' and configure the
crm-fence-peer.sh / crm-unfence-peer.sh fence and unfence handlers.
Rough sketches of all three steps are below.
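A minimal sketch of each step, assuming two nodes named node1/node2, a
DRBD resource named r0, and IPMI-based power fencing. The fence agent,
IP addresses and credentials are placeholders; substitute whatever
matches your hardware.

1. cluster.conf, with cman's fencing redirected to pacemaker via the
fence_pcmk passthrough agent:

    <cluster name="webcluster" config_version="2">
      <cman two_node="1" expected_votes="1"/>
      <clusternodes>
        <clusternode name="node1" nodeid="1">
          <fence>
            <method name="pcmk-redirect">
              <device name="pcmk" port="node1"/>
            </method>
          </fence>
        </clusternode>
        <clusternode name="node2" nodeid="2">
          <fence>
            <method name="pcmk-redirect">
              <device name="pcmk" port="node2"/>
            </method>
          </fence>
        </clusternode>
      </clusternodes>
      <fencedevices>
        <fencedevice name="pcmk" agent="fence_pcmk"/>
      </fencedevices>
    </cluster>

2. Stonith devices in pacemaker, then turn stonith back on:

    # Placeholder IPMI details; use your real fence agent/parameters.
    pcs stonith create fence_node1 fence_ipmilan \
        ipaddr="10.0.0.1" login="admin" passwd="secret" \
        pcmk_host_list="node1" op monitor interval=60s
    pcs stonith create fence_node2 fence_ipmilan \
        ipaddr="10.0.0.2" login="admin" passwd="secret" \
        pcmk_host_list="node2" op monitor interval=60s
    pcs property set stonith-enabled=true

3. The DRBD resource file (the handler scripts ship with the DRBD
utilities):

    resource r0 {
      disk {
        fencing resource-and-stonith;
      }
      handlers {
        fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
      }
      # ... plus your usual device/disk/address/meta-disk settings.
    }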

See if that gets things working.
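A quick sanity test for stonith, once it's configured, is to fence a
node on purpose from its peer using pacemaker's own tool, e.g.:

    stonith_admin --reboot node2

run from node1 (and the reverse from node2). The target should power
cycle and rejoin cleanly; if it doesn't, fix that before touching the
DRBD fence handlers.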

On 07/09/16 04:04 AM, Devin Ortner wrote:
> I have a 2-node cluster running CentOS 6.8 and Pacemaker with DRBD. I have been using the "Clusters from Scratch" documentation to create my cluster and I am running into a problem where DRBD is not failing over to the other node when one goes down. Here is my "pcs status" prior to when it is supposed to fail over:
> 
> ----------------------------------------------------------------------------------------------------------------------
> 
> [root@node1 ~]# pcs status
> Cluster name: webcluster
> Last updated: Tue Sep  6 14:50:21 2016		Last change: Tue Sep  6 14:50:17 2016 by root via crm_attribute on node1
> Stack: cman
> Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
> 2 nodes and 5 resources configured
> 
> Online: [ node1 node2 ]
> 
> Full list of resources:
> 
>  Cluster_VIP	(ocf::heartbeat:IPaddr2):	Started node1
>  Master/Slave Set: ClusterDBclone [ClusterDB]
>      Masters: [ node1 ]
>      Slaves: [ node2 ]
>  ClusterFS	(ocf::heartbeat:Filesystem):	Started node1
>  WebSite	(ocf::heartbeat:apache):	Started node1
> 
> Failed Actions:
> * ClusterFS_start_0 on node2 'unknown error' (1): call=61, status=complete, exitreason='none',
>     last-rc-change='Tue Sep  6 13:15:00 2016', queued=0ms, exec=40ms
> 
> 
> PCSD Status:
>   node1: Online
>   node2: Online
> 
> [root@node1 ~]#
> 
> When I put node1 in standby everything fails over except DRBD:
> --------------------------------------------------------------------------------------
> 
> [root@node1 ~]# pcs cluster standby node1
> [root@node1 ~]# pcs status
> Cluster name: webcluster
> Last updated: Tue Sep  6 14:53:45 2016		Last change: Tue Sep  6 14:53:37 2016 by root via cibadmin on node2
> Stack: cman
> Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
> 2 nodes and 5 resources configured
> 
> Node node1: standby
> Online: [ node2 ]
> 
> Full list of resources:
> 
>  Cluster_VIP	(ocf::heartbeat:IPaddr2):	Started node2
>  Master/Slave Set: ClusterDBclone [ClusterDB]
>      Slaves: [ node2 ]
>      Stopped: [ node1 ]
>  ClusterFS	(ocf::heartbeat:Filesystem):	Stopped
>  WebSite	(ocf::heartbeat:apache):	Started node2
> 
> Failed Actions:
> * ClusterFS_start_0 on node2 'unknown error' (1): call=61, status=complete, exitreason='none',
>     last-rc-change='Tue Sep  6 13:15:00 2016', queued=0ms, exec=40ms
> 
> 
> PCSD Status:
>   node1: Online
>   node2: Online
> 
> [root@node1 ~]#
> 
> I have pasted the contents of "/var/log/messages" here: http://pastebin.com/0i0FMzGZ 
> Here is my Configuration: http://pastebin.com/HqqBV90p 
> 
> When I unstandby node1, it comes back as the master for the DRBD and everything else stays running on node2 (which is fine, because I haven't set up colocation constraints for that).
> Here is what I have after node1 is back: 
> -----------------------------------------------------
> 
> [root@node1 ~]# pcs cluster unstandby node1
> [root@node1 ~]# pcs status
> Cluster name: webcluster
> Last updated: Tue Sep  6 14:57:46 2016		Last change: Tue Sep  6 14:57:42 2016 by root via cibadmin on node1
> Stack: cman
> Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
> 2 nodes and 5 resources configured
> 
> Online: [ node1 node2 ]
> 
> Full list of resources:
> 
>  Cluster_VIP	(ocf::heartbeat:IPaddr2):	Started node2
>  Master/Slave Set: ClusterDBclone [ClusterDB]
>      Masters: [ node1 ]
>      Slaves: [ node2 ]
>  ClusterFS	(ocf::heartbeat:Filesystem):	Started node1
>  WebSite	(ocf::heartbeat:apache):	Started node2
> 
> Failed Actions:
> * ClusterFS_start_0 on node2 'unknown error' (1): call=61, status=complete, exitreason='none',
>     last-rc-change='Tue Sep  6 13:15:00 2016', queued=0ms, exec=40ms
> 
> 
> PCSD Status:
>   node1: Online
>   node2: Online
> 
> [root@node1 ~]#
> 
> Any help would be appreciated, I think there is something dumb that I'm missing.
> 
> Thank you.
> 
> _______________________________________________
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?



