[Pacemaker] Question: 2 node cluster in two datacenters - how to avoid split brain

Thu Nov 10 12:38:03 UTC 2011

After talking with a teammate, it turns out that a second network connection
to the other datacenter does exist, so my question is answered. Thanks to all.

-----Original Message-----
From: Andreas Kurz [mailto:andreas at hastexo.com]
Sent: Thursday, 10 November 2011 13:26
To: pacemaker at oss.clusterlabs.org
Subject: Re: [Pacemaker] Question: 2 node cluster in two datacenters - how to avoid split brain

On 11/10/2011 10:21 AM, Senftleben, Stefan (itsc) wrote:
> Hello,
>  
> first of all, I want to say hello to all recipients of the Pacemaker
> mailing list!
>  
> I manage a two-node active-passive cluster with an ms-drbd resource and
> a dependent resource.
> Each node is located in a separate datacenter, connected by a single
> DWDM connection; a second network connection is not available.

Hmm ... what about the service network? Is that no option for a second
cluster communication path or is that all the same network?

> In case of a network interruption between the datacenters, a split-brain
> situation arises and manual intervention by the admin is needed.
>  
> What are your recommendations for me to avoid the split brain situation?

Only a manual fencing method like meatware. I can't think of another
reliable fencing method between different sites, as there are too many
cases where a STONITH event is not appropriate when it is based only on
the information that one link was lost.
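
For illustration, manual fencing along these lines might be sketched in crm
shell syntax roughly as follows. This is only a minimal sketch, assuming the
meatware STONITH plugin from cluster-glue is installed on both nodes; the
resource name st_meatware is hypothetical, and the node names are taken from
the configuration quoted below.

```
# Hypothetical resource name; assumes the stonith:meatware plugin is available.
primitive st_meatware stonith:meatware \
        params hostlist="lxds05 lxds07"
# Fencing only takes effect once STONITH is enabled cluster-wide.
property stonith-enabled="true"
```

With a setup like this, the surviving node would not take over until an
administrator has verified that the peer is really down and confirmed it,
typically with "meatclient -c <nodename>" on the surviving node.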

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

>  
>  
>  
> Thanks in advance!
>  
> Best regards,
>  
> Stefan
>  
>  
>  
> _The configuration follows:_
>  
> node lxds05 \
>         attributes standby="off"
> node lxds07 \
>         attributes standby="off"
> primitive apache2 ocf:heartbeat:apache \
>         params configfile="/etc/apache2/apache2.conf" httpd="/usr/sbin/apache2" \
>         operations $id="apache2-operations" \
>         op start interval="0" timeout="240" \
>         op stop interval="0" timeout="240" \
>         op monitor interval="15" timeout="20" start-delay="0" \
>         meta target-role="started" failure-timeout="60"
> primitive omd_site_main ocf:omd:omdnagios \
>         params site="main" \
>         op monitor interval="30s" timeout="30s" \
>         op start interval="0s" timeout="240s" \
>         op stop interval="0s" timeout="240s" \
>         meta target-role="Started" failure-timeout="60"
> primitive pri_drbd_omd ocf:linbit:drbd \
>         params drbd_resource="nagios" \
>         operations $id="drbd_disk-operations" \
>         op monitor interval="10" timeout="240" \
>         meta failure-timeout="60"
> primitive pri_fs_omd ocf:heartbeat:Filesystem \
>         params device="/dev/drbd0" directory="/opt/omd/" fstype="ext4" \
>         operations $id="pri_fs_omd-operations" \
>         op start interval="0" timeout="240" \
>         op stop interval="0" timeout="360" \
>         op monitor interval="15" timeout="20" start-delay="0" \
>         op notify interval="0" timeout="20" \
>         meta target-role="started" failure-timeout="60"
> primitive pri_nagiosIP ocf:heartbeat:IPaddr2 \
>         params ip="192.168.12.21" cidr_netmask="24" iflabel="NagiosIP" \
>         operations $id="pri_nagiosIP-operations" \
>         op start interval="0" timeout="20" \
>         op stop interval="0" timeout="20" \
>         op monitor interval="10" timeout="20" start-delay="0" \
>         meta target-role="started" failure-timeout="60"
> primitive res_MailTo_1 ocf:heartbeat:MailTo \
>         params email="stefan.senftleben at itsc.de" subject="nagios-group" \
>         operations $id="res_MailTo_1-operations" \
>         op start interval="0" timeout="10" \
>         op stop interval="0" timeout="10" \
>         op monitor interval="10" timeout="10" start-delay="0" \
>         meta failure-timeout="60"
> primitive res_MailTo_2 ocf:heartbeat:MailTo \
>         params email="stefan.senftleben at itsc.de" subject="omd_site_main" \
>         operations $id="res_MailTo_2-operations" \
>         op start interval="0" timeout="10" \
>         op stop interval="0" timeout="10" \
>         op monitor interval="10" timeout="10" start-delay="0" \
>         meta failure-timeout="60"
> primitive res_ping_1 ocf:pacemaker:ping \
>         params multiplier="1000" host_list="192.168.12.1 192.168.12.2 192.168.12.13" \
>         operations $id="res_ping_1-operations" \
>         op start interval="0" timeout="60" \
>         op stop interval="0" timeout="20" \
>         op monitor interval="10" timeout="60" start-delay="0" \
>         op reload interval="0" timeout="100"
> group nagios-group pri_fs_omd pri_nagiosIP apache2 \
>         meta target-role="started" failure-timeout="60"
> ms ms_drbd_omd pri_drbd_omd \
>         meta clone-max="2" notify="true"
> clone cl_ping_1 res_ping_1 \
>         meta clone-max="2" notify="true"
> location loc_drbdmaster_ping ms_drbd_omd \
>         rule $id="loc_drbdmaster_ping-rule" $role="Master" pingd: defined pingd
> colocation col_omd_follows_drbd inf: nagios-group ms_drbd_omd:Master
> colocation col_omd_site_main_nagios-group inf: omd_site_main nagios-group
> colocation col_res_MailTo_1_nagios-group inf: res_MailTo_1 nagios-group
> colocation col_res_MailTo_2_omd_site_main inf: res_MailTo_2 omd_site_main
> order ord_drbd_before_omd inf: ms_drbd_omd:promote nagios-group:start
> order ord_nagios-group_omd_site_main inf: nagios-group omd_site_main
> order ord_nagios-group_res_MailTo_1 inf: nagios-group res_MailTo_1
> order ord_omd_site_main_res_MailTo_2 inf: omd_site_main res_MailTo_2
> property $id="cib-bootstrap-options" \
>         expected-quorum-votes="2" \
>         stonith-enabled="false" \
>         dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
>         no-quorum-policy="ignore" \
>         cluster-infrastructure="openais" \
>         last-lrm-refresh="1320683212"
> rsc_defaults $id="rsc-options" \
>         resource-stickiness="100"
>  
>  
>  
>  
>  
> 
> 
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker