[Pacemaker] Question: 2 node cluster in two datacenters - how to avoid split brain
Senftleben, Stefan (itsc)
Stefan.Senftleben at ITSC.de
Thu Nov 10 12:38:03 UTC 2011
After talking with a teammate, it turns out there is a second network connection to the other datacenter,
so my question is answered. Thanks to all.
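Because a second inter-datacenter network exists, it can serve as a second Corosync ring, so that losing one link no longer partitions the cluster. Below is a minimal sketch of the `totem` section of `/etc/corosync/corosync.conf` for Corosync 1.x with the redundant ring protocol. It assumes that multicast works across both links. All network addresses, multicast groups and ports are hypothetical placeholders and must be replaced with the real values for each link.

```text
totem {
        version: 2
        # Redundant ring protocol: "passive" alternates between the rings,
        # "active" sends every message on both rings.
        rrp_mode: passive

        # Ring 0: the existing DWDM link (placeholder addresses)
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.12.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }

        # Ring 1: the second inter-datacenter network (placeholder addresses).
        # Corosync also uses mcastport - 1, so keep the ports at least 2 apart.
        interface {
                ringnumber: 1
                bindnetaddr: 10.0.0.0
                mcastaddr: 226.94.1.2
                mcastport: 5407
        }
}
```

After a restart, `corosync-cfgtool -s` shows the status of each ring. If a ring was marked faulty and the link has recovered, `corosync-cfgtool -r` re-enables redundant ring operation. A second ring lowers the risk of split brain but does not remove it, because both links can still fail at the same time.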
-----Original Message-----
From: Andreas Kurz [mailto:andreas at hastexo.com]
Sent: Thursday, 10 November 2011 13:26
To: pacemaker at oss.clusterlabs.org
Subject: Re: [Pacemaker] Question: 2 node cluster in two datacenters - how to avoid split brain
On 11/10/2011 10:21 AM, Senftleben, Stefan (itsc) wrote:
> Hello,
>
> First of all, hello to all readers of the Pacemaker
> mailing list!
>
> I manage a two-node active/passive cluster with a DRBD master/slave
> resource and resources that depend on it.
> Each node is located in a separate datacenter. The two are connected by a
> single DWDM link; no second network connection is available.
Hmm... what about the service network? Is it not an option as a second
cluster communication path, or is it all the same network?
> If the network between the datacenters is interrupted, a split-brain
> situation occurs and the admin has to intervene manually.
>
> What would you recommend to avoid the split-brain situation?
Only by using a manual fencing method like meatware. I can't think of
another reliable fencing method between different sites: there are too
many cases where a STONITH event would be wrong if it were triggered only
because one link was lost.
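A meatware fencing device could be added to the posted configuration roughly as follows, entered at the `crm configure` prompt. This is a sketch under stated assumptions: the resource name `st_meatware` is hypothetical, the node names are taken from the posted configuration, and `stonith-enabled` must be switched from `false` to `true` for any fencing to take effect.

```text
# Hypothetical resource name; hostlist lists the nodes this device may fence.
primitive st_meatware stonith:meatware \
        params hostlist="lxds05 lxds07"
# The posted configuration has stonith-enabled="false".
property stonith-enabled="true"
```

Meatware never powers anything off by itself. When the cluster wants to fence a node, the request waits, and recovery of that node's resources is blocked. An operator must first verify that the peer is really down and then confirm the request with `meatclient -c <nodename>`, for example `meatclient -c lxds07`. This trades automatic failover for protection against both sides running the DRBD primary at once.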
Regards,
Andreas
--
Need help with Pacemaker?
http://www.hastexo.com/now
>
>
>
> Thanks in advance!
>
> Best regards,
>
> Stefan
>
>
>
> _The configuration follows:_
>
> node lxds05 \
> attributes standby="off"
> node lxds07 \
> attributes standby="off"
> primitive apache2 ocf:heartbeat:apache \
> params configfile="/etc/apache2/apache2.conf" httpd="/usr/sbin/apache2" \
> operations $id="apache2-operations" \
> op start interval="0" timeout="240" \
> op stop interval="0" timeout="240" \
> op monitor interval="15" timeout="20" start-delay="0" \
> meta target-role="started" failure-timeout="60"
> primitive omd_site_main ocf:omd:omdnagios \
> params site="main" \
> op monitor interval="30s" timeout="30s" \
> op start interval="0s" timeout="240s" \
> op stop interval="0s" timeout="240s" \
> meta target-role="Started" failure-timeout="60"
> primitive pri_drbd_omd ocf:linbit:drbd \
> params drbd_resource="nagios" \
> operations $id="drbd_disk-operations" \
> op monitor interval="10" timeout="240" \
> meta failure-timeout="60"
> primitive pri_fs_omd ocf:heartbeat:Filesystem \
> params device="/dev/drbd0" directory="/opt/omd/" fstype="ext4" \
> operations $id="pri_fs_omd-operations" \
> op start interval="0" timeout="240" \
> op stop interval="0" timeout="360" \
> op monitor interval="15" timeout="20" start-delay="0" \
> op notify interval="0" timeout="20" \
> meta target-role="started" failure-timeout="60"
> primitive pri_nagiosIP ocf:heartbeat:IPaddr2 \
> params ip="192.168.12.21" cidr_netmask="24" iflabel="NagiosIP" \
> operations $id="pri_nagiosIP-operations" \
> op start interval="0" timeout="20" \
> op stop interval="0" timeout="20" \
> op monitor interval="10" timeout="20" start-delay="0" \
> meta target-role="started" failure-timeout="60"
> primitive res_MailTo_1 ocf:heartbeat:MailTo \
> params email="stefan.senftleben at itsc.de" subject="nagios-group" \
> operations $id="res_MailTo_1-operations" \
> op start interval="0" timeout="10" \
> op stop interval="0" timeout="10" \
> op monitor interval="10" timeout="10" start-delay="0" \
> meta failure-timeout="60"
> primitive res_MailTo_2 ocf:heartbeat:MailTo \
> params email="stefan.senftleben at itsc.de" subject="omd_site_main" \
> operations $id="res_MailTo_2-operations" \
> op start interval="0" timeout="10" \
> op stop interval="0" timeout="10" \
> op monitor interval="10" timeout="10" start-delay="0" \
> meta failure-timeout="60"
> primitive res_ping_1 ocf:pacemaker:ping \
> params multiplier="1000" host_list="192.168.12.1 192.168.12.2 192.168.12.13" \
> operations $id="res_ping_1-operations" \
> op start interval="0" timeout="60" \
> op stop interval="0" timeout="20" \
> op monitor interval="10" timeout="60" start-delay="0" \
> op reload interval="0" timeout="100"
> group nagios-group pri_fs_omd pri_nagiosIP apache2 \
> meta target-role="started" failure-timeout="60"
> ms ms_drbd_omd pri_drbd_omd \
> meta clone-max="2" notify="true"
> clone cl_ping_1 res_ping_1 \
> meta clone-max="2" notify="true"
> location loc_drbdmaster_ping ms_drbd_omd \
> rule $id="loc_drbdmaster_ping-rule" $role="Master" pingd: defined pingd
> colocation col_omd_follows_drbd inf: nagios-group ms_drbd_omd:Master
> colocation col_omd_site_main_nagios-group inf: omd_site_main nagios-group
> colocation col_res_MailTo_1_nagios-group inf: res_MailTo_1 nagios-group
> colocation col_res_MailTo_2_omd_site_main inf: res_MailTo_2 omd_site_main
> order ord_drbd_before_omd inf: ms_drbd_omd:promote nagios-group:start
> order ord_nagios-group_omd_site_main inf: nagios-group omd_site_main
> order ord_nagios-group_res_MailTo_1 inf: nagios-group res_MailTo_1
> order ord_omd_site_main_res_MailTo_2 inf: omd_site_main res_MailTo_2
> property $id="cib-bootstrap-options" \
> expected-quorum-votes="2" \
> stonith-enabled="false" \
> dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
> no-quorum-policy="ignore" \
> cluster-infrastructure="openais" \
> last-lrm-refresh="1320683212"
> rsc_defaults $id="rsc-options" \
> resource-stickiness="100"
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker