Hi there,<br><br>I have successfully configured a two-node DRBD Pacemaker cluster using the instructions provided by LINBIT here: <a href="http://www.drbd.org/users-guide-emb/ch-pacemaker.html">http://www.drbd.org/users-guide-emb/ch-pacemaker.html</a>. The cluster works perfectly and I can migrate the resources back and forth between the two nodes without a problem. However, when simulating certain cluster communication failures, I am having trouble preventing the DRBD cluster from entering a split-brain state. I have been led to believe that STONITH will help prevent split-brain situations, but the LINBIT instructions do not provide any guidance on how to configure STONITH in the Pacemaker cluster. The only thing I can find in LINBIT's documentation is the discussion of the resource fencing options in /etc/drbd.conf, which I have configured as follows:<br><div class="im"><br>
<br>resource r0 {<br> disk {<br> fencing resource-only;<br> }<br> handlers {<br> fence-peer "/usr/lib/drbd/crm-fence-peer.sh";<br> after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";<br>
}<br>}<br><br>I'm still at a loss to understand what actually triggers DRBD to run the above fencing scripts, or how to tell whether it has run them.<br><br></div>I've
searched the Internet high and low for example Pacemaker configurations that
show you how to configure STONITH resources for DRBD, but I can't find
anything useful.<br><br>Whilst hunting the Internet I did find this howto: ( <a href="http://www.howtoforge.com/installation-and-setup-guide-for-drbd-openais-pacemaker-xen-on-opensuse-11.1" target="_blank">http://www.howtoforge.com/installation-and-setup-guide-for-drbd-openais-pacemaker-xen-on-opensuse-11.1</a> ) that spells out how to configure a DRBD pacemaker cluster and even states the following: "STONITH is disabled
in this [example] configuration though it is highly-recommended in any production
environment to eliminate the risk of divergent data." Infuriatingly, it doesn't tell you how to configure STONITH!<br><br>Could someone
please, please, please give me some pointers or some helpful examples on how to go about configuring STONITH, and/or on modifying my Pacemaker configuration in any other way, to get it into a production-ready state? My current configuration is listed below:<br>
<br>The cluster is built on two Red Hat EL 5.3 servers running the following software versions:<br>drbd-8.3.6-1<br>pacemaker-1.0.5-4.1<br>openais-0.80.5-15.1<br><br><br>root@mq001:~# crm configure show<br>node mq001.back.live.cwwtf.local<br>
node mq002.back.live.cwwtf.local<br>primitive activemq-emp lsb:bbc-activemq-emp<br>primitive activemq-forge-services lsb:bbc-activemq-forge-services<div id=":cj" class="ii gt"><br>
primitive activemq-social lsb:activemq-social<br>primitive drbd_activemq ocf:linbit:drbd \<br> params drbd_resource="r0" \<br> op monitor interval="15s"<br>primitive fs_activemq ocf:heartbeat:Filesystem \<br>
params device="/dev/drbd1" directory="/drbd" fstype="ext3"<br>primitive ip_activemq ocf:heartbeat:IPaddr2 \<br> params ip="172.23.8.71" nic="eth0"<br>group activemq fs_activemq ip_activemq activemq-forge-services activemq-emp activemq-social<br>
ms ms_drbd_activemq drbd_activemq \<br> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"<br>colocation activemq_on_drbd inf: activemq ms_drbd_activemq:Master<br>
order activemq_after_drbd inf: ms_drbd_activemq:promote activemq:start<br>property $id="cib-bootstrap-options" \<br> dc-version="1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7" \<br> cluster-infrastructure="openais" \<br>
expected-quorum-votes="2" \<br> no-quorum-policy="ignore" \<br> last-lrm-refresh="1260809203"<br><br>/etc/drbd.conf<br><br>global {<br> usage-count no;<br>}<br>common {<br> protocol C;<br>
}<br>resource r0 {<br> disk {<br> fencing resource-only;<br> }<br> handlers {<br> fence-peer "/usr/lib/drbd/crm-fence-peer.sh";<div id=":cj" class="ii gt"><br>
after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";<br> }<br> syncer {<br> rate 40M;<br> }<br> on mq001.back.live.cwwtf.local {<br> device /dev/drbd1;<br> disk /dev/cciss/c0d0p1;<br>
address <a href="http://172.23.8.69:7789/" target="_blank">172.23.8.69:7789</a>;<br> meta-disk internal;<br> }<br> on mq002.back.live.cwwtf.local {<br> device /dev/drbd1;<br> disk /dev/cciss/c0d0p1;<br>
address <a href="http://172.23.8.70:7789/" target="_blank">172.23.8.70:7789</a>;<br>
meta-disk internal;<br> }<br>}<br><br><br>root@mq001:~# cat /etc/ais/openais.conf <br>totem {<br> version: 2<br> token: 3000<br> token_retransmits_before_loss_const: 10<br> join: 60<br> consensus: 1500<br> vsftype: none<br>
max_messages: 20<br> clear_node_high_bit: yes<br> secauth: on<br> threads: 0<br> rrp_mode: passive<br> interface {<br> ringnumber: 0<br> bindnetaddr: 172.59.60.0<br> mcastaddr: 239.94.1.1<br> mcastport: 5405<br>
}<br> interface {<br> ringnumber: 1<br> bindnetaddr: 172.23.8.0<br> mcastaddr: 239.94.2.1<br> mcastport: 5405<br> }<br>}<br>logging {<br> to_stderr: yes<br> debug: on<br> timestamp: on<br> to_file: no<br>
to_syslog: yes<br> syslog_facility: daemon<br>}<br>amf {<br> mode: disabled<br>}<br>service {<br> ver: 0<br> name: pacemaker<br> use_mgmtd: yes<br>}<br>aisexec {<br> user: root<br> group: root<br>}<br>
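<br>In case it helps to have something concrete to correct, below is the sort of STONITH configuration I have been imagining. It is completely untested, and the choice of plugin (stonith:external/ipmi) and the IPMI addresses/credentials are pure guesses on my part:<br><br>primitive stonith-mq001 stonith:external/ipmi \<br> params hostname="mq001.back.live.cwwtf.local" ipaddr="172.23.8.80" userid="admin" passwd="xxxx" interface="lan" \<br> op monitor interval="60s"<br>primitive stonith-mq002 stonith:external/ipmi \<br> params hostname="mq002.back.live.cwwtf.local" ipaddr="172.23.8.81" userid="admin" passwd="xxxx" interface="lan" \<br> op monitor interval="60s"<br>location stonith-mq001-placement stonith-mq001 -inf: mq001.back.live.cwwtf.local<br>location stonith-mq002-placement stonith-mq002 -inf: mq002.back.live.cwwtf.local<br>property stonith-enabled="true"<br><br>My (possibly wrong) understanding is that each external/ipmi resource can only fence a single host, hence the -inf location constraints to keep each STONITH resource off the node it is meant to fence. Is that along the right lines, and should the drbd.conf option then become "fencing resource-and-stonith;" rather than "fencing resource-only;"?<br>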
<br>Many Thanks,<br>Tom<br></div><br></div><br>
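<br>P.S. It would also be a bonus if someone could explain how to confirm whether the crm-fence-peer.sh handler has actually fired. My assumption (again, untested) is that the script works by adding a temporary location constraint to the CIB, so I have been looking for evidence with something like:<br><br>grep -i fence-peer /var/log/messages<br>crm configure show | grep drbd-fence-by<br><br>but I would welcome confirmation that these are the right things to look for.<br>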