Thank you for your reply, Andreas,<br><br>My first node (the active node) is a virtual machine and the second (the passive node) is a standalone physical server. Neither of them is under high load, but the problem seems to come from the virtual server.<br>
I do get the same split-brain problem when I take or delete a virtual machine snapshot (the network connection is lost for a moment, maybe about 1 s). But I take a snapshot only once a week, and I get split brain several times a week.<br>
I haven't detected any other loss of connection, although there may be very short network cuts that my monitoring system does not catch (and I have no problems with my non-clustered services).<br>
If these are micro-cuts, I think the problem is DRBD: is it too sensitive? Can I adjust some values to avoid the problem?<br>
<br><br>I will try increasing my token value to 10000 ms and consensus to 12000 ms, and configure resource-level fencing in DRBD. Thanks for the tips.<br>
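<br>To be concrete, here is a sketch of what I plan to change. I am assuming corosync 1.x syntax, where both totem values are in milliseconds and consensus has to be at least 1.2 times the token value (12000 = 1.2 * 10000); the rest of my totem section would stay as it is:<br>
<pre>
# /etc/corosync/corosync.conf (fragment, only the changed values shown)
totem {
    version: 2
    # time without a token before a node is declared lost, in ms
    token: 10000
    # must be at least 1.2 * token, in ms
    consensus: 12000
}
</pre>
For DRBD, if I read the user guide correctly, resource-level fencing also needs a fencing policy in the disk section, next to the fence-peer and after-resync-target handlers that I already have. This is only a fragment and assumes the DRBD 8.3-style option name:<br>
<pre>
resource drbd-mysql {
    disk {
        on-io-error detach;
        # assumption: tells DRBD to call the fence-peer handler
        # when the replication link is lost
        fencing resource-only;
    }
}
</pre>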
<br>About redundant rings: I read in the DRBD documentation that they are vital for resource-level fencing, but can I do without them?<br>Because I use a virtual server (my virtual servers run on a blade), I can't have a "physical" link (a cable) between the 2 nodes, so I use "virtual links" (VLANs that separate them from my main network). I can create a second corosync link, but I doubt its usefulness: if something goes wrong with the first link, I think I would have the same problem on the second. Although they are logically separated, they use the same physical hardware (all my hardware is redundant, so link problems are very limited).<br>
But maybe I'm wrong, I'll think about it.<br><br><br>About STONITH: I will read the documentation, but is it really worth bringing out the "heavy artillery" for a simple 2-node cluster in active/passive mode? (I read that STONITH is mostly used for active/active clusters.)<br>
<br>Anyway, thank you for the advice, it is much appreciated!<br>
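<br>P.S. In case I do add a second ring, I suppose the totem section would look something like the sketch below. The 192.168.3.0 and 192.168.2.0 networks are my existing corosync and DRBD VLANs; the multicast addresses and ports are placeholders I made up, and I am assuming corosync 1.x syntax:<br>
<pre>
totem {
    version: 2
    # rrp_mode must be active or passive when more than one
    # interface section is defined
    rrp_mode: passive
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.3.0
        # placeholder multicast address and port
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
    interface {
        ringnumber: 1
        bindnetaddr: 192.168.2.0
        # placeholder values, kept different from ring 0
        mcastaddr: 226.94.1.2
        mcastport: 5407
    }
}
</pre>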
<br><br><div class="gmail_quote">2012/6/26 Andreas Kurz <span dir="ltr"><<a href="mailto:andreas@hastexo.com" target="_blank">andreas@hastexo.com</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">On 06/26/2012 03:49 PM, coma wrote:<br>
> Hello,<br>
><br>
> I am running a 2-node cluster with corosync and DRBD in active/passive<br>
> mode for MySQL high availability.<br>
><br>
> The cluster works fine (failover/failback and replication are OK) and I have<br>
> no network outage (the network is monitored and I have not seen any failure),<br>
> but split-brain occurs very often and I don't understand why. Maybe you can<br>
> help me?<br>
<br>
</div>Are the nodes virtual machines, or do they have a high load from time to time?<br>
<div class="im"><br>
><br>
> I'm a new Pacemaker/Corosync/DRBD user, so my cluster and DRBD<br>
> configuration are probably not optimal. If you have any comments,<br>
> tips or examples, I would be very grateful!<br>
><br>
> Here is an example of the corosync log when a split-brain occurs (1 hour of log<br>
> to show before/after the split-brain):<br>
><br>
> <a href="http://pastebin.com/3DprkcTA" target="_blank">http://pastebin.com/3DprkcTA</a><br>
<br>
</div>Increase your token value in corosync.conf to a higher value ... like<br>
10s, configure resource-level fencing in DRBD, set up STONITH for your<br>
cluster and use redundant corosync rings.<br>
<br>
Regards,<br>
Andreas<br>
<br>
--<br>
Need help with Pacemaker?<br>
<a href="http://www.hastexo.com/now" target="_blank">http://www.hastexo.com/now</a><br>
<div class="im"><br>
><br>
> Thank you in advance for any help!<br>
><br>
><br>
> More details about my configuration:<br>
><br>
> I have:<br>
> One preferred "master" node (node1) on a virtual server, and one "slave"<br>
> node on a physical server.<br>
> On each server:<br>
> eth0 is connected to my main LAN for client/server communication (with the<br>
> cluster VIP)<br>
> eth1 is connected to a dedicated VLAN for corosync communication<br>
> (network: 192.168.3.0/30)<br>
> eth2 is connected to a dedicated VLAN for DRBD replication (network:<br>
</div>> 192.168.2.0/30)<br>
<div class="im">><br>
> Here is my drbd configuration:<br>
><br>
><br>
> resource drbd-mysql {<br>
> protocol C;<br>
>     disk {<br>
>         on-io-error detach;<br>
>     }<br>
>     handlers {<br>
>         fence-peer "/usr/lib/drbd/crm-fence-peer.sh";<br>
>         after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";<br>
>         split-brain "/usr/lib/drbd/notify-split-brain.sh root";<br>
>     }<br>
>     net {<br>
>         cram-hmac-alg sha1;<br>
>         shared-secret "secret";<br>
>         after-sb-0pri discard-younger-primary;<br>
>         after-sb-1pri discard-secondary;<br>
>         after-sb-2pri call-pri-lost-after-sb;<br>
>     }<br>
>     startup {<br>
>         wfc-timeout  1;<br>
>         degr-wfc-timeout 1;<br>
>     }<br>
>     on node1 {<br>
>         device /dev/drbd1;<br>
</div>>         address 192.168.2.1:7801;<br>
<div class="im">>         disk /dev/sdb;<br>
>         meta-disk internal;<br>
>     }<br>
>     on node2 {<br>
>         device /dev/drbd1;<br>
</div>>         address 192.168.2.2:7801;<br>
<div><div class="h5">>         disk /dev/sdb;<br>
>         meta-disk internal;<br>
>     }<br>
> }<br>
><br>
><br>
> Here is my cluster config:<br>
><br>
> node node1 \<br>
>         attributes standby="off"<br>
> node node2 \<br>
>         attributes standby="off"<br>
> primitive Cluster-VIP ocf:heartbeat:IPaddr2 \<br>
>         params ip="10.1.0.130" broadcast="10.1.7.255" nic="eth0"<br>
> cidr_netmask="21" iflabel="VIP1" \<br>
>         op monitor interval="10s" timeout="20s" \<br>
>         meta is-managed="true"<br>
> primitive cluster_status_page ocf:heartbeat:ClusterMon \<br>
>         params pidfile="/var/run/crm_mon.pid"<br>
> htmlfile="/var/www/html/cluster_status.html" \<br>
>         op monitor interval="4s" timeout="20s"<br>
> primitive datavg ocf:heartbeat:LVM \<br>
>         params volgrpname="datavg" exclusive="true" \<br>
>         op start interval="0" timeout="30" \<br>
>         op stop interval="0" timeout="30"<br>
> primitive drbd_mysql ocf:linbit:drbd \<br>
>         params drbd_resource="drbd-mysql" \<br>
>         op monitor interval="15s"<br>
> primitive fs_mysql ocf:heartbeat:Filesystem \<br>
>         params device="/dev/datavg/data" directory="/data" fstype="ext4"<br>
> primitive mail_alert ocf:heartbeat:MailTo \<br>
</div></div>>         params email="myemail@test.com" \<br>
<div class="HOEnZb"><div class="h5">>         op monitor interval="10" timeout="10" depth="0"<br>
> primitive mysqld ocf:heartbeat:mysql \<br>
>         params binary="/usr/bin/mysqld_safe" config="/etc/my.cnf"<br>
> datadir="/data/mysql/databases" user="mysql"<br>
> pid="/var/run/mysqld/mysqld.pid" socket="/var/lib/mysql/mysql.sock"<br>
> test_passwd="cluster_test" test_table="Cluster_Test.dbcheck"<br>
> test_user="cluster_test" \<br>
>         op start interval="0" timeout="120" \<br>
>         op stop interval="0" timeout="120" \<br>
>         op monitor interval="30s" timeout="30s" OCF_CHECK_LEVEL="1"<br>
> target-role="Started"<br>
> group mysql datavg fs_mysql Cluster-VIP mysqld cluster_status_page<br>
> mail_alert<br>
> ms ms_drbd_mysql drbd_mysql \<br>
>         meta master-max="1" master-node-max="1" clone-max="2"<br>
> clone-node-max="1" notify="true"<br>
> location mysql-preferred-node mysql inf: node1<br>
> colocation mysql_on_drbd inf: mysql ms_drbd_mysql:Master<br>
> order mysql_after_drbd inf: ms_drbd_mysql:promote mysql:start<br>
> property $id="cib-bootstrap-options" \<br>
>         dc-version="1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558" \<br>
>         cluster-infrastructure="openais" \<br>
>         expected-quorum-votes="2" \<br>
>         stonith-enabled="false" \<br>
>         no-quorum-policy="ignore" \<br>
>         last-lrm-refresh="1340701656"<br>
> rsc_defaults $id="rsc-options" \<br>
>         resource-stickiness="100" \<br>
>         migration-threshold="2" \<br>
>         failure-timeout="30s"<br>
><br>
><br>
</div></div><div class="HOEnZb"><div class="h5">> _______________________________________________<br>
> Pacemaker mailing list: <a href="mailto:Pacemaker@oss.clusterlabs.org">Pacemaker@oss.clusterlabs.org</a><br>
> <a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target="_blank">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a><br>
><br>
> Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>
> Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>
> Bugs: <a href="http://bugs.clusterlabs.org" target="_blank">http://bugs.clusterlabs.org</a><br>
><br>
<br>
<br>
<br>
<br>
</div></div>
<br></blockquote></div><br>