<br><br><div class="gmail_quote">On Wed, Jan 13, 2010 at 8:19 AM, Miki Shapiro <span dir="ltr"><<a href="mailto:Miki.Shapiro@coles.com.au">Miki.Shapiro@coles.com.au</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">


<div lang="EN-AU" link="blue" vlink="purple">


<div>


<p class="MsoNormal">Separately from my earlier post re CRM DC election in a 2-way cluster, I’m chasing up the issue of making the cluster a CROSS-SITE one.</p>


<p class="MsoNormal"> </p>


<p class="MsoNormal">As stated in my other thread, I’m running a 2-way quorum-agnostic cluster on SLES11 with openais, pacemaker, drbd (… clvm, ocfs2, ctdb, nfs, etc) on HP Blades.</p>


<p class="MsoNormal"> </p>


<p class="MsoNormal">A few old threads (with a rather elaborate answer from Lars)

indicate that as of March 2009 split-site wasn’t yet thoroughly supported

as WAN connectivity issues were not thoroughly addressed, and that as of then

quorumd was not yet sufficiently robust/tested/PROD-ready. </p>


<p class="MsoNormal"> </p>


<p class="MsoNormal">What we decided we want to do is rely on an extremely simple

(and hopefully by inference predictable and reliable) arbitrator - a THIRD linux

server that lives at a SEPARATE THIRD site altogether with no special HA-related

daemons running on it. </p>


<p class="MsoNormal"> </p>


<p class="MsoNormal">I’ll build a STONITH ocf script and configure it as a cloned STONITH resource running on both nodes. When pinging the other node (via either one or two redundant links) fails, it will do roughly this:</p>


<p class="MsoNormal"> </p>


<p class="MsoNormal">ssh arbitrator mkdir /tmp/$clustername && shoot-other-node

|| hard-suicide-NOW</p>


<p class="MsoNormal"> </p>


<p class="MsoNormal">Thus, when split, the nodes race to the arbitrator. </p>


<p class="MsoNormal">First to run the mkdir command on the arbitrator (and get rc=0)

wins, gets the long straw and lives.  Loser gets shot (either by its peer

if WAN allows peer to communicate with soon-to-be-dead node’s iLO or by said

node sealing its own fate). </p>
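<p class="MsoNormal">The race described above can be sketched as a small shell fragment. This is only an illustration under stated assumptions: <code>claim_lock</code>, <code>shoot_other_node</code>, <code>hard_suicide_now</code> and the <code>LOCK_CMD</code> override are hypothetical names, and the two fencing functions are placeholders that only print what they would do. It uses an explicit if/else because in the <code>a &amp;&amp; b || c</code> form, <code>c</code> also runs when <code>b</code> fails, so a winner whose shot at the peer failed would go on to kill itself.</p>

```shell
#!/bin/sh
# Sketch of the race to the arbitrator. All helper names are hypothetical.

# mkdir fails if the directory already exists, so only the first caller
# gets rc=0. LOCK_CMD allows swapping the lock command (e.g. plain "mkdir"
# for a local dry run); by default it is "ssh arbitrator mkdir".
claim_lock() {
    ${LOCK_CMD:-ssh arbitrator mkdir} "/tmp/$1" 2>/dev/null
}

# Placeholders: a real agent would talk to the peer's iLO or power off locally.
shoot_other_node() { echo "fence peer via iLO"; }
hard_suicide_now() { echo "power off local node"; }

# Winner of the race shoots the peer; loser (or a node that cannot reach
# the arbitrator at all) takes itself down.
arbitrate() {
    if claim_lock "$1"; then
        shoot_other_node
    else
        hard_suicide_now
    fi
}
```

<p class="MsoNormal">It would be called as <code>arbitrate "$clustername"</code>. One thing the sketch leaves open: the directory on the arbitrator has to be removed again once the cluster has healed, otherwise the next split has no possible winner.</p>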


<p class="MsoNormal"> </p>


<p class="MsoNormal">Primary failure modes not accounted for by a run-of-the-mill

non-split-site cluster are thus: </p>


<p class="MsoNormal"> </p>


<p>1. One node cut off: the cut-off node will fail the race and suicide. The good node will succeed and proceed to provide service.</p>


<p>2. Nodes cut off from each other but both can access the arbitrator: the slower node will suicide. The faster node will succeed and proceed to provide service.</p>


<p>3. Both nodes are cut off, or the comms issue affects both node1&lt;-&gt;node2 comms AND all -&gt;arbitrator comms (double failure): both nodes suicide (and potentially leave me with two inconsistent and potentially corrupt filesystems). Can’t see an easy way around this one

(can anyone?)</p></div></div></blockquote><div><br></div><div>Basically that's the part that the stuff we haven't written yet is supposed to address.</div><div><br></div><div>You want to avoid the "|| hard-suicide-NOW" part of your logic, but you can't safely do that unless there is some way to stop the services on the non-connected node(s) - preferably _really_ quickly.</div>

<div><br></div><div>What about setting no-quorum-policy to freeze and making the third node a full cluster member (that just doesn't run any resources)?</div><div>That way, if you get a 1-1-1 split, the nodes will leave all services running where they were while they wait for quorum.</div>

<div>And if it heals into a 1-2 split, then the majority will terminate the rogue node and acquire all the services.</div><div><br></div><div>The biggest problem is the reliability of your links and stonith devices - give particular thought to how you'd fence _node_ A if comms to _site_ A are down.... </div>
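<div><br></div><div>As a rough sketch of that suggestion using the crm shell: the node name <code>arbitrator</code> is a placeholder, and putting the third node in standby is only one possible way to keep resources off it while it still counts towards quorum (location constraints would be another).</div>

```shell
# Keep resources where they are when quorum is lost, instead of stopping them.
crm configure property no-quorum-policy=freeze

# Keep the third node a cluster member that never hosts resources.
# "arbitrator" is a placeholder for that node's name.
crm node standby arbitrator
```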

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div lang="EN-AU" link="blue" vlink="purple"><div>


<p> </p>


<p class="MsoNormal"> </p>


<p class="MsoNormal">Looks to me like this can easily be implemented without any

fancy quorum servers (on top of the required little ocf script and the existence

of the arbitrator)</p>


<p class="MsoNormal"> </p>


<p class="MsoNormal">Does anyone have thoughts on this? Am I ignoring any major issues, or reinventing the wheel, or should this potentially work as I think it will?</p>


<p class="MsoNormal"> </p>


<p class="MsoNormal">Thanks! :)</p>


<p class="MsoNormal"> </p>


<p class="MsoNormal">And a little addendum which just occurred to me re transient

WAN network issues: </p>


<p> </p>


<p>1. Transient big (&gt;2min) network issues will land me with a cluster that needs a human to power one node back on every time they happen. Bad.</p>


<p> </p>


<p>My proposed solution: classify a peer failure as a WAN problem by pinging the peer node’s core router when the peer node appears dead. If the router is dead too, touch a WAN-problem flag file. As long as the flag file sits there, the survivor pings the other side's router (via an ocf ping resource) until it comes back online, then shoots a “check-power-status &amp;&amp; O-GOD-IT-STILL-LIVES-KILL-IT-NOW || power-it-on” command to the peer’s iLO and promptly deletes the flag.</p>
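<p>The flag-file logic above might look roughly like this. It is a sketch, not a tested agent: the function names, the flag path, the <code>PING_CMD</code> override and the idea of passing the peer's power state in as an argument are all assumptions made for illustration, and the functions only print the action they would take against the iLO.</p>

```shell
#!/bin/sh
# Sketch of the WAN-problem classification. Names and paths are hypothetical.

FLAG="${FLAG:-/var/run/wan-problem.flag}"

# Reachability probe; PING_CMD can be overridden for a dry run.
reachable() {
    ${PING_CMD:-ping -c 3 -W 2} "$1" >/dev/null 2>/dev/null
}

# Called when the peer appears dead: if its core router is dead too,
# treat the outage as a WAN problem and leave a flag file behind.
classify_peer_failure() {
    if reachable "$1"; then
        echo "peer-failure"
    else
        touch "$FLAG"
        echo "wan-problem"
    fi
}

# Called periodically while the flag exists. $1 is the peer's router,
# $2 the peer's power state ("on" or "off") as reported by its iLO.
# Once the router answers again, pick the iLO action and clear the flag.
recover_after_wan() {
    [ -f "$FLAG" ] || return 0
    reachable "$1" || return 0
    if [ "$2" = "on" ]; then
        echo "kill-peer"
    else
        echo "power-on-peer"
    fi
    rm -f "$FLAG"
}
```

<p>In this sketch a peer that is found powered on after the WAN returns is killed rather than trusted, which matches the “it still lives, kill it” branch of the command above.</p>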


<p> </p>


<p>Implementation cost: a wee bit of scripting and a wee

bit of pacemaker configuration. </p>


<p> </p>


<p>2. Transient small network issues will require stretching pacemaker’s default timeouts sufficiently to ride them out (otherwise they end up in the item 1 bucket above).</p>


<p class="MsoNormal"> </p>


<p class="MsoNormal">Am very keen to know what the gurus think :) :)</p>


<p class="MsoNormal" style="text-autospace:none"><b><span style="color:#4F81BD"> </span></b></p>


<p class="MsoNormal" style="text-autospace:none"><b><span style="color:#4F81BD">Miki Shapiro</span></b></p>


<p class="MsoNormal" style="text-autospace:none"><span style="font-size:9.0pt;color:#4F81BD">Linux Systems Engineer<br>

Infrastructure Services & Operations</span></p>




<p class="MsoNormal"><span style="font-size:9.0pt;color:#4F81BD">745 Springvale Road<br>

Mulgrave 3170 Australia<br>

Email</span><span style="font-size:9.5pt;color:#4F81BD"> <a href="mailto:miki.shapiro@coles.com.au" target="_blank"><span style="color:blue">miki.shapiro@coles.com.au</span></a><br>

</span><span style="font-size:9.0pt;color:#4F81BD">Phone: 61 3 854 10520</span></p>


<p class="MsoNormal"><span style="font-size:9.0pt;color:#4F81BD">Fax:     61 3 854 10558<br>

<br>

</span><span style="color:#4F81BD"></span></p>


<p class="MsoNormal"> </p>


</div>



</div>


<br>_______________________________________________<br>

Pacemaker mailing list<br>

<a href="mailto:Pacemaker@oss.clusterlabs.org">Pacemaker@oss.clusterlabs.org</a><br>

<a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target="_blank">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a><br>

<br></blockquote></div><br>