[ClusterLabs] Node Fencing and STONITH in Amazon Web Services

Jason A Ramsey jason at eramsey.org
Mon Aug 29 13:29:09 EDT 2016


No ideas or input on this? I’d appreciate anything you guys can throw my way… Thank you!

--

[ jR ]
  @: jason at eramsey.org

  there is no path to greatness; greatness is the path

From: Jason Ramsey <jason at eramsey.org>
Reply-To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
Date: Friday, August 26, 2016 at 4:32 PM
To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
Subject: [ClusterLabs] Node Fencing and STONITH in Amazon Web Services

If you don’t mind, please allow me to walk through my architecture just a bit. I know that I am far from an expert on this stuff, but I feel like I have a firm grasp on how this all works conceptually. That said, I welcome your insights and advice on how to approach this problem—and any ready-made solutions to it you might have on hand. :)

We are deploying into two availability zones (AZs) in AWS. Our goal is to be able to absorb the loss of an entire AZ and continue to provide services to users. Our first problem comes with Microsoft SQL Server running on Windows Server Failover Clustering. As you guys likely know, WSFC isn’t polite about staying up without a quorum. As such, I figured, hey, I can build a two-node Pacemaker-based iSCSI target cluster and expose LUNs from it so that the WSFC nodes could have a witness filesystem.
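
For anyone following along, the stack looks roughly like this in pcs syntax (Pacemaker 1.1 era). This is a minimal sketch, not my actual config; the resource names, VG/LV names, the IP, and the IQN below are all placeholders:

  # DRBD-backed block device, promoted on one node at a time
  pcs resource create r_drbd ocf:linbit:drbd drbd_resource=r0
  pcs resource master ms_drbd r_drbd master-max=1 master-node-max=1 \
      clone-max=2 clone-node-max=1 notify=true

  # VG activation, floating IP, iSCSI target, and the witness LUN
  pcs resource create r_lvm ocf:heartbeat:LVM volgrpname=vg_witness
  pcs resource create r_vip ocf:heartbeat:IPaddr2 ip=10.0.0.50 cidr_netmask=24
  pcs resource create r_tgt ocf:heartbeat:iSCSITarget \
      iqn=iqn.2016-08.org.example:witness implementation=tgt
  pcs resource create r_lun ocf:heartbeat:iSCSILogicalUnit \
      target_iqn=iqn.2016-08.org.example:witness lun=1 \
      path=/dev/vg_witness/lv_witness

  # keep the whole group on the DRBD Primary, started after promotion
  pcs resource group add g_iscsi r_lvm r_vip r_tgt r_lun
  pcs constraint colocation add g_iscsi with master ms_drbd INFINITY
  pcs constraint order promote ms_drbd then start g_iscsi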

So, I’ve managed to make all of that happen. Yay! What I’m now trying to suss out is how to ensure that I’m covered for availability in the event of any kind of outage. As it turns out, I believe I actually have most of that covered.

(I use “1o” to indicate “primary” and “2o” for “secondary”)

Planned Outage: affected node gracefully demoted from the cluster (see the standby snippet after this list); life goes on, everyone is happy
Unplanned Outage (NAS cluster node fails/unreachable): if the 2o node fails, nothing happens; if the 1o node fails, DRBD promotes 2o to 1o, the constrained VIP, LVM, tgt, and LUN resources automatically flip to the 2o node, life goes on, everyone is happy
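
For the planned case, “gracefully demoted” just means standby-ing the node so its resources migrate off before maintenance, along these lines (the node name is a placeholder):

  pcs cluster standby nas-node-2     # resources move to the peer
  # ... do the maintenance ...
  pcs cluster unstandby nas-node-2   # let it host resources again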

But still, for the one scenario we built this ridiculously complicated and over-engineered thing to handle, I don’t feel like I have a good story: a severed-AZ event (loss of perimeter communications, etc.).

Unplanned Outage (AZ connectivity severed): each NAS node detects that its peer is gone and promotes itself to primary. The unsevered side continues to work as expected, with the witness mounted by the SQL Servers in that AZ; life goes on and at least the USERS are happy… but the severed side is soldiering on too. Both sides of the SQL cluster would think they have quorum even though they can’t talk to their peer nodes, so they mark their peers as down and keep on keeping on. No users would be connecting to the severed instances, but background and system tasks would proceed as normal, potentially writing new data to the databases. That makes rejoining the nodes to the cluster tricky, to say the least, especially once the severed side’s network comes back up and both systems realize they’re not consistent.

So, my problem, I think, is two-fold:


1. What can I monitor from each of the NAS cluster instances (besides connectivity to one another) that would “ALWAYS” be available when things are working and NEVER available when they are broken? (I was thinking, perhaps, a RESTful call to the AWS API, but I’m not sure whether an API endpoint might still answer from inside a severed AZ if it happens to be hosted there.) If I can find something that meets these criteria, then I could write a simple monitoring script that runs on both nodes and acts as a fencing/STONITH solution: if it detects bad things, it shuts the node down (see the sketch after this list). That should prevent the data inconsistency, since the severed side’s WSFC would lose its witness filesystem, and with it its quorum, and take itself offline.

2. Have I failed to account for another failure condition that could be as harmful as, or more harmful than, anything I’ve thought of already?
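
To make item 1 concrete, here is the kind of thing I have in mind. It’s only a sketch, not a tested solution: the endpoint, the threshold, the interval, and the assumption that the regional EC2 API endpoint is reachable if and only if the AZ still has perimeter connectivity are all up for debate.

  #!/bin/sh
  # Self-fence if we lose sight of the outside world.
  # ASSUMPTION: the regional EC2 API endpoint below is a reasonable
  # proxy for "this AZ still has perimeter connectivity"; tune the
  # endpoint, threshold, and interval for your environment.
  ENDPOINT="https://ec2.us-east-1.amazonaws.com"
  FAILURES=0
  MAX_FAILURES=3

  while true; do
      # any HTTP response at all counts as "reachable"
      if curl -s -o /dev/null --max-time 5 "$ENDPOINT"; then
          FAILURES=0
      else
          FAILURES=$((FAILURES + 1))
      fi
      if [ "$FAILURES" -ge "$MAX_FAILURES" ]; then
          # we appear to be on the severed side: stop exporting the
          # witness LUN so the local WSFC nodes lose quorum, then die
          logger -t az-fence "lost external connectivity; self-fencing"
          pcs cluster stop
          shutdown -h now
          exit 0
      fi
      sleep 10
  done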

Anyway, I’m hopeful someone here can share some of their own experiences from the trenches. Thank you for your time (and for all the help you guys have already been in getting this set up).

--

[ jR ]

  @: jason at eramsey.org

  there is no path to greatness; greatness is the path