[Pacemaker] Split Site 2-way clusters
Vincent
vincent.cheche at gmail.com
Thu Apr 29 05:32:07 EDT 2010
Hi all,
I'm currently working on a concept similar to Miki's. The only difference
is that my cluster would NOT be active/active.
Here is a brief description of my scenario:
1. Three geographically distinct locations: A, B and X. There are no WAN
connections. We have direct multiplexed fibre connections (same subnet)
between those sites, so there are no issues with WAN timeouts and such,
but split brain is an issue too!
2. Two cluster members: server1 @ A and server2 @ B.
3. No shared storage like a SAN, but replicated data like DRBD, MySQL etc.
4. Cluster works in active/passive mode. No master/master, as this is too
risky and has too many bottlenecks in case of disaster recovery!
5. Location X would have one server not hosting any resources at any
time, so this can only be some kind of quorum server.
Miki, have you managed to get a working setup for your scenario? If so,
how did you finally do it?
Here are my thoughts:
* iSCSI reservation on a server @ location X could be an option, but I
wonder how well this works in practice, whether anyone is already running
it, and whether there are caveats, since it is similar to shared storage,
only over IP. Are there any "out of the box" solutions for iSCSI
reservation: Pacemaker RA, etc.?
* How would a server @ location X behave as a third cluster member? This
server would have location constraints prohibiting resources from running
on it. Its sole purpose would be to provide a vote, so the cluster would
have quorum in case of split brain between locations A and B. As I
understood from one of Andrew's replies, the quorum mechanism relies
purely on the number of nodes that can join the cluster, right?
* Considering the broken triangle scenario already mentioned by Miki,
and a third "quorum" cluster member @ location X: would all the cluster
members still know about each other through some kind of relay of the
multicast messages, i.e. server @ A multicasts, server @ X receives the
messages and relays them to server @ B? If this is the case and my
understanding is correct, would the cluster continue working as if
"nothing happened"?
* In case of complete site isolation (1-1-1 situation) the cluster would
stop resources, as I would set no-quorum-policy to stop. Would the
cluster restart resources once it reaches quorum again? With DRBD, would
that work correctly if the former slave regained quorum first and started
working again? Once the old master finally comes back, would the cluster
return to a consistent state, especially DRBD?
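
To make the vote counting in the points above concrete, here is a minimal
sketch of how I picture it. It is only a model under stated assumptions,
not how Pacemaker or the messaging layer is actually implemented: quorum is
a plain majority of the configured nodes, and a partition is every node
reachable over any working path (i.e. it assumes the relaying I asked
about really happens). The node names and the helpers `partitions`,
`has_quorum` and `quorate_nodes` are made up for illustration.

```python
# Hypothetical node names for the three sites.
A, B, X = "server1@A", "server2@B", "quorum@X"
NODES = {A, B, X}


def partitions(nodes, links):
    """Group nodes into partitions.

    Assumption: membership traffic is relayed over any working path,
    so two nodes are in the same partition if a chain of links joins them.
    """
    remaining = set(nodes)
    groups = []
    while remaining:
        start = remaining.pop()
        group, frontier = {start}, [start]
        while frontier:
            current = frontier.pop()
            for a, b in links:
                if current in (a, b):
                    other = b if current == a else a
                    if other in remaining:
                        remaining.remove(other)
                        group.add(other)
                        frontier.append(other)
        groups.append(group)
    return groups


def has_quorum(group, total):
    """Simple majority: strictly more than half of all configured nodes."""
    return len(group) > total / 2


def quorate_nodes(nodes, links):
    """Return the nodes in the quorate partition, or an empty set if none."""
    for group in partitions(nodes, links):
        if has_quorum(group, len(nodes)):
            return group
    return set()


# Broken triangle: only the direct A-B link is down.
print(quorate_nodes(NODES, [(A, X), (B, X)]))
# Site A isolated: B and X still see each other.
print(quorate_nodes(NODES, [(B, X)]))
# Complete isolation (1-1-1): nobody has a majority.
print(quorate_nodes(NODES, []))
```

In this model the broken triangle keeps all three nodes quorate, an
isolated site A leaves B and X with two of three votes, and the 1-1-1 case
leaves no quorate partition, which is where the stop policy would kick in.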
Best regards,
Vincent