[Pacemaker] Split Site 2-way clusters

Thu Apr 29 05:32:07 EDT 2010

Hi all, 

I'm currently working out a concept similar to Miki's. The only difference 
is that my cluster would NOT be active/active. 

Here is a brief description of my scenario: 

1. Three geographically distinct locations: A, B and X. There are no WAN 
connections. We have direct multiplexed fibre connections (same subnet) 
to those sites, so there are no issues with WAN timeouts and such, but 
split brain is an issue too! 
2. Two cluster members: server1 @ A and server2 @ B. 
3. No shared storage like a SAN, but replicated data like DRBD, MySQL etc. 
4. The cluster works in active/passive mode. No master/master, as this is 
too risky and has too many bottlenecks in case of disaster recovery! 
5. Location X would have one server not hosting any resources at any 
time, so this can only be some kind of quorum server. 

Miki, have you managed to get a working setup for your scenario? If so, 
how did you finally solve it? 

Here are my thoughts: 
* An iSCSI reservation on a server @ location X could be an option, but I 
wonder how well this works in practice, whether anyone is already 
running it, and whether there are caveats. It would be similar to shared 
storage, only over IP. Are there any "out of the box" solutions for 
iSCSI reservation: a Pacemaker RA, etc.? 

* How would a server @ location X behave as a third cluster member? This 
server would have location constraints prohibiting resources from 
running on it. Its sole purpose would be to provide a vote, so the 
cluster would have quorum in case of split brain between locations A and 
B. As I understood from one of Andrew's replies, the quorum mechanism 
relies purely on the number of nodes that can join the cluster, right? 

* Considering the broken triangle scenario already mentioned by Miki, 
and a third "quorum" cluster member @ location X, would all the cluster 
members still know about each other through some kind of relay of the 
multicast messages, i.e. server @ A multicasts, server @ X receives the 
messages and relays them to server @ B? If this is the case and my 
understanding is correct, would the cluster continue working as if 
"nothing happened"? 

* In case of complete site isolation (1-1-1 situation) the cluster would 
stop resources, as I would set no-quorum-policy to stop. Would the 
cluster restart resources once it reaches quorum again? Using DRBD, 
would that work correctly if the slave got quorum first and started 
working again? Once the old master finally comes back, would the cluster 
return to a consistent state, especially the DRBD data? 
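
To make the vote counting in the points above concrete, here is a minimal 
sketch of how I understand it. It assumes a plain majority rule (a 
partition has quorum when it holds more than half of all node votes) and 
models only the "stop" policy. The function names and the node name 
"quorum-x" are made up for illustration; this is not Pacemaker code. 

```python
def has_quorum(partition_size, total_nodes):
    """Majority rule: quorate when the partition holds more than half of all votes."""
    return 2 * partition_size > total_nodes


def resource_actions(partitions, total_nodes):
    """Hypothetical helper: what each partition does under a 'stop' policy.

    Quorate partitions may run resources; all others stop them.
    """
    actions = {}
    for members in partitions:
        quorate = has_quorum(len(members), total_nodes)
        actions[tuple(members)] = "may-run" if quorate else "stop"
    return actions


# Assumed node names: server1 @ A, server2 @ B, quorum-x @ X.
scenarios = {
    "all connected": [["server1", "server2", "quorum-x"]],
    "site B isolated": [["server1", "quorum-x"], ["server2"]],
    "1-1-1 isolation": [["server1"], ["server2"], ["quorum-x"]],
}

for name, partitions in scenarios.items():
    print(name, resource_actions(partitions, total_nodes=3))
```

Under these assumptions the side that can still reach X keeps quorum, 
while in the 1-1-1 case nobody does and everything stops. It also shows 
why the third vote matters: with only two nodes, a single surviving node 
holds 1 of 2 votes, which is not a majority. 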

Best regards, 
Vincent