[Pacemaker] 2 node cluster questions

Hellemans Dirk D Dirk.Hellemans at hpcds.com
Thu Dec 1 06:03:15 EST 2011


Hello Nick, Florian,

Also thanks to both of you for providing insights! Some extra thoughts/comments below...

>> 
>> For 2-node clusters there is no need to at all, really, as the STONITH
>> infrastructure in the Pacemaker stack is well suited for fencing
>> purposes, both at the storage and at the node level. If you want fencing
>> based on exclusive access to storage, akin to a quorum disk, then SBD is
>> the way to go. Most people prefer IPMI though: it's ubiquitous (it's
>> almost impossible to buy a server without an IPMI BMC these days), it
>> works for both shared-storage and shared-nothing clusters (unlike any
>> quorum disk style method, which is utterly useless in replicated-storage
>> configurations), and it's well integrated with Pacemaker.

Okay... I agree: with good fencing, split-brain can be avoided. But you also want to avoid the STONITH deathmatch afterwards, right? In a 3-node (or larger) setup, nodes can only fence if they have quorum. That's a problem for 2-node clusters... and to me it seems like a quorum server would be a real benefit here. You could prohibit a node from starting the cluster services at boot (and thus prevent a deathmatch), or use SBD on iSCSI (because it somewhat resembles an external third vote over the network). But I think a quorum server is the only real solution in the end... I mean: a node should not stay down; it should come up but 'behave' and not start fencing the other (due to the no-quorum-policy="ignore" setting, which is needed if you don't have at least a third node to provide quorum).
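For what it's worth, this is roughly the kind of 2-node setup I have in mind: IPMI fencing via the external/ipmi STONITH plugin, no-quorum-policy=ignore, and the cluster stack disabled at boot so a fenced node comes back up but 'behaves'. Hostnames, IPMI addresses and credentials below are just placeholders, and the parameter names are from memory, so take it as a sketch rather than a recipe:

    # keep running without quorum (2-node cluster), but with fencing enabled
    crm configure property no-quorum-policy=ignore
    crm configure property stonith-enabled=true

    # one IPMI-based STONITH resource per node (all params are example values)
    crm configure primitive fence-node1 stonith:external/ipmi \
        params hostname=node1 ipaddr=10.0.0.1 userid=admin passwd=secret interface=lan
    crm configure primitive fence-node2 stonith:external/ipmi \
        params hostname=node2 ipaddr=10.0.0.2 userid=admin passwd=secret interface=lan

    # make sure a node never runs its own fencing device
    crm configure location l-fence-node1 fence-node1 -inf: node1
    crm configure location l-fence-node2 fence-node2 -inf: node2

    # don't start the stack at boot, so a fenced node can't rejoin and shoot back
    chkconfig corosync off    # or 'chkconfig openais off', depending on the distro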

Looking forward to the quorum plug-in engine which is going to be included in Corosync! 

https://lists.linux-foundation.org/pipermail/openais/2011-August/016647.html 
http://www.linuxsymposium.org/archives/OLS/Reprints-2008/dake-reprint.pdf


>> 
>> 
>> If there is really zero network connectivity between the two, then
>> they'll attempt to fence each other, but only one will win. Last man
>> standing gets/keeps the resources.
>> 
>> However, at that point your users won't care: you'll always run at least
>> one cluster communications link across the same connections your users
>> use to talk to your service. If all links (including that one) die, then
>> the service is unreachable anyhow. No matter what node you fence.

Very true... and an interesting way to look at it. That's also why SBD on iSCSI (as a 'third' quorum vote) can be a bad idea: if a switch module fails and STP takes a while to converge (affecting all your VLANs), both nodes would lose the disk and fence each other. To avoid a deathmatch, their cluster services would not be started at boot, and both nodes would stay down. Simplistically stated :)
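Just to illustrate the timing aspect: if you do go with SBD, the watchdog and msgwait timeouts on the device should outlast the worst-case STP convergence, otherwise a transient switch failover already looks like a lost disk. Something along these lines (device path and values are only an example; msgwait is usually set to at least twice the watchdog timeout):

    # initialize the SBD device with generous timeouts
    # -1 = watchdog timeout (s), -4 = msgwait timeout (s)
    sbd -d /dev/disk/by-id/scsi-<your-iscsi-lun> -4 120 -1 60 create

    # verify what ended up in the header
    sbd -d /dev/disk/by-id/scsi-<your-iscsi-lun> dump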

Oh yeah, thanks Nick for that comment about cman. Interesting to read about all the different cluster stacks (http://theclusterguy.clusterlabs.org/post/907043024/introducing-the-pacemaker-master-control-process-for) and the fact that clvm on Red Hat requires cman while on SUSE it doesn't. It seems you can compile it with different options...
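(For the record, the different options I meant: LVM2's configure script lets you pick the cluster locking backend that clvmd is built against, something like the line below, if I remember the switch correctly.)

    # build clvmd against the Corosync/OpenAIS stack instead of cman (from memory)
    ./configure --with-clvmd=corosync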

I'm looking forward to a unified stack without all the confusing bits and pieces left over from the past! That'll make Linux clustering a killer :)

Cheers!



