[Pacemaker] 2 node cluster questions

Florian Haas florian at hastexo.com
Fri Nov 25 11:09:49 EST 2011

Hi Dirk,

On 11/25/11 13:05, Hellemans Dirk D wrote:
> Hello everyone,
> I’ve been reading a lot lately about using Corosync/Openais in
> combination with Pacemaker: SuSe Linux documentation, Pacemaker &
> Linux-ha website, interesting blogs, mailinglists, etc. As I’m
> particularly interested in how well two node clusters (located within
> the same server room) are handled, I was a bit confused by the fact that
> quorum disks/ quorum servers are (not yet?) supported/used. Some
> suggested to add a third node which is not actively participating (e.g.
> only running corosync.... or with heartbeat but in standby mode). That
> might be a solution but doesn’t “feel” right, especially if you consider
> multiple two-node clusters... that would require a lot of extra “quorum
> only nodes”. Somehow SBD (storage based death) in combination with a
> hardware watchdog timer seemed to also provide a solution: run it on top
> of iSCSI storage and you end up with a fencing device and some sort of
> “network based quorum” as tiebreaker. If one node loses network
> connectivity, sbd + watchdog will make sure it’s being fenced.
> I’d love to hear your ideas about 2 node cluster setups. What is the
> best way to do it? Any chance we’ll get quorum disks/ quorum servers in
> the (near) future?

For 2-node clusters there is no need for any of that, really, as the
STONITH infrastructure in the Pacemaker stack is well suited for fencing
purposes, both at the storage and at the node level. If you want fencing
based on exclusive access to storage, akin to a quorum disk, then SBD is
the way to go. Most people prefer IPMI though: it's ubiquitous (it's
almost impossible to buy a server without an IPMI BMC these days), it
works for both shared-storage and shared-nothing clusters (unlike any
quorum disk style method, which is utterly useless in replicated-storage
configurations), and it's well integrated with Pacemaker.
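To make that concrete, here is a minimal crm shell sketch of a 2-node
cluster using IPMI fencing via the external/ipmi STONITH plugin. The
node names, BMC addresses, and credentials are placeholders, and the
exact agent available depends on your distribution:

```
# Typical properties for a 2-node cluster: fencing on, and quorum
# ignored (with only two votes, losing one node would otherwise
# freeze the survivor).
property stonith-enabled="true"
property no-quorum-policy="ignore"

# One STONITH resource per node, driving the peer's IPMI BMC.
primitive st-alice stonith:external/ipmi \
    params hostname="alice" ipaddr="10.0.0.1" \
           userid="admin" passwd="secret" interface="lan"
primitive st-bob stonith:external/ipmi \
    params hostname="bob" ipaddr="10.0.0.2" \
           userid="admin" passwd="secret" interface="lan"
```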

> In addition, say you’re not using sbd but an IPMI based fencing
> solution. You lose network connectivity on one of the nodes (I know,
> they’re redundant but still...sh*t happens ;) Does Pacemaker know which
> of both nodes lost network connectivity? E.g.: node 1 runs Oracle
> database, node 2 nothing. Node 2 loses network connectivity (e.g. both
> NICs without signal because unplugged by an errant technician ;) )... =>
> split brain situation occurs, but who’ll be fenced?

If there is really zero network connectivity between the two, then
they'll attempt to fence each other, but only one will win. Last man
standing gets/keeps the resources.
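To keep that shootout orderly, each STONITH resource is normally kept
off the node it is meant to kill, so no node ever depends on itself for
its own fencing. Assuming hypothetical per-node STONITH primitives named
st-alice and st-bob (one fencing device per node), the constraints would
look like:

```
# A node must never run the device that fences itself: if alice is
# to be shot, bob has to be the one pulling the trigger.
location l-st-alice st-alice -inf: alice
location l-st-bob   st-bob   -inf: bob
```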

However, at that point your users won't care: you'll always run at least
one cluster communications link across the same connection your users
use to talk to your service. If all links (including that one) die, the
service is unreachable anyhow, no matter which node you fence.

> The one with Oracle
> running ?? I really hope not... cause in this case, the cluster can
> “see” there’s no signal on the NICs of node2. Would be interesting to
> know more about how Pacemaker/corosync makes such kind of decisions...
> how to choose which one will be fenced in case of split brain. Is it
> randomly chosen?


> Is it the DC which decides?

Simplistically speaking, yes. For a 2-node cluster the truth is a wee
bit more complicated, but for practical purposes let's just stick to
that assumption.

> Based on NIC state?

Not by doing any kind of explicit monitoring of the link state. But if
the other node has stopped responding by the time its timeout expires,
it is considered dead and scheduled for fencing.
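The timeout in question lives mainly at the membership layer: corosync's
totem token timeout. Once a node misses the token for that long, the
membership changes and Pacemaker acts on it. A corosync.conf fragment,
with values that are purely illustrative (tune them to your network, not
copied from here):

```
totem {
    version: 2
    # Milliseconds to wait for the token before declaring it lost.
    token: 5000
    # Token retransmits attempted before a new membership is formed.
    token_retransmits_before_loss_const: 10
}
```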

Moving around services based on network connectivity _outside_ the
cluster is completely separate from fencing, mind you. That's called
network connectivity monitoring, and the Pacemaker stack supports it
through a combination of the ocf:pacemaker:ping RA and location constraints.
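For completeness, a sketch of that setup in crm shell syntax; the router
address and the rsc_oracle resource name are made up for the example:

```
# Clone of the ping RA: every node records its reachability of the
# router in the "pingd" node attribute.
primitive p-ping ocf:pacemaker:ping \
    params host_list="192.168.0.254" multiplier="1000" \
    op monitor interval="15s" timeout="20s"
clone cl-ping p-ping

# Keep the service off any node that cannot reach the router.
location l-connected rsc_oracle \
    rule -inf: not_defined pingd or pingd lte 0
```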

Hope this helps.

