[ClusterLabs] Fencing Two-Node Cluster on Two ESXi Hosts
Ken Gaillot
kgaillot at redhat.com
Thu Nov 5 16:04:53 UTC 2015
On 11/05/2015 02:43 AM, Gonçalo Lourenço wrote:
> Greetings, everyone!
>
>
> I'm having some trouble understanding how to properly set up fencing in my two-node cluster (Pacemaker + Corosync). I apologize beforehand if this exact question has been answered in the past, but I think the intricacies of my situation might be interesting enough to warrant yet another thread on this matter!
No apology needed! Fencing is both important and challenging for a
beginner, and I'd much rather see a question about fencing every week
than see someone disable it.
> My setup consists of two virtual machines (the nodes of the cluster) running on two separate VMWare ESXi servers: VM 1 (CentOS 7) is running on ESXi 1; VM 2 (another CentOS 7) is on ESXi 2. I have all resources except for fencing running as intended (DRBD, a virtual IP address, and a DHCP server). I have no access to any additional computing resources, whether physical or virtual. Both nodes use one NIC for DRBD and Corosync (since it's a virtual environment, I thought this would be enough) and another used exclusively for the DHCP server.
One NIC for DRBD+corosync is fine. You can configure DRBD not to use the
full bandwidth, so corosync always has some breathing room.
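For example, with DRBD 8.4 (the version typically used on CentOS 7) the background resync bandwidth can be capped in the disk section; the 40M figure below is only an illustration and should be tuned to your link:

```
# /etc/drbd.d/global_common.conf (excerpt, DRBD 8.4 syntax)
common {
    disk {
        resync-rate 40M;  # cap background resync so corosync traffic is not starved
    }
}
```

Note that resync-rate limits only resynchronization traffic, not normal replication writes, but resync is usually where the bandwidth pressure comes from.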
> My idea for fencing this two-node cluster is the following:
> . Set up one VMWare SOAP fencing agent on VM 1 that talks to ESXi 1. This agent would run exclusively on VM 1 and would only serve to fence VM 2;
> . Another VMWare SOAP fencing agent on VM 2 that'll talk to ESXi 2. Yet again, this agent would run solely on VM 2 and would only fence VM 1.
>
> Basically, the idea is to have them fence one another through the ESXi host they're running on.
> Is this the right way to go? If so, how should I configure the fencing resource? If not, what should I change?
>
> Thank you for your time.
>
>
> Kind regards,
> Gonçalo Lourenço
I'm not familiar enough with VMWare to address the specifics, but that
general design is a common approach in a two-node cluster. It's a great
first pass for fencing: if there's a problem in one VM, the other will
fence it.
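For reference, that first level could be sketched with pcs roughly as below. This is untested against VMware; the hostnames, credentials, and guest names are placeholders, and the device that fences a given VM must point at the ESXi host that VM actually runs on:

```shell
# Hypothetical fence device that powers off VM 2 via its ESXi host.
# Addresses, credentials, and the guest name are placeholders.
pcs stonith create fence-vm2 fence_vmware_soap \
    ipaddr=esxi2.example.com login=fenceuser passwd=secret \
    ssl=1 ssl_insecure=1 \
    pcmk_host_map="vm2:VM2-guest-name" \
    op monitor interval=60s

# Keep the device off the node it is meant to fence.
pcs constraint location fence-vm2 avoids vm2

# Mirror the same for a fence-vm1 device against the other ESXi host.
```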
However, what if the underlying host itself is unresponsive? The other
node will attempt to fence but hit a timeout, and the cluster will then
refuse to run any resources ("better safe than sorry").
The usual solution is to have two levels of fencing: the first as you
suggested, then another for the underlying host in case that fails.
The underlying hosts probably have IPMI, so you could use that as a
second level without needing any new hardware. If the underlying host OS
is having trouble, the other node can contact the IPMI and power-kill it.
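Assuming the hosts do expose IPMI, the second level could be sketched with fence_ipmilan plus Pacemaker's fencing topology. The addresses and credentials are placeholders, and fence-vm1 stands for whatever VM-level device fences vm1:

```shell
# Hypothetical IPMI fence device for the physical host under VM 1.
# Note this power-kills the whole host, not just the one VM.
pcs stonith create fence-vm1-ipmi fence_ipmilan \
    ipaddr=10.0.0.11 login=admin passwd=secret \
    lanplus=1 pcmk_host_list=vm1

# Try the VM-level device first; fall back to IPMI if that times out.
pcs stonith level add 1 vm1 fence-vm1
pcs stonith level add 2 vm1 fence-vm1-ipmi
```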
However, if the IPMI shares power with the host (i.e. on-board, as
opposed to a separate unit in a blade chassis), then you still have no
recovery if power fails. The most common solution is an intelligent power
switch, either as the second level or as a third level after IPMI. If
that's not an option, VM+IPMI fencing will still cover most of your
bases (especially if the physical hosts have redundant power supplies).
Be sure to use "two_node: 1" in corosync.conf (assuming you're using
corosync 2). That will allow one node to keep quorum if the other is
shut down.
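In corosync.conf that looks like this (excerpt):

```
quorum {
    provider: corosync_votequorum
    two_node: 1
}
```

Note that two_node: 1 implies wait_for_all: 1 by default, so at startup both nodes must be seen at least once before the cluster will run resources.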