[ClusterLabs] Fencing Two-Node Cluster on Two ESXi Hosts

Thu Nov 5 11:31:32 EST 2015

On 05/11/15 11:04 AM, Ken Gaillot wrote:
> On 11/05/2015 02:43 AM, Gonçalo Lourenço wrote:
>> Greetings, everyone!
>>
>>
>> I'm having some trouble understanding how to properly set up fencing in my two-node cluster (Pacemaker + Corosync). I apologize beforehand if this exact question has been answered in the past, but I think the intricacies of my situation might be interesting enough to warrant yet another thread on this matter!
> 
> No apology needed! Fencing is both important and challenging for a
> beginner, and I'd much rather see a question about fencing every week
> than see someone disable it.

I love you.

>> My setup consists of two virtual machines (the nodes of the cluster) running on two separate VMware ESXi servers: VM 1 (CentOS 7) is running on ESXi 1; VM 2 (another CentOS 7) is on ESXi 2. I have all resources except for fencing running as intended (DRBD, a virtual IP address, and a DHCP server). I have no access to any additional computing resources, either physical or virtual. Both nodes use one NIC for DRBD and Corosync (since it's a virtual environment, I thought this would be enough) and a second NIC exclusively for the DHCP server.
> 
> One NIC for DRBD+corosync is fine. You can configure DRBD not to use the
> full bandwidth, so corosync always has some breathing room.
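
For what it's worth, here is a rough sketch of the throttle Ken describes. It assumes DRBD 8.4 syntax (8.3 used a "syncer { rate ...; }" section instead), a resource named r0, and a 30M cap; the name and the number are placeholders, not values from this thread. Note that this only limits background resynchronization, not the normal replication of live writes.

```
# /etc/drbd.d/r0.res (fragment; "r0" and "30M" are placeholders)
resource r0 {
    disk {
        c-plan-ahead 0;     # turn off the dynamic resync controller
        resync-rate  30M;   # fixed cap on background resync traffic
    }
}
```

Pick the cap so that a full resync still leaves headroom on the shared link for corosync's totem traffic.
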
> 
>> My idea for fencing this two-node cluster is the following:
>> - Set up one VMware SOAP fencing agent on VM 1 that talks to ESXi 1. This agent would run exclusively on VM 1 and would only serve to fence VM 2;
>> - Set up another VMware SOAP fencing agent on VM 2 that talks to ESXi 2. Again, this agent would run solely on VM 2 and would only fence VM 1.
>>
>> Basically, the idea is to have them fence one another through the ESXi host they're running on.
>> Is this the right way to go? If so, how should I configure the fencing resource? If not, what should I change?
>>
>> Thank you for your time.
>>
>>
>> Kind regards,
>> Gonçalo Lourenço
> 
> I'm not familiar enough with VMware to address the specifics, but that
> general design is a common approach in a two-node cluster. It's a great
> first pass for fencing: if there's a problem in one VM, the other will
> fence it.
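
To make the "how should I configure the fencing resource" part a bit more concrete, here is a minimal sketch using pcs and fence_vmware_soap. Every host name, VM name, login and password in it is a made-up placeholder. One thing to watch: as far as I know, a standalone ESXi host can only power-cycle its own guests, so the device that fences node2 has to talk to the ESXi host that runs node2's VM (and likewise for node1). The script only prints the commands unless you set PCS=pcs.

```shell
#!/bin/sh
# Dry run by default: prints the pcs commands instead of executing them.
# Run with PCS=pcs on a real cluster once the values are filled in.
PCS="${PCS:-echo pcs}"

# Placeholders (assumptions, not from the thread):
#   node1 / node2        cluster node names
#   vm1 / vm2            VM names as the ESXi hosts know them
#   esxi1 / esxi2        ESXi management addresses
create_vm_fencing() {
    # Device that fences node1: it talks to the ESXi host running vm1.
    $PCS stonith create fence_node1 fence_vmware_soap \
        ipaddr=esxi1.example.com login=fenceuser passwd=secret ssl=1 \
        pcmk_host_map="node1:vm1"

    # Device that fences node2: it talks to the ESXi host running vm2.
    $PCS stonith create fence_node2 fence_vmware_soap \
        ipaddr=esxi2.example.com login=fenceuser passwd=secret ssl=1 \
        pcmk_host_map="node2:vm2"

    # Keep each device away from the node it is meant to fence.
    $PCS constraint location fence_node1 avoids node1
    $PCS constraint location fence_node2 avoids node2
}

create_vm_fencing
```

Before trusting it, test each device by hand (for example by asking the cluster to fence the peer) and confirm the target VM really gets power-cycled.
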
> 
> However, what if the underlying host is not responsive? The other node
> will attempt to fence but get a timeout, and so the cluster will refuse
> to run any resources ("better safe than sorry").
> 
> The usual solution is to have two levels of fencing: the first as you
> suggested, then another for the underlying host in case that fails.
> 
> The underlying hosts probably have IPMI, so you could use that as a
> second level without needing any new hardware. If the underlying host OS
> is having trouble, the other node can contact the IPMI and power-kill it.
> 
> However, if IPMI shares power with the host (i.e. on-board as opposed to
> a separate unit on a blade chassis), then you still have no recovery if
> power fails. The most common solution is to use an intelligent power
> switch, whether as the second level, or as a third level after IPMI. If
> that's not an option, VM+IPMI fencing will still cover most of your
> bases (especially if the physical hosts have redundant power supplies).
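
A sketch of the two-level setup described above, again with pcs. It assumes VM-level devices named fence_node1 and fence_node2 already exist, and that each physical host has an IPMI interface reachable from the other node; all names, addresses and credentials are placeholders. Keep in mind that level 2 powers off the whole physical host, so any other guests on it go down too. As written, it only prints the commands unless PCS=pcs is set.

```shell
#!/bin/sh
# Dry run by default: prints the pcs commands instead of executing them.
PCS="${PCS:-echo pcs}"

# Placeholders (assumptions): fence_node1/fence_node2 are existing VM-level
# stonith devices; ipmi1/ipmi2 are the IPMI addresses of the physical hosts.
add_fencing_levels() {
    # IPMI device for the physical host underneath each node.
    $PCS stonith create ipmi_host1 fence_ipmilan \
        ipaddr=ipmi1.example.com login=admin passwd=secret lanplus=1 \
        pcmk_host_list=node1
    $PCS stonith create ipmi_host2 fence_ipmilan \
        ipaddr=ipmi2.example.com login=admin passwd=secret lanplus=1 \
        pcmk_host_list=node2

    # Level 1: try the VM-level device first.
    $PCS stonith level add 1 node1 fence_node1
    $PCS stonith level add 1 node2 fence_node2

    # Level 2: if that fails or times out, power-kill the host via IPMI.
    $PCS stonith level add 2 node1 ipmi_host1
    $PCS stonith level add 2 node2 ipmi_host2
}

add_fencing_levels
```

An intelligent power switch, if one becomes available, would slot in the same way as a further level.
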
> 
> Be sure to use "two_node: 1" in corosync.conf (assuming you're using
> corosync 2). That will allow one node to keep quorum if the other is
> shut down.
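
For reference, the relevant part of corosync.conf under corosync 2 looks roughly like this (a fragment only; the rest of the file is omitted):

```
quorum {
    provider: corosync_votequorum
    two_node: 1
}
```

Setting two_node also turns on wait_for_all by default, so after a cold start of both nodes the cluster waits until the two have seen each other once before it becomes quorate.
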
> 

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?