[ClusterLabs] Question about STONITH for VM HA cluster in shared hosts environment

Fri Jun 30 04:02:19 EDT 2017

On 06/29/2017 07:23 PM, Ken Gaillot wrote:
> On 06/29/2017 12:08 PM, Digimer wrote:
>> On 29/06/17 12:39 PM, Andrés Pozo Muñoz wrote:
>>> Hi all,
>>>
>>> I am a newbie to Pacemaker and I can't find the perfect solution for my
>>> problem (probably I'm missing something), maybe someone can give me some
>>> hint :)
>>>
>>> My scenario is the following: I want to make an HA cluster composed of 2
>>> virtual machines running on top of SHARED virtualization hosts. That is,
>>> I have a bunch of hosts running multiple VMs, and I would like to create
>>> an HA cluster with 2 VMs (Ubuntus running some app) running in
>>> (different) hosts.
>>>
>>> About the resources, I have no problem, I'll configure some VIP and some
>>> lsb services in a group.
>>>
>>> My concern is about STONITH; I can't find the perfect config:
>>>     * If I disable STONITH, I may have the split-brain problem.
>>>     * If I enable STONITH with the external/libvirt fence, I'll have a
>>> single point of failure if the host with the active VM dies, right?
>>> (Imagine the host running the active VM dies: the STONITH operation
>>> from the other VM will fail and the switch-over will not happen, right?)
>>>     * I can't use a 'hardware' (ILO/DRAC) fence, because the host is
>>> running a lot of VMs, not only the ones in the HA cluster :( I can't reboot
>>> it because of some failure in our HA.

If for whatever reason, be it physically separated networks or an
unwillingness to share the credentials with you or your VMs, you
don't have access to the ILO/DRAC/power switch, or, even worse, you
don't even have access to VM management for similar reasons, then
SBD (Storage-Based Death) might be worth considering under certain
circumstances.

The usefulness of SBD stands or falls with the availability of
a reliable watchdog on each cluster node. Although it has
"storage" in the name, the really important part is the
watchdog; in fact, under certain circumstances no storage
device is needed at all.

Anyway, provided the implementation of the virtual watchdog
in a VM is reliable enough (e.g. guarded by a hardware
watchdog on the host), SBD can be used inside virtual machines
as well.

The basic idea that makes the difference in your case is that you
don't need positive confirmation that a node has been fenced.
Instead, you have something reliable enough in place that, if
you get no reply within a certain time, you can be sure the
other side has committed suicide and has thus brought all its
resources down the hard way.
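
The timing argument can be sketched in a few lines of Python. This is
only an illustration, not how SBD or Pacemaker implement it: the
function name, its parameters and the safety factor of 2 are all
assumptions made for the example. The point is merely that the survivor
waits out the peer's watchdog timeout (plus margin) instead of waiting
for a fencing confirmation.

```python
def peer_is_safely_down(seconds_since_last_contact: float,
                        watchdog_timeout: float,
                        safety_factor: float = 2.0) -> bool:
    """Return True once a silent peer can be assumed to have self-fenced.

    Hypothetical helper: a healthy peer would have answered by now, and an
    unhealthy one has had its watchdog fire, so no positive confirmation
    of the fencing is required.
    """
    if watchdog_timeout <= 0:
        raise ValueError("watchdog_timeout must be positive")
    if safety_factor < 1.0:
        raise ValueError("safety_factor must be at least 1")
    # Wait longer than the watchdog timeout itself to leave some margin.
    return seconds_since_last_contact >= watchdog_timeout * safety_factor


# Example: with a 5 s watchdog, resources may be taken over after 10 s
# of silence, but not earlier.
print(peer_is_safely_down(4.0, watchdog_timeout=5.0))
print(peer_is_safely_down(10.0, watchdog_timeout=5.0))
```

How large the margin should be depends on how much you trust the
watchdog and the clocks involved; the factor here is a placeholder.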

SBD can run in any of three setups:
    * a watchdog + Pacemaker quorum, on 3 or more nodes;
    * a watchdog + cluster membership + a disk, on 2 or more nodes
      (a very recent snapshot of SBD is required);
    * 3 disks, without relying on input from the cluster.
Of course the disks can be virtual, as long as they are shared, and
shared disks are something you are more likely to be granted than
access to VM management or to fencing devices that kill the hosts.
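
To make the combinations easier to check against a given environment,
here is a small Python sketch that lists which of them are possible for
a given number of nodes and shared disks. The function and its
parameter names are hypothetical and only encode the rules stated in
this mail; it is not part of SBD and says nothing about how to
configure it.

```python
from typing import List


def possible_sbd_setups(nodes: int,
                        shared_disks: int,
                        recent_sbd: bool = False) -> List[str]:
    """List the SBD setups from this mail that fit the environment.

    A reliable watchdog on every node is assumed throughout.
    recent_sbd: whether a very recent SBD snapshot is installed.
    """
    setups = []
    if nodes >= 3:
        setups.append("watchdog + pacemaker quorum")
    if nodes >= 2 and shared_disks >= 1 and recent_sbd:
        setups.append("watchdog + cluster membership + 1 disk")
    if shared_disks >= 3:
        setups.append("3 disks, no input from the cluster")
    return setups


# Example: a 2-node cluster of VMs with one shared virtual disk.
print(possible_sbd_setups(nodes=2, shared_disks=1, recent_sbd=True))
```

For the two-VM case in the original question, this leaves the
single-disk setup (with a recent SBD) or the three-disk setup.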

Regards,
Klaus

>>>
>>> Is there an optimal configuration for such scenario?
>>>
>>> I think I'd rather live with the split brain problem, but I just want to
>>> know if I missed any config option.
>>>
>>> Thanks in advance!
>>>
>>> Cheers,
>>> Andrés
>> You've realized why a production cluster on VMs is generally not
>> recommended. :)
>>
>> If the project is important enough to make HA, then management needs to
>> allocate the budget to get the proper hardware for the effort, I would
>> argue. If you want to keep the services in VMs, that's fine, get a pair
>> of nodes and make them an HA cluster to protect the VMs as the services
>> (we do this all the time).
>>
>> With that, you then pair IPMI and switched PDUs for complete coverage
>> (IPMI alone isn't enough, because if the host is destroyed, it will take
>> the IPMI BMC with it).
> To elaborate on this approach, the underlying hosts could be the cluster
> nodes, and the VMs could be resources. If you make all the VMs into
> resources, then you get HA for all of them. You can also run Pacemaker
> Remote in any of the VMs if you want to monitor resources running inside
> them (or move resources from one VM to another).
>
> Commenting on your original question, I'd point out that if pacemaker
> chooses to fence one of the underlying hosts, it's not responding
> normally, so any other VMs on it are likely toast anyway. You may
> already be familiar, but you can set a fencing topology so that
> pacemaker tries libvirt first, then kills the host only if that fails.