[ClusterLabs] is SFEX valid for Pacemaker on VMware with fence_vmware_soap?

Satoshi Suzuki tight2loop at gmail.com
Thu Sep 13 22:38:59 EDT 2018

Please let me ask whether SFEX is a valid mechanism for exclusive disk
access control in Pacemaker clusters in a VMware environment.

My client is planning to configure Pacemaker HA clusters on several
VMware vSphere 6.5 hosts.
Each HA cluster consists of two VM nodes, one active and one standby,
running on two different ESX hosts, with shared LVM disk resources.

For exclusive disk access control and fencing with Pacemaker, our IT
vendor is proposing to use SFEX (Shared Disk File EXclusiveness) and
fence_vmware_soap (to reset the failing node via vCenter).

I am very concerned about the case where an ESX host hangs for over a
minute, for example due to intermittent HW failures, so that
fence_vmware_soap does not work. The standby node would then be forced
to take over the disk resources with SFEX. If the hung node eventually
comes back, the I/Os that were queued on the previously active node just
before the ESX host hung would be flushed to the disk and corrupt the
resources taken over via SFEX, because there is no SCSI persistent
reservation and no valid HW watchdog timer for VMs on VMware.
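
To make the sequence I am worried about concrete, here is a toy Python
model of it. It is only a sketch of my reasoning, not how SFEX or
fence_scsi are implemented: the SharedDisk class, the simulate()
function, and the node and transaction names are all made up for
illustration. It assumes that the takeover proceeds even though the
reset did not succeed, that an SFEX-style lock is only an advisory
record the storage does not enforce, and that a SCSI-3 persistent
reservation makes the storage itself reject I/O from an unregistered
node.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class SharedDisk:
    """Toy shared disk (hypothetical model, not a real API)."""
    enforce_reservation: bool                 # True models SCSI-3 PR enforcement
    registered: Set[str] = field(default_factory=set)
    lock_owner: Optional[str] = None          # advisory SFEX-style lock record
    writes: List[Tuple[str, str]] = field(default_factory=list)

    def write(self, node: str, data: str) -> bool:
        # Only a storage-enforced reservation can reject I/O.
        # The advisory lock record is never consulted on the write path.
        if self.enforce_reservation and node not in self.registered:
            return False
        self.writes.append((node, data))
        return True


def simulate(enforce_reservation: bool) -> List[str]:
    disk = SharedDisk(enforce_reservation, registered={"node1", "node2"})

    # node1 is active and holds the lock.
    disk.lock_owner = "node1"
    disk.write("node1", "txn-1")

    # The ESX host under node1 hangs with one write still queued.
    queued = [("node1", "txn-2-stale")]

    # The reset via vCenter does not succeed; node2 takes over anyway.
    disk.lock_owner = "node2"
    if enforce_reservation:
        disk.registered.discard("node1")      # fence_scsi-style key removal
    disk.write("node2", "txn-2")

    # The host recovers and flushes the queued I/O without re-checking the lock.
    for node, data in queued:
        disk.write(node, data)

    return [data for _, data in disk.writes]


if __name__ == "__main__":
    print("lock only:       ", simulate(enforce_reservation=False))
    print("with reservation:", simulate(enforce_reservation=True))
```

In this model the stale write lands after the new owner's write when
only the lock is used, and is rejected when the reservation is
enforced by the storage.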

So I think SFEX is valid only when combined with IPMI-based STONITH,
either for bare-metal servers or for the VMware hosts themselves,
and that on VMware we should instead use fence_scsi (with recent SPC-3
compliant disk storage) together with fence_vmware_soap. Am I right?

In addition, is fence_scsi combined with fence_vmware_soap sufficiently
proven in production environments running RHEL 7.x on VMware?

Thank you for any responses.