[ClusterLabs] fence-scsi question
hunter86_bg at yahoo.com
Mon Feb 10 00:06:07 EST 2020
On February 10, 2020 2:07:01 AM GMT+02:00, Dan Swartzendruber <dswartz at druber.com> wrote:
>I have a 2-node CentOS7 cluster running ZFS. The two nodes (vsphere
>appliances on different hosts) access 2 SAS SSD in a Supermicro JBOD
>with 2 mini-SAS connectors. It all works fine - failover and all. My
>quandary was how to implement fencing. I was able to get both of the
>vmware SOAP and REST fencing agents to work - it just isn't reliable
>enough. If the vcenter server appliance is busy, fencing requests
>timeout. I know I can increase the timeouts, but in at least one test
>run, even a minute wasn't enough, and my concern is that too long
>switching over, and vmware will put the datastore in APD, hosing
>I confirmed that both SSD work properly with the fence-scsi agent.
>Fencing the host who actively owns the ZFS pool also works perfectly
>(ZFS flushes data to the datastore every 5 seconds or so, so
>the SCSI-3 persistent reservations cause a fatal write error to the
>pool, and setting the pool in failmode=panic will cause the fenced
>cluster node to reboot automatically.) The problem (maybe it isn't
>really one?) is that fencing the node that does *not* own the pool has
>no effect, since it holds no reservations on the devices in the pool.
>I'd love to be sure this isn't an issue at all.
You can configure multiple fencing mechanisms (fencing levels) in your cluster.
For example, you can make VMware fencing the first level, and if it fails (because vCenter is busy or currently unavailable), SCSI fencing can kick in so that the failover still completes.
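In Pacemaker this fallback is expressed as a fencing topology: devices are grouped into numbered levels that are tried in order, and a level succeeds only if every device in it succeeds. The Python sketch below models only that escalation logic under those assumptions; the `fence_node` helper, the device names `vmfence` and `scsifence`, and the simulated agents are hypothetical and are not Pacemaker's API.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical signature: a fence agent takes the victim's name and reports success.
FenceAgent = Callable[[str], bool]


def fence_node(victim: str,
               levels: List[List[str]],
               agents: Dict[str, FenceAgent]) -> Optional[int]:
    """Try fencing levels in order; return the 1-based level that succeeded, or None."""
    for level_no, devices in enumerate(levels, start=1):
        # A level counts as successful only if every device in it succeeds;
        # all() stops at the first failure and we escalate to the next level.
        if all(agents[dev](victim) for dev in devices):
            return level_no
    return None


def vmware_busy(victim: str) -> bool:
    # Simulates the vCenter appliance timing out under load.
    return False


def scsi_ok(victim: str) -> bool:
    # Simulates fence_scsi successfully cutting the victim off from the disks.
    return True


agents = {"vmfence": vmware_busy, "scsifence": scsi_ok}
print(fence_node("node2", [["vmfence"], ["scsifence"]], agents))  # 2
```

In a real cluster the levels are registered with the cluster manager (for example with `pcs stonith level add`) rather than coded by hand; the sketch just shows why a busy vCenter no longer blocks failover once a second level exists.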
What you observe is normal: no SCSI reservations means no fencing. That's why major vendors require, when using fence_mpath/fence_scsi, that the shared storage be a real dependency of the application (for example, a file system the application actually uses) and not just an add-on.
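The point that storage fencing only bites a node holding a registration can be illustrated with a toy model. This Python sketch assumes that writes are allowed only for registered nodes, roughly like a "Write Exclusive, Registrants Only" reservation; the `SharedDisk` class and node names are hypothetical, and the model ignores the actual SCSI command set.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class SharedDisk:
    """Toy model of one shared disk with SCSI-3 style registrations.

    Assumption: only registered nodes may write, roughly like a
    'Write Exclusive, Registrants Only' reservation. Not the real protocol.
    """
    registered: Set[str] = field(default_factory=set)

    def register(self, node: str) -> None:
        # A node registers its key before it starts using the disk.
        self.registered.add(node)

    def fence(self, victim: str) -> bool:
        # Fencing removes the victim's key; returns False if there was nothing to remove.
        if victim not in self.registered:
            return False
        self.registered.remove(victim)
        return True

    def write(self, node: str) -> bool:
        # Unregistered nodes get a reservation conflict, modeled here as False.
        return node in self.registered


disk = SharedDisk()
disk.register("node1")        # node1 owns and writes to the pool
print(disk.fence("node2"))    # False: node2 holds no key, so fencing changes nothing
print(disk.fence("node1"))    # True: node1's key is removed
print(disk.write("node1"))    # False: node1's next write fails
```

In the setup described above, that failed write is what trips `failmode=panic` on the pool owner, while fencing the other node has nothing to revoke.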
I personally don't like SCSI reservations, as there is no guarantee that other resources (services, IPs, etc.) are actually down, but the risk is low.
In your case, a fence_scsi STONITH device can serve as a second layer of protection.