[ClusterLabs] Using fence_scsi agent and watchdog
liuk001 at gmail.com
Mon Aug 21 12:12:24 EDT 2017
I've set up a two-node PCS lab to test the fence_scsi agent and how it works.
The lab comprises the following VMs, all CentOS 7.3 under VMware:
pcs1 - 192.168.199.101
pcs2 - 192.168.199.102
iscsi - 192.168.199.200 iSCSI server
The iSCSI server exports 3 block volumes like these to both PCS nodes:
/dev/sdb 200 MB fence volume with working SCSI-3 persistent reservations
/dev/sdc 1GB data volume XFS
/dev/sdd 2GB data volume XFS
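Before configuring the agent, one way to confirm that the fence volume really supports SCSI-3 persistent reservations is sg_persist from the sg3_utils package (a sketch, assuming sg3_utils is installed on the nodes):

```shell
# Report the device's persistent-reservation capabilities
sg_persist --in --report-capabilities --device=/dev/sdb

# List currently registered keys (empty until the cluster registers)
sg_persist --in --read-keys --device=/dev/sdb
```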
The fencing agent is configured like this:

pcs stonith create FenceSCSI fence_scsi pcmk_host_list="pcs1 pcs2" \
    devices=/dev/sdb meta provides=unfencing
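After the cluster has unfenced both nodes, the registrations can be checked on the fence device (a sketch using sg_persist; fence_scsi normally holds one "Write Exclusive, registrants only" reservation plus one key per node):

```shell
# Each node's key should appear here once unfencing has run
sg_persist --in --read-keys --device=/dev/sdb

# Exactly one reservation should be held on the fence volume
sg_persist --in --read-reservation --device=/dev/sdb
```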
Then I created 2 resource groups, each with an LVM volume mounted under
/cluster/fs1 and /cluster/fs2 respectively.
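For reference, the two groups could be built roughly like this (a sketch only; the resource, VG, and LV names below are my assumptions, not taken from the actual lab):

```shell
# Group 1: LVM volume group + XFS filesystem on /cluster/fs1
pcs resource create fs1_lvm LVM volgrpname=vg_fs1 exclusive=true --group grp_fs1
pcs resource create fs1_xfs Filesystem device=/dev/vg_fs1/lv_fs1 \
    directory=/cluster/fs1 fstype=xfs --group grp_fs1

# Group 2: same layout on /cluster/fs2
pcs resource create fs2_lvm LVM volgrpname=vg_fs2 exclusive=true --group grp_fs2
pcs resource create fs2_xfs Filesystem device=/dev/vg_fs2/lv_fs2 \
    directory=/cluster/fs2 fstype=xfs --group grp_fs2
```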
PCS is managing the resources as expected.
Coming to fence_scsi, it seems that the only way to be sure the fenced node
is actually rebooted is to install the watchdog rpm and link the
/usr/share/cluster/fence_scsi_check script into the /etc/watchdog.d directory.
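Concretely, the watchdog setup I mean looks like this (a sketch; fence_scsi_check is shipped by the fence-agents-scsi package on CentOS 7):

```shell
# Install and enable the watchdog daemon on each node
yum install -y watchdog

# Hook fence_scsi's check script into watchdog's test directory;
# when the node's key vanishes from the fence device, the check
# fails and watchdog reboots the node
ln -s /usr/share/cluster/fence_scsi_check /etc/watchdog.d/fence_scsi_check

systemctl enable --now watchdog
```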
But I've noticed that there is a significant lag between the resource
takeover on the surviving node and the effective reboot of the fenced node,
which could lead to a dangerous situation, for example:
1. stonith_admin -F pcs1
2. PCS stops on pcs1 and the resources are switched to node pcs2 within a few
seconds.
3. Some time later, watchdog triggers the reboot of the pcs1 node.
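The lag is easy to observe from the surviving node (a sketch; device path as in my lab):

```shell
# From pcs2: fence pcs1 via the stonith API
stonith_admin -F pcs1

# pcs1's key disappears from the fence device almost immediately,
# and the resources fail over within seconds...
sg_persist --in --read-keys --device=/dev/sdb
pcs status

# ...but pcs1 only reboots once its watchdog daemon next runs
# fence_scsi_check and finds its own key missing
```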
I have the following questions:
A. Is this the only possible configuration for using the fence_scsi agent
that guarantees the fenced node is rebooted? If yes, I think the
documentation should be updated accordingly, because it is not very clear.
B. Is there a way to make the surviving node wait until the fenced node has
actually rebooted before taking over its resources?
Thanks in advance for any answers.