[ClusterLabs] fence_apc delay?

Marek Grac mgrac at redhat.com
Sat Sep 3 12:41:59 UTC 2016


Hi,

There are two problems mentioned in the email.

1) power-wait

Power-wait is a fairly advanced option, and there are only a few fence
devices/agents where it makes sense, and then only because the HW/firmware
on the device is somewhat broken. Basically, when we execute a power ON/OFF
operation, we wait for power-wait seconds before we send the next command.
I don't remember any issue of this kind with APC.


2) the only theory I could come up with was that maybe the fencing
operation was considered complete too quickly?

That is virtually impossible. Even when power ON/OFF is asynchronous, we
test the status of the device, and the fence agent waits until the status
of the plug/VM/... matches what the user wants.
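
To make both points concrete, here is a minimal sketch in Python of the
logic described above. It is not the actual fence_apc code: the names
set_power_and_confirm, SlowPlug, power_timeout and the set_power/get_power
methods are hypothetical, chosen only for illustration.

```python
import time


class SlowPlug:
    """Hypothetical stand-in for a PDU outlet that switches a few polls late."""

    def __init__(self, polls_until_switched=3):
        self.state = "on"
        self._pending = None
        self._polls_left = 0
        self._lag = polls_until_switched

    def set_power(self, plug, wanted):
        # Asynchronous: the request is accepted, the outlet changes later.
        self._pending = wanted
        self._polls_left = self._lag

    def get_power(self, plug):
        if self._pending is not None:
            if self._polls_left == 0:
                self.state, self._pending = self._pending, None
            else:
                self._polls_left -= 1
        return self.state


def set_power_and_confirm(device, plug, wanted, power_wait=0,
                          power_timeout=20, sleep=time.sleep):
    """Switch a plug; return True only once its status matches `wanted`."""
    device.set_power(plug, wanted)
    if power_wait > 0:
        # power-wait: fixed pause before the next command is sent.
        sleep(power_wait)
    deadline = time.monotonic() + power_timeout
    while True:
        if device.get_power(plug) == wanted:
            return True   # status confirmed, operation is complete
        if time.monotonic() >= deadline:
            return False  # status never matched, report failure
        sleep(1)


if __name__ == "__main__":
    plug = SlowPlug(polls_until_switched=3)
    print(set_power_and_confirm(plug, "1", "off", sleep=lambda s: None))
```

In this sketch, success is reported only after the status check matches, so
a power-wait of 0 does not by itself let the operation complete early.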


m,


On Fri, Sep 2, 2016 at 3:14 PM, Dan Swartzendruber <dswartz at druber.com>
wrote:

>
> So, I was testing my ZFS dual-head JBOD 2-node cluster.  Manual failovers
> worked just fine.  I then went to try an acid-test by logging in to node A
> and doing 'systemctl stop network'.  Sure enough, pacemaker told the APC
> fencing agent to power-cycle node A.  The ZFS pool moved to node B as
> expected.  As soon as node A was back up, I migrated the pool/IP back to
> node A.  I *thought* all was okay, until a bit later, I did 'zpool status',
> and saw checksum errors on both sides of several of the vdevs.  After much
> digging and poking, the only theory I could come up with was that maybe the
> fencing operation was considered complete too quickly?  I googled for
> examples using this, and the best tutorial I found showed using a
> power-wait=5, whereas the default seems to be power-wait=0?  (this is
> CentOS 7, btw...)  I changed it to use 5 instead of 0, and did several
> fencing operations while a guest VM (vsphere via NFS) was writing to the
> pool.  So far, no evidence of corruption.  BTW, the way I was creating and
> managing the cluster was with the lcmc java gui.  Possibly the power-wait
> default of 0 comes from there, I can't really tell.  Any thoughts or ideas
> appreciated :)