[ClusterLabs] fence_apc delay?

Dan Swartzendruber dswartz at druber.com
Sat Sep 3 13:50:10 UTC 2016


On 2016-09-03 08:41, Marek Grac wrote:
> Hi,
> 
> There are two problems mentioned in the email.
> 
> 1) power-wait
> 
> Power-wait is a fairly advanced option, and there are only a few fence
> devices/agents where it makes sense, and only because the HW/firmware
> on the device is somewhat broken. Basically, when we execute a power
> ON/OFF operation, we wait for power-wait seconds before we send the
> next command. I don't remember any issue of this kind with APC.
> 
> 2) the only theory I could come up with was that maybe the fencing
> operation was considered complete too quickly?
> 
> That is virtually impossible. Even when power ON/OFF is
> asynchronous, we test the status of the device, and the fence agent
> waits until the status of the plug/VM/... matches what the user wants.

I think you misunderstood my point (possibly I wasn't clear). I'm not
saying anything is wrong with either the fencing agent or the PDU.
Rather, my theory is this: if the agent flips the power off and then
back on, and the interval it stays off is 'too short', a host like the
R905 can possibly continue to operate for a couple of seconds, still
writing data to the disks past the point where the other node begins to
do likewise. If power_wait is not the right way to wait, say, 10 seconds
to make 100% sure node A is dead as a doornail, what *is* the right way?



